Re: table inheritance versus column compression and storage settings

2024-02-06 Thread Ashutosh Bapat
On Wed, Jan 31, 2024 at 1:29 PM Ashutosh Bapat
 wrote:
>
> On Tue, Dec 5, 2023 at 6:28 PM Peter Eisentraut  wrote:
> >
> > On 05.12.23 05:26, Ashutosh Bapat wrote:
> > >> - When inheriting from multiple parents with different settings, an
> > >> explicit setting in the child is required.
> > > When no explicit setting for child is specified, it will throw an
> > > error as it does today. Right?
> >
> > Yes, it would throw an error, but a different error than today, saying
> > something like "the settings in the parents conflict, so you need to
> > specify one here to override the conflict".
> >
>
> PFA patch fixing inheritance and compression. It also fixes a crash
> reported in [1].
>
> The storage case looks more involved. As currently coded, the child
> always inherits the parent's storage properties.
> #create table t1 (a text storage plain);
> CREATE TABLE
> #create table c1 (a text storage main) inherits(t1);
> NOTICE:  merging column "a" with inherited definition
> CREATE TABLE
> #\d+ t1
>                                             Table "public.t1"
>  Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
> --------+------+-----------+----------+---------+---------+-------------+--------------+-------------
>  a      | text |           |          |         | plain   |             |              |
> Child tables: c1
> Access method: heap
> #\d+ c1
>                                             Table "public.c1"
>  Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
> --------+------+-----------+----------+---------+---------+-------------+--------------+-------------
>  a      | text |           |          |         | plain   |             |              |
> Inherits: t1
> Access method: heap
>
> Observe that c1.a does not have storage "main" but instead inherits
> "plain" from t1.
>
> According to the code at
> https://github.com/postgres/postgres/blob/6a1ea02c491d16474a6214603dce40b5b122d4d1/src/backend/commands/tablecmds.c#L3253,
> there is supposed to be a conflict error. But that does not happen
> since the child's storage specification is in ColumnDef::storage_name,
> which is never consulted. ColumnDef::storage_name is converted to
> ColumnDef::storage only in BuildDescForRelation(), after
> MergeAttribute() has finished. There are two ways to fix this:
> 1. In MergeChildAttribute(), resolve ColumnDef::storage_name to
> ColumnDef::storage before comparing it against the inherited property. I
> don't like this approach since (a) we would duplicate the conversion logic
> in MergeChildAttribute() and BuildDescForRelation(), and (b) the conversion
> would happen only for the attributes which are inherited.
>
> 2. Deal with it the same way as compression. Get rid of
> ColumnDef::storage altogether. Instead set ColumnDef::storage_name at
> https://github.com/postgres/postgres/blob/6a1ea02c491d16474a6214603dce40b5b122d4d1/src/backend/commands/tablecmds.c#L2723.
>
> I am inclined to take the second approach. Let me know if you feel otherwise.

Took the second approach. PFA patches
0001 fixes compression inheritance
0002 fixes storage inheritance

The patches may be committed separately or as a single patch. Keeping
them separate in case we decide to commit one but not the other.

We always set storage even if it's not specified, in which case the
column type's default storage is used. This is slightly different from
compression, which defaults to the GUC's value if not set. This has led
to slight differences in the tests.
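To illustrate, a hypothetical session matching the behavior described in
the commit message below (not taken from the patch's regression tests):

  create table p1 (a text storage main);
  create table p2 (a text storage external);
  -- no explicit setting in the child while parents conflict: error
  create table c2 (a text) inherits (p1, p2);
  -- an explicit storage setting in the child overrides the parents
  create table c2 (a text storage plain) inherits (p1, p2);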

-- 
Best Wishes,
Ashutosh Bapat
From 4d7ba689782d2dfd142ec0f27095ee386543f761 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Thu, 1 Feb 2024 12:58:36 +0530
Subject: [PATCH 2/2] Column storage and inheritance

The storage method specified while creating a child table overrides any
parent storage method, even if parents have conflicting storage methods.
If the child doesn't specify a storage method and all its parents use
the same storage method, the child inherits parents' storage method. We
don't allow a child without storage specification when its parents use
conflicting storage methods.

The storage property of a child inheriting from parent(s) after its
creation, i.e. via ALTER TABLE ... INHERIT, remains unchanged.

Notes to reviewer (may be included in the final commit message):

Before this commit the specified storage method could be stored in
ColumnDef::storage (CREATE TABLE ... LIKE) or ColumnDef::storage_name
(CREATE TABLE ...). This caused MergeChildAttribute() and
MergeInheritedAttribute() to ignore the storage method specified in the
child definition, since they looked only at ColumnDef::storage. This
commit removes ColumnDef::storage and instead uses
ColumnDef::storage_name to save any storage method specification. This
is similar to how the compression method specification is handled.

Before this commit the error detail would mention the first 

Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-06 Thread Andrey M. Borodin



> On 7 Feb 2024, at 10:58, Dilip Kumar  wrote:
> 
>> commit_timestamp_slru_buffers
>> transaction_slru_buffers
>> etc
> 
> I am not sure we are exposing anything related to SLRU to the user, 

I think we already tell the user something about SLRU. I'd rather consider
whether "transaction_slru_buffers" is easier to understand than
"transaction_buffers"...
IMO transaction_buffers is clearer, but I do not have a strong opinion.

> I
> mean transaction_buffers should make sense for the user that it stores
> transaction-related data in some buffers pool but whether that buffer
> pool is called SLRU or not doesn't matter much to the user IMHO.
+1


Best regards, Andrey Borodin.



Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-06 Thread Dilip Kumar
On Tue, Feb 6, 2024 at 8:55 PM Alvaro Herrera  wrote:
>
> Here's the rest of it rebased on top of current master.  I think it
> makes sense to have this as one individual commit.
>
> I made CLOGShmemBuffers, CommitTsShmemBuffers and SUBTRANSShmemBuffers
> compute a number that's multiple of SLRU_BANK_SIZE.  But it's a crock,
> because we don't have that macro at that point, so I just used constant
> 16.  Obviously need a better solution for this.

If we define SLRU_BANK_SIZE in slru.h then we can use it here, right?
These files already include slru.h anyway.
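For illustration, a sketch of what this could look like (the actual
function shape in Alvaro's patch may differ):

  /* src/include/access/slru.h */
  #define SLRU_BANK_SIZE	16

  /* e.g. in src/backend/access/transam/clog.c */
  Size
  CLOGShmemBuffers(void)
  {
      Size    nbuffers = Min(128, Max(SLRU_BANK_SIZE, NBuffers / 512));

      /* round down to a multiple of the bank size */
      return nbuffers - (nbuffers % SLRU_BANK_SIZE);
  }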

>
> I also changed the location of bank_mask in SlruCtlData for better
> packing, as advised by pahole; and renamed SLRU_SLOTNO_GET_BANKLOCKNO()
> to SlotGetBankNumber().

Okay.

> Some very critical comments still need to be updated to the new design,
> particularly anything that mentions "control lock"; but also the overall
> model needs to be explained in some central location, rather than
> incongruently some pieces here and other pieces there.  I'll see about
> this later.  But at least this is code you should be able to play with.

Okay, I will review and test this.

> I've been wondering whether we should add a "slru" to the name of the
> GUCs:
>
> commit_timestamp_slru_buffers
> transaction_slru_buffers
> etc

I am not sure we are exposing anything related to SLRU to the user. I
mean, transaction_buffers should make sense to the user, in that it stores
transaction-related data in some buffer pool; whether that buffer pool is
called SLRU or not doesn't matter much to the user, IMHO.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: speed up a logical replica setup

2024-02-06 Thread Hayato Kuroda (Fujitsu)
Dear Euler,

Sorry for our e-mails crossing each other. I will read your patch,
but first I want to reply to one of your points.

>
I started reviewing v16-0007 but didn't finish yet. The general idea is ok.
However, I'm still worried about preventing some use cases if it provides only
the local connection option. What if you want to keep monitoring this instance
while the transformation is happening? Let's say it has a backlog that will
take some time to apply. Unless, you have a local agent, you have no data about
this server until pg_createsubscriber terminates. Even the local agent might
not connect to the server unless you use the current port.
>

(IIUC, 0007 could not overwrite a port number - refactoring is needed)

Ah, I had not considered that point of view. Assuming that changing the
port number can prevent connection establishment, there are four options:
a) Do not overwrite port or listen_addresses. This allows monitoring by
   external agents, but someone can modify GUCs and data during the operation.
b) Overwrite port but not listen_addresses. Not sure this is useful...
c) Overwrite listen_addresses but not port. This allows monitoring by
   local agents, and we can partially protect the database, but there is
   still room for interference.
d) Overwrite both port and listen_addresses. This protects the database
   completely, but no one can monitor it.

Hmm, which one should be chosen? I prefer c) or d).
Do you know how pglogical_create_subscriber handles this?
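As a rough sketch, option (d) could be realized when (re)starting the
standby, e.g. via pg_ctl -o as discussed elsewhere in this thread (the
data directory, socket directory, and port values here are made up):

  pg_ctl -D "$STANDBY_DATADIR" restart \
    -o "-c listen_addresses='' -c unix_socket_directories='/tmp/pgcs_private' -c port=50432"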

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


Re: Synchronizing slots from primary to standby

2024-02-06 Thread shveta malik
On Tue, Feb 6, 2024 at 7:21 PM Zhijie Hou (Fujitsu)
 wrote:
>
> > ---
> > +/* Slot sync worker objects */
> > +extern PGDLLIMPORT char *PrimaryConnInfo; extern PGDLLIMPORT char
> > +*PrimarySlotName;
> >
> > These two variables are declared also in xlogrecovery.h. Is it intentional? 
> > If so, I
> > think it's better to write comments.
>
> Will address.

Added comments in v79_2.

>
> >
> > ---
> > +   SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid ||
> > '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname
> >
> > and
> >
> > +   SELECT (CASE WHEN r.srsubstate = 'f' THEN
> > pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || 
> > r.srrelid), false)
> >
> > If we use CONCAT function, we can replace '||' with ','.
> >

Modified in v79_2.

thanks
Shveta




Re: Why is subscription/t/031_column_list.pl failing so much?

2024-02-06 Thread vignesh C
On Tue, 6 Feb 2024 at 08:30, Alexander Lakhin  wrote:
>
> Hello Amit,
>
> 05.02.2024 15:20, Amit Kapila wrote:
> > If this can be reproduced frequently then we can even try to test the
> > patch in that thread by Osumi-San [1] (I haven't tested that it
> > applies cleanly but shouldn't be difficult to make it work) based on
> > the theory that walsender is starting up at an LSN somewhere before
> > where the publication is created.
> >
> > [1] - 
> > https://www.postgresql.org/message-id/TYCPR01MB83737A68CD5D554EA82BD7B9EDD39%40TYCPR01MB8373.jpnprd01.prod.outlook.com
> >
>
> Yes, with the aforementioned modification of bgwriter.c and when running
> 20 tests in parallel, I got failures on iterations 20, 3, 21, ..., but with the
> updated Osumi-San's patch (which adds wait_for_catchup('sub1') before every
> ALTER SUBSCRIPTION sub1 SET PUBLICATION ...) applied, 300 iterations ran
> with no failures.

I was able to reproduce the issue with the patch changes suggested for
bgwriter, though for me it failed on the 287th run.
I then ran the test 1000 times with the test changes shared at [1],
and it passed all 1000 times.

I measured the test execution time, averaged over 10 runs, and found
that it takes about 1.2 seconds longer with the changes:
Without patch:          8.454 seconds
With test change patch: 9.672 seconds

For the test execution comparison I used a machine with 755.536 GB of
memory, 120 CPUs, and the RHEL 7 operating system.

[1] - 
https://www.postgresql.org/message-id/e6ce3cf7-4025-f129-e3ac-0f778469f720%40gmail.com

Regards,
Vignesh




RE: speed up a logical replica setup

2024-02-06 Thread Hayato Kuroda (Fujitsu)
Dear Shlok,

Thanks for updating the patch!

> I have created a top-up patch 0007 on top of v15-0006.
> 
> I revived the patch which removes -S option and adds some options
> instead. The patch add option for --port, --username and --socketdir.
> This patch also ensures that anyone cannot connect to the standby
> during the pg_createsubscriber, by setting listen_addresses,
> unix_socket_permissions, and unix_socket_directories.

IIUC, there are two reasons why removing -S may be good:

* it forces users to specify a local connection, and
* it avoids connection establishment on the standby while pg_createsubscriber runs.

The first bullet is still valid, but we should document it the way pg_upgrade does:

>
pg_upgrade will connect to the old and new servers several times, so you might
want to set authentication to peer in pg_hba.conf or use a ~/.pgpass file
(see Section 33.16).
>

Regarding the second bullet, this patch cannot ensure that. pg_createsubscriber
just accepts a port number on which the standby is already listening; it does
not change the port number. So any local application on the standby server can
connect to the server and may change primary_conninfo, primary_slot_name, data,
etc.
So... what should we do? What do others think?

Besides, 0007 does not follow the recent changes in 0001. E.g., options should
be recorded in CreateSubscriberOptions. Also, should we check the privileges of
the socket directory?

[1]: 
https://www.postgresql.org/message-id/TY3PR01MB988902B992A4F2E99E1385EDF56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: RFC: Logging plan of the running query

2024-02-06 Thread Ashutosh Bapat
On Wed, Feb 7, 2024 at 9:38 AM torikoshia  wrote:
>
> Hi Ashutosh,
>
> On 2024-02-06 19:51, Ashutosh Bapat wrote:
>
> > Thanks for the summary. It is helpful. I think patch is also getting
> > better.
> >
> > I have a few questions and suggestions
>
> Thanks for your comments.
>
> > 1. Prologue of GetLockMethodLocalHash() mentions
> >  * NOTE: When there are many entries in LockMethodLocalHash, calling
> > this
> >  * function and looking into all of them can lead to performance
> > problems.
> >  */
> > How bad this performance could be. Let's assume that a query is taking
> > time and pg_log_query_plan() is invoked to examine the plan of this
> > query. Is it possible that the looping over all the locks itself takes
> > a lot of time delaying the query execution further?
>
> I think it depends on the number of local locks, but just for your
> information, I've measured the cpu time for this page lock check by
> adding the code below together with
> v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which
> calls ProcessLogQueryPlanInterrupt() on every CFI, on my laptop:
>
>diff --git a/src/backend/commands/explain.c
> b/src/backend/commands/explain.c
>index 5f7d77d567..65b7cb4925 100644
>--- a/src/backend/commands/explain.c
>+++ b/src/backend/commands/explain.c
>@@ -44,6 +44,8 @@
>
>+#include "time.h"
>...
>@@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void)
> * we check all the LocalLock entries and when finding even
> one, give up
> * logging the plan.
> */
>+   start = clock();
>hash_seq_init(&status, GetLockMethodLocalHash());
>while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) !=
> NULL)
>{
>if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
>{
>ereport(LOG_SERVER_ONLY,
>errmsg("ignored request for logging query plan due
> to page lock conflicts"),
>errdetail("You can try again in a moment."));
>hash_seq_term(&status);
>
>ProcessLogQueryPlanInterruptActive = false;
>return;
>}
>}
>+   end = clock();
>+   cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
>+
>+   ereport(LOG,
>+   errmsg("all locallock entry search took: %f",
> cpu_time_used));
>+
>
> There were about 3 million log lines which recorded the cpu time, and
> the duration was quite short:
>
>=# -- Extracted cpu_time_used from log and loaded it to cpu_time.d.
>=# select max(d), min(d), avg(d) from cpu_time ;
>       max | min |          avg
> -----------+-----+------------------------
>   0.000116 |   0 | 4.706274625332238e-07
>
> I'm not certain that this is valid for actual use cases, but these
> results seem to suggest that it will not take that long.

What load did you run? I don't think any query in make check would
take, say, thousands of locks. The prologue refers to a very populated
lock hash table. I think that will happen if thousands of tables are
queried in a single query OR a query runs on a partitioned table with
thousands of partitions. Maybe we want to try that scenario.
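For example, something like the following should populate
LockMethodLocalHash with thousands of entries in a single query (a
sketch; the table name and partition count are arbitrary):

  CREATE TABLE pt (id int) PARTITION BY RANGE (id);
  SELECT format('CREATE TABLE pt_%s PARTITION OF pt FOR VALUES FROM (%s) TO (%s)',
                i, i * 1000, (i + 1) * 1000)
  FROM generate_series(0, 4999) AS i
  \gexec
  -- a seq scan over all partitions takes a local lock per partition
  SELECT count(*) FROM pt;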

>
>
> > 2. What happens if auto_explain is enabled in the backend and
> > pg_log_query_plan() is called on the same backend? Will they conflict?
> > I think we should add a test for the same.
>
> Hmm, I think they don't conflict since they just refer to QueryDesc;
> they don't modify it and don't use the same objects for locking.
> (I imagine 'conflict' here is something like 'hard conflict' in
> replication[2].)

By conflict, I mean the two features behaving weirdly when used together,
e.g. giving wrong results or crashing.

>
> Actually, using both auto_explain and pg_log_query_plan() outputs each
> log separately:
>
>(pid:62835)=# select pg_sleep(10);
>(pid:7)=# select pg_log_query_plan(62835);
>
>(pid:7)=# \! cat data/log/postgres.log
>...
>2024-02-06 21:44:17.837 JST [62835:4:0] LOG:  0: query plan
> running on backend with PID 62835 is:
>  Query Text: select pg_sleep(10);
>  Result  (cost=0.00..0.01 rows=1 width=4)
>Output: pg_sleep('10'::double precision)
>  Query Identifier: 3506829283127886044
>2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION:
> ProcessLogQueryPlanInterrupt, explain.c:5336
>2024-02-06 21:44:26.974 JST [62835:6:0] LOG:  0: duration:
> 1.868 ms  plan:
>  Query Text: select pg_sleep(10);
>  Result  (cost=0.00..0.01 rows=1 width=4) (actual
> time=1.802..1.804 rows=1 loops=1)
>
> > Using injection point support we should be able to add tests for
> > testing pg_log_query_plan behaviour when there are page locks held or
> > when auto_explain (with instrumentation) and pg_log_query_plan() work
> > on the same query plan. Use injection point to make the backend
> > running query wait at a suitable 

Re: speed up a logical replica setup

2024-02-06 Thread Euler Taveira
On Fri, Feb 2, 2024, at 6:41 AM, Hayato Kuroda (Fujitsu) wrote:
> Thanks for updating the patch!

Thanks for taking a look.

> >
> I'm still working on the data structures to group options. I don't like the 
> way
> it was grouped in v13-0005. There is too many levels to reach database name.
> The setup_subscriber() function requires the 3 data structures.
> >
> 
> Right, your refactoring uses fewer stack levels. So I paused revising my
> refactoring patch.

I didn't complete this task yet so I didn't include it in this patch.

> >
> The documentation update is almost there. I will include the modifications in
> the next patch.
> >
> 
> OK. I think it should be modified before native speakers will attend to the
> thread.

Same for this one.

> >
> Regarding v13-0004, it seems a good UI that's why I wrote a comment about it.
> However, it comes with a restriction that requires a similar HBA rule for both
> regular and replication connections. Is it an acceptable restriction? We might
> paint ourselves into the corner. A reasonable proposal is not to remove this
> option. Instead, it should be optional. If it is not provided, 
> primary_conninfo
> is used.
> >
> 
> I had not considered that point of view. However, it is not related to
> whether -P exists or not. Even v14-0001 requires the primary to accept both
> normal and replication connections.
> If we want to avoid that, the connection from pg_createsubscriber could be
> switched back to a replication connection.
> (I felt we do not have to use the replication protocol even if we change the
> connection mode)

That's correct. We made a decision to use non-physical-replication connections
(besides the one used to connect primary <-> standby). Hence, my concern about
an HBA rule falls apart. I'm not convinced that using a modified
primary_conninfo is the only/right answer to fill the subscription connection
string. Physical and logical replication have different requirements when we
talk about users. Physical replication requires only the REPLICATION privilege.
On the other hand, to create a subscription you must have the privileges of the
pg_create_subscription role and also the CREATE privilege on the specified
database. Unless you are always recommending the superuser for this tool, I'm
afraid there will be cases where reusing primary_conninfo will be an issue. The
more I think about this UI, the more I think that, if it does not take hundreds
of lines of code, primary_conninfo should be used when -P is not specified.
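In other words, the subscription connection's user would need something
like the following, whereas the physical side only needs a role with the
REPLICATION attribute (a sketch; the role and database names are made up):

  CREATE ROLE subuser LOGIN;
  GRANT pg_create_subscription TO subuser;   -- required to create subscriptions
  GRANT CREATE ON DATABASE mydb TO subuser;  -- plus CREATE on the target database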

> The motivation why -P is not needed is to ensure the consistency of nodes.
> pg_createsubscriber assumes that the -P option can connect to the upstream 
> node,
> but no one checks it. Parsing two connection strings may be a solution but
> would be confusing. E.g., what if some options are different?
> I think using the same parameter is the simplest solution.

Ugh. An error will occur the first time it tries to connect to the primary
(in get_primary_sysid).

> I found that no one refers the name of temporary slot. Can we remove the 
> variable?

It is gone. I did a refactor in the create_logical_replication_slot function.
Slot name is assigned internally (no need for slot_name or temp_replslot) and
temporary parameter is included.

> Initialization by `CreateSubscriberOptions opt = {0};` seems enough.
> All values are set to 0x0.

It is. However, I keep the assignments for 3 reasons: (a) there might be
parameters whose default value is not zero; (b) the standard does not say that
a null pointer must be represented by zero; and (c) there is no harm in being
paranoid during initial assignment.

> You said "target server must be a standby" in [1], but I cannot find checks 
> for it.
> IIUC, there are two approaches:
> a) check the existence "standby.signal" in the data directory
> b) call an SQL function "pg_is_in_recovery"

I applied v16-0004 that implements option (b).
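For reference, a minimal form of check (b) would be (sketch):

  -- the target server must still be in recovery, i.e. be a standby
  SELECT pg_is_in_recovery();   -- abort unless this returns true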

> I still think they can be combined as "bindir".

I applied a patch that has a single variable for BINDIR.

> WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but not
> sure it is good. These settings would remain on new subscriber even after the
> pg_createsubscriber. Can we avoid it? I come up with passing these parameters
> via pg_ctl -o option, but it requires parsing output from 
> GenerateRecoveryConfig()
> (all GUCs must be allign like "-c XXX -c XXX -c XXX...").

I applied a modified version of v16-0006.

> Function arguments should not be structs because they are passed by value.
> They should be pointers. Or, for modify_subscriber_sysid and 
> wait_for_end_recovery,
> we can pass a value which would be really used.

Done.

> 07.
> ```
> static char *get_base_conninfo(char *conninfo, char *dbname,
>const char *noderole);
> ```
> 
> Not sure noderole should be passed here. It is used only for the logging.
> Can we output string before calling the function?
> (The parameter is not needed anymore if -P is removed)

Done.

> 08.
> The terminology is still not consistent. Some functions call the target 

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-06 Thread Michael Paquier
On Tue, Feb 06, 2024 at 03:33:36PM -0800, Andres Freund wrote:
> Well, you can't just do that, because there's only one caller, namely
> CopyToTextOneRow(). What I am trying to suggest is something like the
> attached, just a quick hacky POC. Namely to split out CSV support from
> CopyToTextOneRow() by introducing CopyToCSVOneRow(), and to avoid code
> duplication by moving the code into a new CopyToTextLikeOneRow().

Ah, OK.  Got it now.

> I named it CopyToTextLike* here, because it seems confusing that some Text*
> are used for both CSV and text and others are actually just for text. But if
> we were to go for that, we should go further.

This can always be argued later.

> To test the performance effects I chose to remove the pointless encoding
> "check" we're discussing in the other thread, as it makes it harder to see the
> time differences due to the per-attribute code.  I did three runs of pgbench
> -t of [1] and chose the fastest result for each.
> 
> With turbo mode and power saving disabled:
>                        Avg Time
> HEAD                    995.349
> Remove Encoding Check   870.793
> v13-0001                869.678
> Remove out callback     839.508

Hmm.  That explains why I was not seeing any differences with this
callback then.  It seems to me that the order of actions to take is
clear, like:
- Revert 2889fd23be56 to keep a clean state of the tree, now done with
1aa8324b81fa.
- Dive into the strlen() issue, as it really looks like this can
create more simplifications for the patch discussed on this thread
with COPY TO.
- Revisit what we have here, looking at more profiles to see how HEAD
and v13 compare.  It looks like we are on a good path, but let's tackle
things one step at a time.
--
Michael




RE: Psql meta-command conninfo+

2024-02-06 Thread Maiquel Grassi
  On Tue, Feb 06, 2024 at 09:45:54PM +, Maiquel Grassi wrote:
  > My initial idea has always been that they should continue to appear
  > because \conninfo+ should show all the things that \conninfo shows and
  > add more information. I think that's the purpose of the 'plus.' Now we're
  > on a better path than the initial one. We can still add the socket
  > directory and the host.

  Agreed.

--//--

I believe it's resolved reasonably well this way:

SELECT
  pg_catalog.current_database() AS "Database",
  current_user AS "User",
  pg_catalog.current_setting('server_version') AS "Server Version",
  CASE
WHEN pg_catalog.inet_server_addr() IS NULL
THEN 'NULL'
ELSE pg_catalog.inet_server_addr()::text
  END AS "Server Address",
  pg_catalog.current_setting('port') AS "Port",
  CASE
WHEN pg_catalog.inet_client_addr() IS NULL
THEN 'NULL'
ELSE pg_catalog.inet_client_addr()::text
  END AS "Client Address",
  CASE
WHEN pg_catalog.inet_client_port() IS NULL
THEN 'NULL'
ELSE pg_catalog.inet_client_port()::text
  END AS "Client Port",
  pg_catalog.pg_backend_pid() AS "Session PID",
  CASE
WHEN pg_catalog.current_setting('unix_socket_directories') = ''
THEN 'NULL'
ELSE pg_catalog.current_setting('unix_socket_directories')
  END AS "Socket Directory",
  CASE
WHEN
  pg_catalog.inet_server_addr() IS NULL
  AND pg_catalog.inet_client_addr() IS NULL
THEN 'NULL'
WHEN
  pg_catalog.inet_server_addr() = pg_catalog.inet_client_addr()
THEN 'localhost'
ELSE pg_catalog.inet_server_addr()::text
  END AS "Host";

See the tests below:

[postgres@localhost bin]$ ./psql
psql (17devel)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" via socket in 
"/tmp" at port "5432".
postgres=# \conninfo+
                                                     Current Connection Information
 Database |   User   | Server Version | Server Address | Port | Client Address | Client Port | Session PID | Socket Directory |   Host
----------+----------+----------------+----------------+------+----------------+-------------+-------------+------------------+-----------
 postgres | postgres | 17devel        | NULL           | 5432 | NULL           | NULL        |       14348 | /tmp             | NULL
(1 row)


[postgres@localhost bin]$ ./psql -h localhost
psql (17devel)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host "localhost" 
(address "::1") at port "5432".
postgres=# \conninfo+
                                                     Current Connection Information
 Database |   User   | Server Version | Server Address | Port | Client Address | Client Port | Session PID | Socket Directory |   Host
----------+----------+----------------+----------------+------+----------------+-------------+-------------+------------------+-----------
 postgres | postgres | 17devel        | ::1/128        | 5432 | ::1/128        |       46988 |       14353 | /tmp             | localhost
(1 row)


[postgres@localhost bin]$ ./psql -h ::1
psql (17devel)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host "::1" at 
port "5432".
postgres=# \conninfo+
                                                     Current Connection Information
 Database |   User   | Server Version | Server Address | Port | Client Address | Client Port | Session PID | Socket Directory |   Host
----------+----------+----------------+----------------+------+----------------+-------------+-------------+------------------+-----------
 postgres | postgres | 17devel        | ::1/128        | 5432 | ::1/128        |       46990 |       14356 | /tmp             | localhost
(1 row)


[postgres@localhost bin]$ ./psql -h 127.0.0.1
psql (17devel)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host "127.0.0.1" 
at port "5432".
postgres=# \conninfo+
                                                     Current Connection Information
 Database |   User   | Server Version | Server Address | Port | Client Address | Client Port | Session PID | Socket Directory |   Host
----------+----------+----------------+----------------+------+----------------+-------------+-------------+------------------+-----------
 postgres | postgres | 17devel        | 127.0.0.1/32   | 5432 | 127.0.0.1/32   |       35042 |       14359 | /tmp             | localhost
(1 row)


[postgres@localhost bin]$ ./psql -h 192.168.0.5 -p 5433 -U postgres -d postgres
psql (17devel, server 16.1)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host 
"192.168.0.5" at port "5433".
postgres=# \conninfo+
                                                     Current Connection Information
 Database |   User   | Server Version | Server Address | Port | Client Address | Client Port | Session PID 

Re: RFC: Logging plan of the running query

2024-02-06 Thread torikoshia

Hi Ashutosh,

On 2024-02-06 19:51, Ashutosh Bapat wrote:

Thanks for the summary. It is helpful. I think patch is also getting 
better.


I have a few questions and suggestions


Thanks for your comments.


1. Prologue of GetLockMethodLocalHash() mentions
 * NOTE: When there are many entries in LockMethodLocalHash, calling 
this
 * function and looking into all of them can lead to performance 
problems.

 */
How bad this performance could be. Let's assume that a query is taking
time and pg_log_query_plan() is invoked to examine the plan of this
query. Is it possible that the looping over all the locks itself takes
a lot of time delaying the query execution further?


I think it depends on the number of local locks, but just for your 
information, I've measured the cpu time for this page lock check by 
adding the code below together with 
v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which 
calls ProcessLogQueryPlanInterrupt() on every CFI, on my laptop:


  diff --git a/src/backend/commands/explain.c 
b/src/backend/commands/explain.c

  index 5f7d77d567..65b7cb4925 100644
  --- a/src/backend/commands/explain.c
  +++ b/src/backend/commands/explain.c
  @@ -44,6 +44,8 @@

  +#include "time.h"
  ...
  @@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void)
   * we check all the LocalLock entries and when finding even 
one, give up

   * logging the plan.
   */
  +   start = clock();
  hash_seq_init(&status, GetLockMethodLocalHash());
  while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != 
NULL)

  {
  if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
  {
  ereport(LOG_SERVER_ONLY,
  errmsg("ignored request for logging query plan due 
to page lock conflicts"),

  errdetail("You can try again in a moment."));
  hash_seq_term(&status);

  ProcessLogQueryPlanInterruptActive = false;
  return;
  }
  }
  +   end = clock();
  +   cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
  +
  +   ereport(LOG,
  +   errmsg("all locallock entry search took: %f", 
cpu_time_used));

  +

There were about 3 million log lines which recorded the cpu time, and 
the duration was quite short:


  =# -- Extracted cpu_time_used from log and loaded it to cpu_time.d.
  =# select max(d), min(d), avg(d) from cpu_time ;
      max | min |          avg
  -----------+-----+------------------------
    0.000116 |   0 | 4.706274625332238e-07

I'm not certain that this is valid for actual use cases, but these 
results seem to suggest that it will not take that long.




2. What happens if auto_explain is enabled in the backend and
pg_log_query_plan() is called on the same backend? Will they conflict?
I think we should add a test for the same.


Hmm, I think they don't conflict since they just refer to QueryDesc; 
they don't modify it and don't use the same objects for locking.
(I imagine 'conflict' here is something like 'hard conflict' in 
replication[2].)


Actually, using both auto_explain and pg_log_query_plan() outputs each 
log separately:


  (pid:62835)=# select pg_sleep(10);
  (pid:7)=# select pg_log_query_plan(62835);

  (pid:7)=# \! cat data/log/postgres.log
  ...
  2024-02-06 21:44:17.837 JST [62835:4:0] LOG:  0: query plan 
running on backend with PID 62835 is:

Query Text: select pg_sleep(10);
Result  (cost=0.00..0.01 rows=1 width=4)
  Output: pg_sleep('10'::double precision)
Query Identifier: 3506829283127886044
  2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION:  
ProcessLogQueryPlanInterrupt, explain.c:5336
  2024-02-06 21:44:26.974 JST [62835:6:0] LOG:  0: duration: 
1.868 ms  plan:

Query Text: select pg_sleep(10);
Result  (cost=0.00..0.01 rows=1 width=4) (actual 
time=1.802..1.804 rows=1 loops=1)



Using injection point support we should be able to add tests for
testing pg_log_query_plan behaviour when there are page locks held or
when auto_explain (with instrumentation) and pg_log_query_plan() work
on the same query plan. Use injection point to make the backend
running query wait at a suitable point to delay its execution and fire
pg_log_query_plan() from other backend. May be the same test could
examine the server log file to see if the plan is indeed output to the
server log file.

Given that the feature will be used when things have already gone
wrong, it should not make things worse. So more testing, and
especially automated testing, would help.


Thanks for the advice, it seems like a good idea.
I'm going to try adding tests using injection points.


[1] 
https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[2] 
https://www.postgresql.org/docs/devel/hot-standby.html#HOT-STANDBY-CONFLICT


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Synchronizing slots from primary to standby

2024-02-06 Thread Peter Smith
Here are some review comments for v79-0001

==
Commit message

1.
The logical replication slots on the primary can be synchronized to the hot
standby by enabling the "failover" parameter during
pg_create_logical_replication_slot() or by enabling "failover" option of the
CREATE SUBSCRIPTION command and calling pg_sync_replication_slots() function
on the standby.

~

SUGGESTION
The logical replication slots on the primary can be synchronized to
the hot standby by enabling failover during slot creation (e.g. using
the "failover" parameter of pg_create_logical_replication_slot(), or
using the "failover" option of the CREATE SUBSCRIPTION command), and
then calling pg_sync_replication_slots() function on the standby.

==

2.
+   
+
+  If after executing the function, hot_standby_feedback is disabled on
+  the standby or the physical slot configured in primary_slot_name is
+  removed, then it is possible that the necessary rows of the
+  synchronized slot will be removed by the VACUUM process on
the primary
+  server, resulting in the synchronized slot becoming invalidated.
+
+   

2a.
/If after/If, after/

~

2b.
Use SGML <varname> tags for the GUC names (hot_standby_feedback and
primary_slot_name), and consider putting links for them as well.

==
src/sgml/logicaldecoding.sgml


3.
+   
+Replication Slot Synchronization
+
+ A logical replication slot on the primary can be synchronized to the hot
+ standby by enabling the failover option for the slot
+ and calling pg_sync_replication_slots
+ on the standby. The failover option of the slot
+ can be enabled either by enabling
+ 
+ failover
+ option during subscription creation or by providing
failover
+ parameter during
+ 
+ pg_create_logical_replication_slot.

IMO it will be better to slightly reword this (like was suggested for
the Commit Message). I felt it is also better to refer/link to "CREATE
SUBSCRIPTION" instead of saying "during subscription creation".

SUGGESTION
The logical replication slots on the primary can be synchronized to
the hot standby by enabling failover during slot creation (e.g. using
the "failover" parameter of pg_create_logical_replication_slot, or
using the "failover" option of the CREATE SUBSCRIPTION command), and
then calling pg_sync_replication_slots() function on the standby.

~~~

4.
+ There are chances that the old primary is up again during the promotion
+ and if subscriptions are not disabled, the logical subscribers may keep
+ on receiving the data from the old primary server even after promotion
+ until the connection string is altered. This may result in the data
+ inconsistency issues and thus the logical subscribers may not be able
+ to continue the replication from the new primary server.
+

4a.
/There are chances/There is a chance/

/may keep on receiving the data/may continue to receive data/

~

4b.
BEFORE
This may result in the data inconsistency issues and thus the logical
subscribers may not be able to continue the replication from the new
primary server.

SUGGESTION
This might result in data inconsistency issues, preventing the logical
subscribers from being able to continue replication from the new
primary server.

~

4c.
I felt this whole part "There is a chance..." should be rendered as a
 or a  or something.

==
src/backend/replication/logical/slotsync.c

5.
+/*
+ * Return true if all necessary GUCs for slot synchronization are set
+ * appropriately, otherwise return false.
+ */
+bool
+ValidateSlotSyncParams(void)
+{
+ char*dbname;
+
+ /*
+ * A physical replication slot(primary_slot_name) is required on the
+ * primary to ensure that the rows needed by the standby are not removed
+ * after restarting, so that the synchronized slot on the standby will not
+ * be invalidated.
+ */
+ if (PrimarySlotName == NULL || *PrimarySlotName == '\0')
+ {
+ ereport(ERROR,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be defined.", "primary_slot_name"));
+ return false;
+ }
+
+ /*
+ * hot_standby_feedback must be enabled to cooperate with the physical
+ * replication slot, which allows informing the primary about the xmin and
+ * catalog_xmin values on the standby.
+ */
+ if (!hot_standby_feedback)
+ {
+ ereport(ERROR,
+ /* translator: %s is a GUC variable name */
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"%s\" must be enabled.", "hot_standby_feedback"));
+ return false;
+ }
+
+ /*
+ * Logical decoding requires wal_level >= logical and we currently only
+ * synchronize logical slots.
+ */
+ if (wal_level < WAL_LEVEL_LOGICAL)
+ {
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("bad configuration for slot synchronization"),
+ errhint("\"wal_level\" must be >= 

Re: Allow passing extra options to initdb for tests

2024-02-06 Thread Ian Lawrence Barwick
On Tue, 6 Feb 2024 at 19:54, Peter Eisentraut wrote:
>
> I'm proposing here a way to pass extra options to initdb when run
> internally during test setup in pg_regress or
> PostgreSQL::Test::Cluster's init (which covers just about all test
> suites other than initdb's own tests).
>
> For example:
>
>  make check PG_TEST_INITDB_EXTRA_OPTS='-k -c work_mem=50MB'
>
> We currently document at [0] a way to pass custom server settings to the
> tests via PGOPTIONS.  But this only works for pg_regress and only for
> options that can be changed at run time.  My proposal can set initdb
> options, and since initdb has the -c option now, it can set any GUC
> parameter as well.
>
> I think this can be useful for a wide variety of uses, like running all
> tests with checksums enabled, or with JIT enabled, or with different GUC
> settings, or with different locale settings.  (The existing pg_regress
> --no-locale option is essentially a special case of this, but it only
> provides one particular locale setting, not things like changing the
> default provider etc.)
>
> Of course, not all tests are going to pass with arbitrary options, but
> it is useful to run this against specific test suites.
>
> This patch also updates the documentation at [0] to explain the new
> method and distinguish it from the previously documented methods.

+1 for this, I recently ran into an issue with the regression tests for an
extension where it would have been very useful to provide some initdb
options.

Patch works as expected after a quick initial test.
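For instance, invocations along the lines described above (the option
values here are hypothetical):

  make check PG_TEST_INITDB_EXTRA_OPTS='--data-checksums'
  make check PG_TEST_INITDB_EXTRA_OPTS='-c jit=on'
  make check PG_TEST_INITDB_EXTRA_OPTS='--locale-provider=icu --icu-locale=en-US'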

Regards

Ian Barwick




Re: clarify equalTupleDescs()

2024-02-06 Thread jian he
On Tue, Feb 6, 2024 at 8:59 PM Peter Eisentraut  wrote:
>
> In a recent patch thread it was discussed[0] which fields should be
> compared by equalTupleDescs() and whether it is ok to remove a field
> from tuple descriptors and how that should affect their equality
> (attstattarget in that case).
>
> After analyzing all the callers, I have noticed that there are two
> classes of callers of equalTupleDescs():
>
> The first want to compare what I call row-type equality, which means
> they want to check specifically for equal number of attributes, and the
> same attribute names, types, and typmods for each attribute.  Most
> callers actually want that behavior.  The remaining callers just want to
> compare the tuple descriptors as they are, they don't care why the
> fields are in there, they just want to compare all of them.
>
> In the attached patch, I add a new function equalRowTypes() that is
> effectively a subset of equalTupleDescs() and switch most callers to that.
>
> The purpose of this patch is to make the semantics less uncertain.
> Questions like the one in [0] about attstattarget now have a clear
> answer for both functions.  I think this would be useful to have, as we
> are thinking about more changes in pg_attribute and tuple descriptors.
>
>
> [0]:
> https://www.postgresql.org/message-id/202401101316.k4s3fomwjx52@alvherre.pgsql

Does the function name record_type_typmod_hash imply that
hashRowType should also hash the atttypmod field?

also:

bool
equalRowTypes(TupleDesc tupdesc1, TupleDesc tupdesc2)
{
if (tupdesc1->natts != tupdesc2->natts)
return false;
if (tupdesc1->tdtypeid != tupdesc2->tdtypeid)
return false;

for (int i = 0; i < tupdesc1->natts; i++)
{
Form_pg_attribute attr1 = TupleDescAttr(tupdesc1, i);
Form_pg_attribute attr2 = TupleDescAttr(tupdesc2, i);

if (strcmp(NameStr(attr1->attname), NameStr(attr2->attname)) != 0)
return false;
if (attr1->atttypid != attr2->atttypid)
return false;
if (attr1->atttypmod != attr2->atttypmod)
return false;
}

return true;
}

/*
 * hashRowType
 *
 * If two tuple descriptors would be considered equal by equalRowTypes()
 * then their hash value will be equal according to this function.
 */
uint32
hashRowType(TupleDesc desc)
{
uint32 s;
int i;

s = hash_combine(0, hash_uint32(desc->natts));
s = hash_combine(s, hash_uint32(desc->tdtypeid));
for (i = 0; i < desc->natts; ++i)
s = hash_combine(s, hash_uint32(TupleDescAttr(desc, i)->atttypid));

return s;
}

Going by the hashRowType comment, should we also hash attname and atttypmod?
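If so, it might look something like this (a sketch only, reusing
hash_bytes() from common/hashfn.h; this is not part of the patch):

uint32
hashRowType(TupleDesc desc)
{
    uint32 s;

    s = hash_combine(0, hash_uint32(desc->natts));
    s = hash_combine(s, hash_uint32(desc->tdtypeid));
    for (int i = 0; i < desc->natts; ++i)
    {
        Form_pg_attribute attr = TupleDescAttr(desc, i);

        /* mix in everything equalRowTypes() compares */
        s = hash_combine(s, hash_bytes((const unsigned char *) NameStr(attr->attname),
                                       strlen(NameStr(attr->attname))));
        s = hash_combine(s, hash_uint32(attr->atttypid));
        s = hash_combine(s, hash_uint32(attr->atttypmod));
    }

    return s;
}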




Re: common signal handler protection

2024-02-06 Thread Andres Freund
Hi,

On 2024-02-06 20:39:41 -0600, Nathan Bossart wrote:
> On Tue, Nov 21, 2023 at 03:20:08PM -0600, Nathan Bossart wrote:
> > * Overhead: The wrapper handler calls a function pointer and getpid(),
> >   which AFAICT is a real system call on most platforms.  That might not be
> >   a tremendous amount of overhead, but it's not zero, either.  I'm
> >   particularly worried about signal-heavy code like synchronous
> >   replication.  (Are there other areas that should be tested?)  If this is
> >   a concern, perhaps we could allow certain processes to opt out of this
> >   wrapper handler, provided we believe it is unlikely to fork or that the
> >   handler code is safe to run in grandchild processes.
> 
> I finally spent some time trying to measure this overhead.  Specifically, I
> sent many, many SIGUSR2 signals to postmaster, which just uses
> dummy_handler(), i.e., does nothing.  I was just barely able to get
> wrapper_handler() to show up in the first page of 'perf top' in this
> extreme case, which leads me to think that the overhead might not be a
> problem.

That's what I'd expect. Signal delivery is fairly heavyweight; getpid() is one
of the cheapest system calls (IIRC only beaten by close() of an invalid FD on
recent-ish Linux).  If it were to become an issue, we'd be much better off
spending our time reducing the millions of signals/sec that would have to be
involved.

Greetings,

Andres Freund




Re: common signal handler protection

2024-02-06 Thread Nathan Bossart
On Tue, Nov 21, 2023 at 03:20:08PM -0600, Nathan Bossart wrote:
> * Overhead: The wrapper handler calls a function pointer and getpid(),
>   which AFAICT is a real system call on most platforms.  That might not be
>   a tremendous amount of overhead, but it's not zero, either.  I'm
>   particularly worried about signal-heavy code like synchronous
>   replication.  (Are there other areas that should be tested?)  If this is
>   a concern, perhaps we could allow certain processes to opt out of this
>   wrapper handler, provided we believe it is unlikely to fork or that the
>   handler code is safe to run in grandchild processes.

I finally spent some time trying to measure this overhead.  Specifically, I
sent many, many SIGUSR2 signals to postmaster, which just uses
dummy_handler(), i.e., does nothing.  I was just barely able to get
wrapper_handler() to show up in the first page of 'perf top' in this
extreme case, which leads me to think that the overhead might not be a
problem.
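For context, the mechanism being measured is roughly the following (a
sketch, not the actual patch; the names are illustrative):

	static pid_t	handler_owner_pid;	/* set when handlers are installed */
	static void		(*actual_handlers[NSIG]) (int);

	static void
	wrapper_handler(int signo)
	{
		/* ignore signals delivered to a fork()ed child that inherited us */
		if (getpid() != handler_owner_pid)
			return;

		actual_handlers[signo] (signo);
	}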

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pgbnech: allow to cancel queries during benchmark

2024-02-06 Thread Yugo NAGATA
On Wed, 24 Jan 2024 22:17:44 +0900
Yugo NAGATA  wrote:
 
> Attached is the updated patch, v6.

Currently, on non-Windows, SIGINT is received only by thread #0.
CancelRequested is checked during the benchmark loop in that thread, and
queries are cancelled if it is set. However, once thread #0 exits
the loop due to some runtime error and starts waiting in pthread_join,
there is no opportunity to cancel queries run by the other threads.

In addition, if the -C option is specified, connections are created for
each transaction, so cancel objects (PGcancel) also have to be
recreated each time in each thread. However, these cancel objects
are used by one specific thread to perform cancellation for all queries,
which is not safe because that thread refers to objects updated by other
threads.

I think the first problem could be addressed by any of the following:

(1a) Perform cancels in the signal handler. The signal handler will be
called even while thread #0 is blocked in pthread_join. This is safe
because PQcancel is callable from a signal handler.

(1b) Create an additional dedicated thread that calls sigwait on SIGINT
and performs query cancellation. As far as I have studied, creating such a
dedicated thread calling sigwait is a typical way to handle signals in
multi-threaded programming.

(1c) Each thread performs cancellation for the queries it runs itself, rather
than thread #0 cancelling all queries. For this purpose, pthread_kill might be
used to interrupt threads blocked in wait_on_socket_set.

The second problem could be addressed by either of the following:

(2a) Use a critical section when accessing PGcancel (e.g. by using
pthread_mutex (non-Windows) or EnterCriticalSection (Windows)). On
non-Windows, we cannot use this approach when calling PQcancel in a signal
handler ((1a) above) because acquiring a lock is not re-entrant.

(2b) Perform query cancellation in each thread that created the connection
(same as (1c) above).

Considering both, the possible combinations are either (1b)&(2a) or
(1c)&(2b). I would prefer the former, because creating a dedicated
thread that handles the SIGINT signal and cancels all queries seems
simpler and safer than calling pthread_kill in the SIGINT signal handler
to send another signal to the other threads. I'll update the patch in
this way soon.
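A rough sketch of the (1b)&(2a) combination, assuming a mutex
cancel_lock guarding a hypothetical array client_cancels[] of per-client
PGcancel pointers (illustrative names, not the actual pgbench code):

  static void *
  sigint_thread_main(void *arg)
  {
      sigset_t    set;
      int         sig;
      char        errbuf[256];

      sigemptyset(&set);
      sigaddset(&set, SIGINT);
      for (;;)
      {
          if (sigwait(&set, &sig) != 0)
              continue;
          /* (2a): serialize access to the PGcancel objects */
          pthread_mutex_lock(&cancel_lock);
          for (int i = 0; i < nclients; i++)
              if (client_cancels[i] != NULL)
                  PQcancel(client_cancels[i], errbuf, sizeof(errbuf));
          pthread_mutex_unlock(&cancel_lock);
      }
      return NULL;
  }

  /* main() would first block SIGINT in every thread with
   * pthread_sigmask(SIG_BLOCK, ...), so only this thread receives it. */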

Regards,
Yugo Nagata


> 
> > Best reagards,
> > --
> > Tatsuo Ishii
> > SRA OSS LLC
> > English: http://www.sraoss.co.jp/index_en/
> > Japanese:http://www.sraoss.co.jp
> > 
> > 
> 
> 
> -- 
> Yugo NAGATA 


-- 
Yugo NAGATA 




Re: Synchronizing slots from primary to standby

2024-02-06 Thread Masahiko Sawada
On Tue, Feb 6, 2024 at 8:21 PM Amit Kapila  wrote:
>
> On Tue, Feb 6, 2024 at 3:33 PM Amit Kapila  wrote:
> >
> > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila  
> > > wrote:
> > > >
> > > > I think users can refer to LOGs to see if it has changed since the
> > > > first time it was configured. I tried by existing parameter and see
> > > > the following in LOG:
> > > > LOG:  received SIGHUP, reloading configuration files
> > > > 2024-02-06 11:38:59.069 IST [9240] LOG:  parameter "autovacuum" changed 
> > > > to "on"
> > > >
> > > > If the user can't confirm then it is better to follow the steps
> > > > mentioned in the patch. Do you want something else to be written in
> > > > docs for this? If so, what?
> > >
> > > IIUC even if a wrong slot name is specified to standby_slot_names or
> > > even standby_slot_names is empty, the standby server might not be
> > > lagging behind the subscribers depending on the timing. But when
> > > checking it the next time, the standby server might lag behind the
> > > subscribers. So what I wanted to know is how the user can confirm if a
> > > failover-enabled subscription is ensured not to go in front of
> > > failover-candidate standbys (i.e., standbys using the slots listed in
> > > standby_slot_names).
> > >
> >
> > But isn't the same explained by two steps ((a) Firstly, on the
> > subscriber node check the last replayed WAL. (b) Next, on the standby
> > server check that the last-received WAL location is ahead of the
> > replayed WAL location on the subscriber identified above.) in the
> > latest *_0004 patch.
> >
>
> Additionally, I would like to add that the users can use the queries
> mentioned in the doc after the primary has failed and before promoting
> the standby. If she wants to do that when both primary and standby are
> available, the value of 'standby_slot_names' on primary should be
> referred. Isn't those two sufficient that there won't be false
> positives?

From a user perspective, I'd like to confirm the following two points:

1. replication slots used by subscribers are synchronized to the standby.
2. it's guaranteed that logical replication doesn't go ahead of
physical replication to the standby.

These checks are necessary at least when building a replication setup
(primary, standby, and subscriber). Otherwise, it's too late if we
find out that no standby is failover-ready when the primary fails and
we're about to do a failover.

As for the point 1 above, we can use the step 1 described in the doc.

As for point 2, the step 2 described in the doc could return true even
if standby_slot_names isn't working. For example: standby_slot_names
is empty; the user changed standby_slot_names but forgot to reload
the config file; or the walsender doesn't reflect the
standby_slot_names update yet for some reason; etc. It's possible that
standby's last-received WAL location just happens to be ahead of the
replayed WAL location on the subscriber. So even if the check query
returns true once, it could return false when we check it again, if
standby_slot_names is not working. On the other hand, IIUC if the
point 2 is ensured, the check query always returns true. I think it
would be good if we could provide a reliable way to check point 2
ideally via SQL queries (especially for tools).
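For reference, the kind of query pair being discussed looks roughly like
this (a sketch; the exact queries are in the *_0004 patch, and the
column choice here is illustrative):

  -- (a) on the subscriber: how far logical replication has progressed
  SELECT subname, latest_end_lsn FROM pg_stat_subscription;

  -- (b) on the standby: last-received WAL, expected to be ahead of (a)
  SELECT pg_last_wal_receive_lsn();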

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-06 Thread Andres Freund
Hi,

On 2024-02-06 15:11:05 +0900, Michael Paquier wrote:
> On Mon, Feb 05, 2024 at 09:46:42PM -0800, Andres Freund wrote:
> > No - what I mean is that it doesn't make sense to have copy_attribute_out(),
> > as e.g. CopyToTextOneRow() already knows that it's dealing with text, so it
> > can directly call the right function. That does require splitting a bit more
> > between csv and text output, but I think that can be done without much
> > duplication.
> 
> I am not sure I understand here.  How is that different from
> reverting 2889fd23be56 and then marking CopyAttributeOutCSV and
> CopyAttributeOutText as static inline?

Well, you can't just do that, because there's only one caller, namely
CopyToTextOneRow(). What I am trying to suggest is something like the
attached, just a quick hacky POC. Namely to split out CSV support from
CopyToTextOneRow() by introducing CopyToCSVOneRow(), and to avoid code
duplication by moving the code into a new CopyToTextLikeOneRow().

I named it CopyToTextLike* here, because it seems confusing that some Text*
are used for both CSV and text and others are actually just for text. But if
we were to go for that, we should go further.


To test the performance effects I chose to remove the pointless encoding
"check" we're discussing in the other thread, as it makes it harder to see the
time differences due to the per-attribute code.  I did three runs of pgbench
-t of [1] and chose the fastest result for each.


With turbo mode and power saving disabled:

                       Avg Time
HEAD                    995.349
Remove Encoding Check   870.793
v13-0001                869.678
Remove out callback     839.508

Greetings,

Andres Freund

[1] COPY (SELECT 
1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2,
 generate_series(1, 100::int4)) TO '/dev/null';
>From 28be7510a62984eb09b41aabe8c345a07b2b0e97 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 6 Feb 2024 14:57:20 -0800
Subject: [PATCH v13a 1/3] WIP: COPY TO: remove unnecessary and ineffective
 encoding "checks"

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/commands/copyto.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 52b42f8a522..ea317b5b3d5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -637,9 +637,10 @@ BeginCopyTo(ParseState *pstate,
 	 * are the same, we must apply pg_any_to_server() to validate data in
 	 * multibyte encodings.
 	 */
-	cstate->need_transcoding =
-		(cstate->file_encoding != GetDatabaseEncoding() ||
-		 pg_database_encoding_max_length() > 1);
+	cstate->need_transcoding = (
+		cstate->file_encoding != GetDatabaseEncoding()
+		 /* || pg_database_encoding_max_length() > 1 */
+		);
 	/* See Multibyte encoding comment above */
 	cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-- 
2.38.0

>From 176442831183baa7648fee87ab4aa016d9c4d8e4 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 6 Feb 2024 08:40:17 +0900
Subject: [PATCH v13a 2/3] Extract COPY FROM/TO format implementations

This doesn't change the current behavior.  This just introduces a set of
copy routines called CopyFromRoutine and CopyToRoutine, which just has
function pointers of format implementation like TupleTableSlotOps, and
use it for existing "text", "csv" and "binary" format implementations.
There are plans to extend that more in the future with custom formats.

This improves performance when a relation has many attributes as this
eliminates format-based if branches.  The more the attributes, the
better the performance gain.  Blah add some numbers.
---
 src/include/commands/copyapi.h   | 100 ++
 src/include/commands/copyfrom_internal.h |  10 +
 src/backend/commands/copyfrom.c  | 209 ---
 src/backend/commands/copyfromparse.c | 362 ++-
 src/backend/commands/copyto.c| 438 +++
 src/tools/pgindent/typedefs.list |   2 +
 6 files changed, 770 insertions(+), 351 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 000..87e0629c2f7
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,100 @@
+/*-
+ *
+ * copyapi.h
+ *	  API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include 

Re: Properly pathify the union planner

2024-02-06 Thread Andy Fan

Hi,

> * I think we should update truncate_useless_pathkeys() to account for
> the ordering requested by the query's set operation;

Nice catch.

> I'm thinking that maybe it'd be better to move the work of sorting the
> subquery's paths to the outer query level, specifically within the
> build_setop_child_paths() function, just before we stick SubqueryScanPath
> on top of the subquery's paths.  I think this is better because:
>
> 1. This minimizes the impact on subquery planning and reduces the
> footprint within the grouping_planner() function as much as possible.
>
> 2. This can help avoid the aforementioned add_path() issue because the
> two involved paths will be structured as:
>
> cheapest_path -> subqueryscan
> and
> cheapest_path -> sort -> subqueryscan
>
> If the two paths cost fuzzily the same and add_path() decides to keep
> the second one due to it having better pathkeys and pfree the first one,
> it would not be a problem.

This is a smart idea; it works because you create two different
subquery scans for the cheapest_input_path.

FWIW, I found that we don't call create_sort_path while building a merge
join path; instead we just cost the sort, add that to the cost of the
mergejoin path, and note that the path needs presorted data. Only later,
during create_mergejoin_*plan*, is the sort plan actually created. As
for the mergeappend case, could we use a similar strategy? That way we
might simplify the code that uses the MergeAppend node, since the caller
would just need to say "try MergeAppend with the given pathkeys" without
having to create the sorts themselves (see the sketch below).
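
A rough sketch of that pattern, as it appears in costsize.c and
createplan.c (treat this as illustration of the shape of the code, not
an exact copy):

	/* costing time (initial_cost_mergejoin): charge for the sort, but
	 * build no Sort path -- just record that the input must be presorted */
	if (outersortkeys)
	{
		cost_sort(&sort_path, root, outersortkeys,
				  outer_path->total_cost, outer_path->rows,
				  outer_path->pathtarget->width, 0.0, work_mem, -1.0);
		startup_cost += sort_path.startup_cost;
		run_cost += sort_path.total_cost - sort_path.startup_cost;
	}

	/* plan creation time (create_mergejoin_plan): only now is a real
	 * Sort node stacked on top of the outer subplan */
	if (best_path->outersortkeys)
		outer_plan = (Plan *) make_sort_from_pathkeys(outer_plan,
													  best_path->outersortkeys,
													  outer_relids);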

(Having had a quick glance at initial_cost_mergejoin and
create_mergejoin_plan, it looks like incremental sort doesn't work with
merge join?)

>
> To assist the discussion I've attached a diff file that includes all the
> changes above.

+ */
+static int
+pathkeys_useful_for_setop(PlannerInfo *root, List *pathkeys)
+{
+	int			n_common_pathkeys;
+
+	if (root->setop_pathkeys == NIL)
+		return 0;				/* no special setop ordering requested */
+
+	if (pathkeys == NIL)
+		return 0;				/* unordered path */
+
+	(void) pathkeys_count_contained_in(root->setop_pathkeys, pathkeys,
+									   &n_common_pathkeys);
+
+	return n_common_pathkeys;
+}

The two if-clauses look unnecessary; those cases are already handled by
pathkeys_count_contained_in. The same issue exists in
pathkeys_useful_for_ordering as well. The attached patch fixes it in
master.
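
A quick illustration of why those guards are dead code (a sketch;
some_pathkeys stands in for any pathkey list):

	int			n_common_pathkeys;

	/* keys1 == NIL: the pairwise loop runs zero times, zero common keys */
	(void) pathkeys_count_contained_in(NIL, some_pathkeys, &n_common_pathkeys);
	Assert(n_common_pathkeys == 0);

	/* keys2 == NIL: an unordered path likewise shares zero pathkeys */
	(void) pathkeys_count_contained_in(root->query_pathkeys, NIL,
									   &n_common_pathkeys);
	Assert(n_common_pathkeys == 0);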

-- 
Best Regards
Andy Fan

From f562dd8b80bd0b7c2d66ee9633434a1e5be82e04 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" 
Date: Wed, 7 Feb 2024 07:00:33 +0800
Subject: [PATCH v1 1/1] Remove the unnecessary checks for pathkey routines

Both pathkeys_count_contained_in and foreach can handle NIL input, so
the extra checks are not needed.
---
 src/backend/optimizer/path/pathkeys.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 82ff31273b..98334e5c52 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -2126,12 +2126,6 @@ pathkeys_useful_for_ordering(PlannerInfo *root, List *pathkeys)
 {
 	int			n_common_pathkeys;
 
-	if (root->query_pathkeys == NIL)
-		return 0;				/* no special ordering requested */
-
-	if (pathkeys == NIL)
-		return 0;				/* unordered path */
-
 	(void) pathkeys_count_contained_in(root->query_pathkeys, pathkeys,
 									   &n_common_pathkeys);
 
@@ -2167,10 +2161,6 @@ pathkeys_useful_for_grouping(PlannerInfo *root, List *pathkeys)
 	if (root->group_pathkeys == NIL)
 		return 0;
 
-	/* unordered path */
-	if (pathkeys == NIL)
-		return 0;
-
 	/* walk the pathkeys and search for matching group key */
 	foreach(key, pathkeys)
 	{
-- 
2.34.1



Re: confusing / inefficient "need_transcoding" handling in copy

2024-02-06 Thread Andres Freund
Hi,

On 2024-02-06 12:51:48 -0500, Tom Lane wrote:
> Michael Paquier  writes:
> > On Mon, Feb 05, 2024 at 06:05:04PM -0800, Andres Freund wrote:
> >> I haven't yet dug into the code history. One guess is that this should only
> >> have been set this way for COPY FROM.
>
> > Looking the git history, this looks like an oversight of c61a2f58418e
> > that has added the condition on pg_database_encoding_max_length(), no?
> > Adding Tom and Ishii-san, even if this comes from 2005.
>
> Yeah, back in 8.1 that code was applied for both directions, but
> probably it should have enforced validation for same-encoding
> cases only for COPY FROM.
>
> It looks like now we have a mess, because the condition was copied
> verbatim into copyto.c but not copyfrom.c.  Aren't we failing to
> validate encoding in this case in COPY FROM, which is where we
> actually need to?

I think the code is just very confusing - there actually *is* verification of
the encoding, it just happens at a different, earlier, layer, namely in
copyfromparse.c: CopyConvertBuf() which says:
/*
 * If the file and server encoding are the same, no encoding conversion is
 * required.  However, we still need to verify that the input is valid for
 * the encoding.
 */

And does indeed verify that.


One unfortunate issue: We don't have any tests verifying that COPY FROM
catches encoding issues.
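
A sketch of what such a test could look like, assuming a UTF8 database
(0xC3 0x28 is an invalid UTF8 sequence, written in octal for printf
portability; the file path and table name are illustrative):

	\! printf 'foo\303\050\n' > /tmp/bad_utf8.data
	CREATE TABLE enc_test (t text);
	COPY enc_test FROM '/tmp/bad_utf8.data';
	-- expected: ERROR:  invalid byte sequence for encoding "UTF8": 0xc3 0x28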

Regards,

Andres




Re: Psql meta-command conninfo+

2024-02-06 Thread Nathan Bossart
On Tue, Feb 06, 2024 at 09:45:54PM +, Maiquel Grassi wrote:
> My initial idea has always been that they should continue to appear
> because \conninfo+ should show all the things that \conninfo shows and
> add more information. I think that's the purpose of the 'plus.' Now we're
> on a better path than the initial one. We can still add the socket
> directory and the host.

Agreed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




RE: Psql meta-command conninfo+

2024-02-06 Thread Maiquel Grassi
  On Tue, Feb 06, 2024 at 03:06:05PM -0600, Nathan Bossart wrote:
  > On Tue, Feb 06, 2024 at 08:52:09PM +, Maiquel Grassi wrote:
  >> I made the adjustment in the code and updated the patch. I believe this
  >> is the format suggested by you all. Would this be it?
  >
  > I was thinking something more like
  >
  >   SELECT pg_catalog.current_database() AS "Database",
  >  current_user AS "User",
  >  pg_catalog.current_setting('server_version') AS "Server Version",
  >  pg_catalog.inet_server_addr() AS "Server Address",
  >  pg_catalog.current_setting('port') AS "Port",
  >  pg_catalog.inet_client_addr() AS "Client Address",
  >  pg_catalog.inet_client_port() AS "Client Port",
  >  pg_catalog.pg_backend_pid() AS "Session PID";

  ... although that seems to be missing items like the socket directory and
  the host.

--//--

My initial idea has always been that they should continue to appear because 
\conninfo+ should show all the things that \conninfo shows and add more 
information. I think that's the purpose of the 'plus.' Now we're on a better 
path than the initial one. We can still add the socket directory and the host.

Regards,
Maiquel O. Grassi.


RE: Psql meta-command conninfo+

2024-02-06 Thread Maiquel Grassi
  On Tue, Feb 06, 2024 at 08:52:09PM +, Maiquel Grassi wrote:
  > I made the adjustment in the code and updated the patch. I believe this
  > is the format suggested by you all. Would this be it?

  I was thinking something more like

SELECT pg_catalog.current_database() AS "Database",
   current_user AS "User",
   pg_catalog.current_setting('server_version') AS "Server Version",
   pg_catalog.inet_server_addr() AS "Server Address",
   pg_catalog.current_setting('port') AS "Port",
   pg_catalog.inet_client_addr() AS "Client Address",
   pg_catalog.inet_client_port() AS "Client Port",
   pg_catalog.pg_backend_pid() AS "Session PID";

--//--

Good, I had misunderstood. I liked this adjustment. Now it truly aligns with 
the central idea of the other extended meta-commands.

[postgres@localhost bin]$ ./psql -h 192.168.0.220 -p 5433 -U postgres -d postgres
psql (17devel, server 16.1)
Type "help" for help.

postgres=# \conninfo+
                                       Current Connection Information
 Database |   User   | Server Version | Server Address | Port | Client Address | Client Port | Session PID
----------+----------+----------------+----------------+------+----------------+-------------+-------------
 postgres | postgres | 16.1           | 192.168.0.220  | 5433 | 192.168.0.220  |       57112 |       22120
(1 row)

postgres=# \q
[postgres@localhost bin]$ ./psql
psql (17devel)
Type "help" for help.

postgres=# \conninfo+
                                       Current Connection Information
 Database |   User   | Server Version | Server Address | Port | Client Address | Client Port | Session PID
----------+----------+----------------+----------------+------+----------------+-------------+-------------
 postgres | postgres | 17devel        |                | 5432 |                |             |       31430
(1 row)

postgres=# \q
[postgres@localhost bin]$ ./psql -h localhost
psql (17devel)
Type "help" for help.

postgres=# \conninfo+
                                       Current Connection Information
 Database |   User   | Server Version | Server Address | Port | Client Address | Client Port | Session PID
----------+----------+----------------+----------------+------+----------------+-------------+-------------
 postgres | postgres | 17devel        | ::1            | 5432 | ::1            |       46918 |       31433
(1 row)

postgres=# \q
[postgres@localhost bin]$ ./psql -h 127.0.0.1
psql (17devel)
Type "help" for help.

postgres=# \conninfo+
                                       Current Connection Information
 Database |   User   | Server Version | Server Address | Port | Client Address | Client Port | Session PID
----------+----------+----------------+----------------+------+----------------+-------------+-------------
 postgres | postgres | 17devel        | 127.0.0.1      | 5432 | 127.0.0.1      |       34970 |       31435
(1 row)

Regards,
Maiquel O. Grassi.


v3-0001-psql-meta-command-conninfo-plus.patch
Description: v3-0001-psql-meta-command-conninfo-plus.patch


Re: Where can I find the doxyfile?

2024-02-06 Thread Peter Eisentraut

On 02.02.24 01:19, John Morris wrote:
Here is the updated patch. It should fix the meson issue when no doxygen 
is present.


I think all the explanatory messages in doc/doxygen/meson.build are a 
bit much.  I think it's enough to just not define the target when the 
required conditions (dependencies, options) are not there.  Maybe 
something like docs_pdf can serve as an example.
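
Following the docs_pdf pattern, that might look roughly like this
(a sketch; the target name and arguments are illustrative):

    doxygen = find_program('doxygen', required: false, native: true)

    if doxygen.found()
      # the target simply doesn't exist when doxygen is missing
      custom_target('doxygen-html',
        input: 'Doxyfile',
        output: 'html',
        command: [doxygen, '@INPUT@'],
        build_by_default: false,
      )
    endif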






Re: Where can I find the doxyfile?

2024-02-06 Thread Peter Eisentraut

On 05.02.24 18:29, John Morris wrote:
The purpose of the filter is to bring existing Postgres comments into 
the doxygen output. While I haven’t done a full survey, the majority of 
Postgres code has comments describing functions, globals, macros and 
structure fields.


Currently, those comments are thrown away. They do not appear in the 
existing Doxygen output.


Maybe this is something that can be tweaked on the doxygen side?

For example, clangd can also process doxygen-style comments.  But it can 
also process non-decorated comments, because it knows that the comment 
just before a declaration is probably the comment describing the thing. 
Maybe doxygen could have that functionality as well.
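
One commonly cited (if crude) workaround on the doxygen side is an
INPUT_FILTER that promotes plain block comments to doc comments before
parsing -- a sketch, with the caveat that a blind rewrite also touches
decorative separators and comments you may not want documented:

    # Doxyfile: rewrite the first "/*" on each line to "/**"
    INPUT_FILTER = "sed -e 's|/\*|/**|'"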






Re: Psql meta-command conninfo+

2024-02-06 Thread Nathan Bossart
On Tue, Feb 06, 2024 at 03:06:05PM -0600, Nathan Bossart wrote:
> On Tue, Feb 06, 2024 at 08:52:09PM +, Maiquel Grassi wrote:
>> I made the adjustment in the code and updated the patch. I believe this
>> is the format suggested by you all. Would this be it?
> 
> I was thinking something more like
> 
>   SELECT pg_catalog.current_database() AS "Database",
>  current_user AS "User",
>  pg_catalog.current_setting('server_version') AS "Server Version",
>  pg_catalog.inet_server_addr() AS "Server Address",
>  pg_catalog.current_setting('port') AS "Port",
>  pg_catalog.inet_client_addr() AS "Client Address",
>  pg_catalog.inet_client_port() AS "Client Port",
>  pg_catalog.pg_backend_pid() AS "Session PID";

... although that seems to be missing items like the socket directory and
the host.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Psql meta-command conninfo+

2024-02-06 Thread Nathan Bossart
On Tue, Feb 06, 2024 at 08:52:09PM +, Maiquel Grassi wrote:
> I made the adjustment in the code and updated the patch. I believe this
> is the format suggested by you all. Would this be it?

I was thinking something more like

  SELECT pg_catalog.current_database() AS "Database",
 current_user AS "User",
 pg_catalog.current_setting('server_version') AS "Server Version",
 pg_catalog.inet_server_addr() AS "Server Address",
 pg_catalog.current_setting('port') AS "Port",
 pg_catalog.inet_client_addr() AS "Client Address",
 pg_catalog.inet_client_port() AS "Client Port",
 pg_catalog.pg_backend_pid() AS "Session PID";

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: glibc qsort() vulnerability

2024-02-06 Thread Nathan Bossart
On Tue, Feb 06, 2024 at 03:55:58PM -0500, Tom Lane wrote:
> A comparison routine that is not is probably broken, agreed.
> I didn't look through the details of the patch --- I was more
> curious whether we had a version of the qsort bug, because
> if we do, we should fix that too.

+1

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: glibc qsort() vulnerability

2024-02-06 Thread Tom Lane
Nathan Bossart  writes:
> Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest
> that we make it project policy that comparison functions must be
> transitive.  There might be no real issues today, but if we write all
> comparison functions the way Mats is suggesting, it should be easier to
> reason about overflow risks.

A comparison routine that is not is probably broken, agreed.
I didn't look through the details of the patch --- I was more
curious whether we had a version of the qsort bug, because
if we do, we should fix that too.

regards, tom lane




Re: glibc qsort() vulnerability

2024-02-06 Thread Nathan Bossart
On Tue, Feb 06, 2024 at 10:11:16AM -0500, Tom Lane wrote:
> Mats Kindahl  writes:
>> There is a bug in glibc's qsort() algorithm that runs the risk of creating
>> an out-of-bounds error if the comparison function is not transitive, for
>> example, if subtraction is used so that it can create an overflow.
> 
> We don't use glibc's qsort.  Have you checked whether there's a
> problem with the code we do use?

Oh, interesting.  I didn't know that!  As of commit 6edd2b4, we've used an
in-tree qsort() for everything.  port.h has the following line:

#define qsort(a,b,c,d) pg_qsort(a,b,c,d)

I see a handful of callers that use pg_qsort() directly, which IMO is odd.
I would've expected that to be something different if I didn't know about
that macro.  Maybe we should change those to qsort()...

Even if the glibc issue doesn't apply to Postgres, I'm tempted to suggest
that we make it project policy that comparison functions must be
transitive.  There might be no real issues today, but if we write all
comparison functions the way Mats is suggesting, it should be easier to
reason about overflow risks.
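
For reference, the overflow-safe style under discussion looks like this
(a generic sketch, not code from the patch):

	/*
	 * Compare two int32 keys without subtraction: "a - b" can overflow
	 * (e.g. a = INT32_MIN, b = 1) and break transitivity; explicit
	 * branches cannot.
	 */
	static int
	compare_int32(const void *p1, const void *p2)
	{
		int32		a = *(const int32 *) p1;
		int32		b = *(const int32 *) p2;

		if (a < b)
			return -1;
		if (a > b)
			return 1;
		return 0;
	}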

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




RE: Psql meta-command conninfo+

2024-02-06 Thread Maiquel Grassi
  On 2024-02-06 19:19 +0100, Nathan Bossart wrote:
  > On Tue, Feb 06, 2024 at 05:27:01PM +, Maiquel Grassi wrote:
  > > postgres=# \conninfo+
  > >      Current Connection Information
  > >     Attribute    |     Value
  > > -----------------+----------------
  > >  Database        | postgres
  > >  User            | postgres
  > >  Server Version  | 16.1
  > >  Server Address  | 192.168.0.5/32
  > >  Server Port     | 5433
  > >  Client Address  | 192.168.0.5/32
  > >  Client Port     | 52716
  > >  Session PID     | 21624
  > > (8 rows)
  >
  > My first reaction is that this should instead return a single row with the
  > same set of columns for any connection type (the not-applicable ones would
  > just be set to NULL).  That would match the other meta-commands like \l and
  > \du, and you could still get an expanded display with \x if needed.  Also,
  > I think it would simplify the code a bit.

  +1 for a single-row result and triggering expanded display with \x for
  consistency with other commands.

--//--

I made the adjustment in the code and updated the patch. I believe this is the 
format suggested by you all. Would this be it?

[postgres@localhost bin]$ ./psql -h 192.168.0.220 -p 5433 -U postgres -d postgres
psql (17devel, server 16.1)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host 
"192.168.0.220" at port "5433".
postgres=# \conninfo+
      Current Connection Information
    Attribute    |      Value
-----------------+------------------
 Database        | postgres
 User            | postgres
 Server Version  | 16.1
 Server Address  | 192.168.0.220/32
 Server Port     | 5433
 Client Address  | 192.168.0.220/32
 Client Port     | 56606
 Session PID     | 2424
(8 rows)

[postgres@localhost bin]$ ./psql
psql (17devel)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" via socket in 
"/tmp" at port "5432".
postgres=# \conninfo+
   Current Connection Information
    Attribute    |  Value
-----------------+----------
 Database        | postgres
 User            | postgres
 Server Version  | 17devel
 Server Address  |
 Server Port     | 5432
 Client Address  |
 Client Port     |
 Session PID     | 30216
(8 rows)

[postgres@localhost bin]$ ./psql -h localhost
psql (17devel)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host "localhost" 
(address "::1") at port "5432".
postgres=# \conninfo+
   Current Connection Information
    Attribute    |  Value
-----------------+----------
 Database        | postgres
 User            | postgres
 Server Version  | 17devel
 Server Address  | ::1/128
 Server Port     | 5432
 Client Address  | ::1/128
 Client Port     | 46872
 Session PID     | 30220
(8 rows)

[postgres@localhost bin]$ ./psql -h 127.0.0.1
psql (17devel)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host "127.0.0.1" 
at port "5432".
postgres=# \conninfo+
    Current Connection Information
    Attribute    |    Value
-----------------+--------------
 Database        | postgres
 User            | postgres
 Server Version  | 17devel
 Server Address  | 127.0.0.1/32
 Server Port     | 5432
 Client Address  | 127.0.0.1/32
 Client Port     | 34924
 Session PID     | 30223
(8 rows)

Regards,
Maiquel O. Grassi.


v2-0001-psql-meta-command-conninfo-plus.patch
Description: v2-0001-psql-meta-command-conninfo-plus.patch


Re: Remove Start* macros from postmaster.c to ease understanding of code

2024-02-06 Thread Nathan Bossart
On Wed, Feb 07, 2024 at 12:48:00AM +0530, Bharath Rupireddy wrote:
> +1. Patch LGTM.

Unless there are objections, I'll plan on committing this in the next day
or two.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Why is subscription/t/031_column_list.pl failing so much?

2024-02-06 Thread Tom Lane
I wrote:
> More to the point, aren't these proposals just band-aids that
> would stabilize the test without fixing the actual problem?
> The same thing is likely to happen to people in the field,
> unless we do something drastic like removing ALTER SUBSCRIPTION.

I've been able to make the 031_column_list.pl failure pretty
reproducible by adding a delay in walsender, as attached.

While I'm not too familiar with this code, it definitely does appear
that the new walsender is told to start up at an LSN before the
creation of the publication, and then if it needs to decide whether
to stream a particular data change before it's reached that creation,
kaboom!

I read and understood the upthread worries about it not being
a great idea to ignore publication lookup failures, but I really
don't see that we have much choice.  As an example, if a subscriber
is humming along reading publication pub1, and then someone
drops and then recreates pub1 on the publisher, I don't think that
the subscriber will be able to advance through that gap if there
are any operations within it that require deciding if they should
be streamed.  (That is, contrary to Amit's expectation that
DROP/CREATE would mask the problem, I suspect it will instead turn
it into a hard failure.  I've not experimented though.)

BTW, this same change breaks two other subscription tests:
015_stream.pl and 022_twophase_cascade.pl.
The symptoms are different (no "publication does not exist" errors),
so maybe these are just test problems not fundamental weaknesses.
But "replication falls over if the walsender is slow" isn't
something I'd call acceptable.

regards, tom lane

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 77c8baa32a..c897ef4c60 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2699,6 +2699,8 @@ WalSndLoop(WalSndSendDataCallback send_data)
 			!pq_is_send_pending())
 			break;
 
+		pg_usleep(1);
+
 		/*
 		 * If we don't have any pending data in the output buffer, try to send
 		 * some more.  If there is some, we don't bother to call send_data


Re: Remove Start* macros from postmaster.c to ease understanding of code

2024-02-06 Thread Bharath Rupireddy
On Tue, Feb 6, 2024 at 10:34 PM  wrote:
>
> Hi,
>
> Attached is a small patch implemented as I agree with Andres' comment
> below noted while reviewing the thread
> https://www.postgresql.org/message-id/flat/20240122210740.7vq5fd4woixpwx3f%40awork3.anarazel.de#6eb7595873392621d60e6b5a723941bc
>
> I agree that it's easier not to have to refer back to the macros only to
> see that they're all invoking StartChildProcess(X). All invocations are
> within postmaster.c.
>
> > @@ -561,13 +561,13 @@ static void ShmemBackendArrayAdd(Backend *bn);
> >  static void ShmemBackendArrayRemove(Backend *bn);
> >  #endif			/* EXEC_BACKEND */
> >
> > -#define StartupDataBase()		StartChildProcess(StartupProcess)
> > -#define StartArchiver()			StartChildProcess(ArchiverProcess)
> > -#define StartBackgroundWriter()	StartChildProcess(BgWriterProcess)
> > -#define StartCheckpointer()		StartChildProcess(CheckpointerProcess)
> > -#define StartWalWriter()		StartChildProcess(WalWriterProcess)
> > -#define StartWalReceiver()		StartChildProcess(WalReceiverProcess)
> > -#define StartWalSummarizer()	StartChildProcess(WalSummarizerProcess)
> > +#define StartupDataBase()		StartChildProcess(B_STARTUP)
> > +#define StartArchiver()			StartChildProcess(B_ARCHIVER)
> > +#define StartBackgroundWriter()	StartChildProcess(B_BG_WRITER)
> > +#define StartCheckpointer()		StartChildProcess(B_CHECKPOINTER)
> > +#define StartWalWriter()		StartChildProcess(B_WAL_WRITER)
> > +#define StartWalReceiver()		StartChildProcess(B_WAL_RECEIVER)
> > +#define StartWalSummarizer()	StartChildProcess(B_WAL_SUMMARIZER)
>
> Not for this commit, but we ought to rip out these macros - all they do is to 
> make it harder to understand the code.

+1. Patch LGTM.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Psql meta-command conninfo+

2024-02-06 Thread Erik Wienhold
On 2024-02-06 19:19 +0100, Nathan Bossart wrote:
> On Tue, Feb 06, 2024 at 05:27:01PM +, Maiquel Grassi wrote:
> > postgres=# \conninfo+
> >      Current Connection Information
> >     Attribute    |     Value
> > -----------------+----------------
> >  Database        | postgres
> >  User            | postgres
> >  Server Version  | 16.1
> >  Server Address  | 192.168.0.5/32
> >  Server Port     | 5433
> >  Client Address  | 192.168.0.5/32
> >  Client Port     | 52716
> >  Session PID     | 21624
> > (8 rows)
> 
> My first reaction is that this should instead return a single row with the
> same set of columns for any connection type (the not-applicable ones would
> just be set to NULL).  That would match the other meta-commands like \l and
> \du, and you could still get an expanded display with \x if needed.  Also,
> I think it would simplify the code a bit.

+1 for a single-row result and triggering expanded display with \x for
consistency with other commands.

-- 
Erik




Re: Set log_lock_waits=on by default

2024-02-06 Thread Laurenz Albe
On Tue, 2024-02-06 at 12:29 -0500, Tom Lane wrote:
> Nathan Bossart  writes:
> > It looks like there are two ideas:
> > [...]
> > * Set log_lock_waits on by default.  The folks on this thread seem to
> >   support this idea, but given the lively discussion for enabling
> >   log_checkpoints by default [0], I'm hesitant to commit something like
> >   this without further community discussion.
> 
> I was, and remain, of the opinion that that was a bad idea that
> we'll eventually revert, just like we previously got rid of most
> inessential log chatter in the default configuration.  So I doubt
> you'll have much trouble guessing my opinion of this one.  I think
> the default logging configuration should be chosen with the
> understanding that nobody ever looks at the logs of most
> installations, and we should be more worried about their disk space
> consumption than anything else.  Admittedly, log_lock_waits is less
> bad than log_checkpoints, because no such events will occur in a
> well-tuned configuration ... but still, it's going to be useless
> chatter in the average installation.

Unsurprisingly, I want to argue against that.

You say that it is less bad than "log_checkpoints = on", and I agree.
I can't remember seeing any complaints by users about
"log_checkpoints", and I predict that the complaints about
"log_lock_waits = on" will be about as loud.

I am all for avoiding useless chatter in the log.  In my personal
experience, that is usually "database typo does not exist" and
constraint violation errors.  I always recommend people to enable
"log_lock_waits", and so far I have not seen it spam the logs.

I agree that usually nobody ever looks into the log file.  The
time when people *do* look into the log file is when they encounter
trouble, and my stance is that the default configuration should be
such that the log contains clues as to what may be the problem.
If a statement sometimes takes unreasonably long, it is very
valuable corroborative information that the statement occasionally
waits more than a second for a lock.
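
For reference, the setup under discussion is minimal (a postgresql.conf
sketch; deadlock_timeout's default is 1s):

    log_lock_waits = on     # log waits that exceed deadlock_timeout
    deadlock_timeout = 1s   # doubles as the logging threshold today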

Yours,
Laurenz Albe




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-02-06 Thread Bharath Rupireddy
On Mon, Feb 5, 2024 at 3:15 PM Bertrand Drouvot
 wrote:
>
> Thanks for the patch and +1 for the idea, I think adding those new
> "invalidation reasons" make sense.

Thanks for looking at it.

> I think it's better to have the XID one being discussed/implemented before the
> inactive_timeout one: what about changing the 0002, 0003 and 0004 ordering?
>
> 0004 -> 0002
> 0002 -> 0003
> 0003 -> 0004

Done that way.

> As far 0001:
>
> "
> This commit renames conflict_reason to
> invalidation_reason, and adds the support to show invalidation
> reasons for both physical and logical slots.
> "
>
> I'm not sure I like the fact that "invalidations" and "conflicts" are merged
> into a single field. I'd vote to keep conflict_reason as it is and add a new
> invalidation_reason (and put "conflict" as value when it is the case). The 
> reason
> is that I think they are 2 different concepts (could be linked though) and 
> that
> it would be easier to check for conflicts (means conflict_reason is not NULL).

So, do you want conflict_reason for only logical slots, and a separate
column for invalidation_reason for both logical and physical slots? Is
there any strong reason to have two properties "conflict" and
"invalidated" for slots? They both are the same internally, so why
confuse the users?
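
For illustration, the split Bertrand proposes would be used like this
(invalidation_reason is the hypothetical new column; conflict_reason
already exists for logical slots):

    -- recovery conflicts stay trivial to spot
    SELECT slot_name FROM pg_replication_slots
    WHERE conflict_reason IS NOT NULL;

    -- any invalidation, logical or physical, would show up here
    SELECT slot_name, invalidation_reason FROM pg_replication_slots
    WHERE invalidation_reason IS NOT NULL;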

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 5c965e485f0abb3e7c55484b136a986597975ef6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 6 Feb 2024 16:27:12 +
Subject: [PATCH v4 4/4] Add inactive_timeout based replication slot 
 invalidation

Currently postgres has the ability to invalidate inactive
replication slots based on the amount of WAL (set via the
max_slot_wal_keep_size GUC) that will be needed for the slots in
case they become active. However, choosing a default value for
max_slot_wal_keep_size is tricky, because the amount of WAL a
customer generates and their allocated storage vary greatly in
production, making it difficult to pin down a one-size-fits-all
value. It is often easier for operators to set a timeout of, say,
1, 2 or 3 days, after which the inactive slots get dropped.

To achieve the above, postgres uses the replication slot metric
inactive_at (the time at which the slot became inactive) and a
new GUC inactive_replication_slot_timeout. The checkpointer then
looks at all replication slots, invalidating the inactive slots
based on the configured timeout.
---
 doc/src/sgml/config.sgml  | 18 +
 src/backend/access/transam/xlog.c | 10 +++
 src/backend/replication/slot.c| 19 ++
 src/backend/replication/slotfuncs.c   |  3 +
 src/backend/utils/misc/guc_tables.c   | 12 
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/replication/slot.h|  3 +
 src/test/recovery/meson.build |  1 +
 src/test/recovery/t/050_invalidate_slots.pl   | 68 +++
 9 files changed, 135 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bc8c039b06..0ae3a15400 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4426,6 +4426,24 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
   
  
 
+ 
+  inactive_replication_slot_timeout (integer)
+  
+   inactive_replication_slot_timeout configuration parameter
+  
+  
+  
+   
+Invalidate replication slots that are inactive for longer than this
+amount of time at the next checkpoint. If this value is specified
+without units, it is taken as seconds. A value of zero (which is
+default) disables the timeout mechanism. This parameter can only be
+set in the postgresql.conf file or on the server
+command line.
+   
+  
+ 
+
  
   track_commit_timestamp (boolean)
   
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dbf2fa5911..4f5ee71638 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7056,6 +7056,11 @@ CreateCheckPoint(int flags)
 		InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE, 0,
 		   InvalidOid, InvalidTransactionId);
 
+	/* Invalidate inactive replication slots based on timeout */
+	if (inactive_replication_slot_timeout > 0)
+		InvalidateObsoleteReplicationSlots(RS_INVAL_INACTIVE_TIMEOUT, 0,
+		   InvalidOid, InvalidTransactionId);
+
 	/*
 	 * Delete old log files, those no longer needed for last checkpoint to
 	 * prevent the disk holding the xlog from growing full.
@@ -7505,6 +7510,11 @@ CreateRestartPoint(int flags)
 		InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE, 0,
 		   InvalidOid, InvalidTransactionId);
 
+	/* Invalidate inactive replication slots based on timeout */
+	if (inactive_replication_slot_timeout > 0)
+		InvalidateObsoleteReplicationSlots(RS_INVAL_INACTIVE_TIMEOUT, 0,

Re: Psql meta-command conninfo+

2024-02-06 Thread Nathan Bossart
On Tue, Feb 06, 2024 at 05:27:01PM +, Maiquel Grassi wrote:
> I'm seeking to improve the \conninfo meta-command in psql. Currently, it
> provides limited information about the current connection. I believe that
> expanding it using the concept of "plus" [+] could ease the work of DBAs,
> SysAdmins, DevOps, etc., who manage a large volume of databases and/or
> multiple PostgreSQL servers. The objective of this enhancement is to
> obtain quick information about the current connection (session). I
> believe that for a PostgreSQL administrator, it is not feasible to write
> a plpgsql function and apply it to all databases (for example, imagine
> managing over 200 databases). I have an example on GitHub
> https://github.com/maiquelgrassi/DBA-toolkit/blob/main/cluster/dba_whoami_function.sql
> of a plpgsql function demonstrating exactly what I believe is impractical
> for the daily routine of a PostgreSQL professional. I see psql's
> meta-commands as significant allies in daily work in productive
> environments.

This seems like a reasonable idea to me.

> postgres=# \conninfo+
>      Current Connection Information
>     Attribute    |     Value
> -----------------+----------------
>  Database        | postgres
>  User            | postgres
>  Server Version  | 16.1
>  Server Address  | 192.168.0.5/32
>  Server Port     | 5433
>  Client Address  | 192.168.0.5/32
>  Client Port     | 52716
>  Session PID     | 21624
> (8 rows)

My first reaction is that this should instead return a single row with the
same set of columns for any connection type (the not-applicable ones would
just be set to NULL).  That would match the other meta-commands like \l and
\du, and you could still get an expanded display with \x if needed.  Also,
I think it would simplify the code a bit.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




RE: Popcount optimization using AVX512

2024-02-06 Thread Amonson, Paul D
Hi All,

> However, it would be much better if the improved code were available for
> all relevant builds and activated if a CPUID test determines that the
> relevant instructions are available, instead of requiring a compile-time
> flag -- which most builds are not going to use, thus wasting the
> opportunity for running the optimized code.
 
This makes sense. I addressed the feedback and am attaching an updated patch. 
The patch also addresses your feedback on the autoconf configuration by adding 
CFLAGS support. I tested the runtime check for AVX512 on multiple processors 
with and without AVX512, and it detected or failed to detect the feature as 
expected.
 
> Looking at the existing code, I would also consider renaming
> the "_fast" variants to something like pg_popcount32_asml/
> pg_popcount64_asmq so that you can name the new one pg_popcount64_asmdq
> or such.
 
I left out the renaming, as it made sense to keep the fast/slow naming for 
readability.
 
> Finally, the matter of using ifunc as proposed by Noah seems to be still
> in the air, with no patches offered for the popcount family. Given that
> Nathan reports [1] a performance decrease, maybe we should set that
> thought aside for now and continue to use function pointers.
 
Since there are improvements without it (results below), I agree with you to 
continue using function pointers.
 
I collected data on machines with and without AVX512 support, using a table 
with 1M rows and running SQL bit_count() on a char column containing 84 bytes, 
4 KiB, 8 KiB, or 16 KiB of data (a sketch of the setup follows below).
* On non-AVX512 hardware: no regression or runtime impact between the patched 
and unpatched servers, even with AVX512 support built into the binary.
* On AVX512 hardware: the maximum improvement I saw was 17%, averaging closer 
to 6.5% on a bare-metal machine. The benefit is lower on smaller cloud 
VMs on AWS (1 - 3%)
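
A sketch of that kind of measurement (table and column names are
illustrative):

    -- 1M rows of ~4 KiB payload each
    CREATE TABLE popcount_test AS
    SELECT convert_to(repeat('x', 4096), 'UTF8') AS payload
    FROM generate_series(1, 1000000);

    -- exercises the popcount path once per row
    SELECT sum(bit_count(payload)) FROM popcount_test;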
 
If the patch looks good, please suggest next steps on committing it.
 
Paul

-Original Message-
From: Alvaro Herrera  
Sent: Thursday, January 25, 2024 1:49 AM
To: Shankaran, Akash 
Cc: Nathan Bossart ; Noah Misch ; 
Amonson, Paul D ; Tom Lane ; 
Matthias van de Meent ; 
pgsql-hackers@lists.postgresql.org
Subject: Re: Popcount optimization using AVX512

On 2024-Jan-25, Shankaran, Akash wrote:

> With the updated patch, we observed significant improvements and 
> handily beat the previous popcount algorithm performance. No 
> regressions in any scenario are observed:
> Platform: Intel Xeon Platinum 8360Y (Icelake) for data sizes 1kb - 64kb.
> Microbenchmark: 2x - 3x gains presently vs 19% previously, on the same 
> microbenchmark described initially in this thread.

These are great results.

However, it would be much better if the improved code were available for all 
relevant builds and activated if a CPUID test determines that the relevant 
instructions are available, instead of requiring a compile-time flag -- which 
most builds are not going to use, thus wasting the opportunity for running the 
optimized code.

I suppose this would require patching pg_popcount64_choose() to be more 
specific.  Looking at the existing code, I would also consider renaming the 
"_fast" variants to something like pg_popcount32_asml/ pg_popcount64_asmq so 
that you can name the new one pg_popcount64_asmdq or such.  (Or maybe leave the 
32-bit version alone as "fast/slow", since there's no third option for that one 
-- or do I misread?)

I also think this needs to move the CFLAGS-decision-making elsewhere; asking 
the user to get it right is too much of a burden.  Is it workable to simply 
verify compiler support for the additional flags needed, and if so add them to 
a new CFLAGS_BITUTILS variable or such?  We already have the CFLAGS_CRC model 
that should be easy to follow.  Should be easy enough to mostly copy what's in 
configure.ac and meson.build, right?

Finally, the matter of using ifunc as proposed by Noah seems to be still in the 
air, with no patches offered for the popcount family.  Given that Nathan 
reports [1] a performance decrease, maybe we should set that thought aside for 
now and continue to use function pointers.  It's worth keeping in mind that 
popcount is already using function pointers (at least in the case where we try 
to use POPCNT directly), so patching to select between three options instead of 
between two wouldn't be a regression.

[1] https://postgr.es/m/20231107201441.GA898662@nathanxps13

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)


Official-PostgreSQL-AVX-512-POPCNT.patch
Description: Official-PostgreSQL-AVX-512-POPCNT.patch


Re: confusing / inefficient "need_transcoding" handling in copy

2024-02-06 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Feb 05, 2024 at 06:05:04PM -0800, Andres Freund wrote:
>> I haven't yet dug into the code history. One guess is that this should only
>> have been set this way for COPY FROM.

> Looking the git history, this looks like an oversight of c61a2f58418e
> that has added the condition on pg_database_encoding_max_length(), no?
> Adding Tom and Ishii-san, even if this comes from 2005.

Yeah, back in 8.1 that code was applied for both directions, but
probably it should have enforced validation for same-encoding
cases only for COPY FROM.

It looks like now we have a mess, because the condition was copied
verbatim into copyto.c but not copyfrom.c.  Aren't we failing to
validate encoding in this case in COPY FROM, which is where we
actually need to?

regards, tom lane




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-02-06 Thread Bharath Rupireddy
On Tue, Feb 6, 2024 at 2:16 PM Dilip Kumar  wrote:
>
> > Thoughts?
> >
> +1 for the idea,  here are some comments on 0002, I will review other
> patches soon and respond.

Thanks for looking at it.

> +   inactive_at timestamptz
>
> Maybe we can change the field name to 'last_inactive_at'? Or maybe the
> comment can explain the timestamp at which the slot was last inactivated.
> I think since we are already maintaining the inactive_count, it's better
> to explicitly say this is the last inactivation time.

last_inactive_at looks better, so will use that in the next version of
the patch.

> 2.
> + /*
> + * XXX: Can inactive_count of type uint64 ever overflow? It takes
> + * about a half-billion years for inactive_count to overflow even
> + * if slot becomes inactive for every 1 millisecond. So, using
> + * pg_add_u64_overflow might be an overkill.
> + */
>
> Correct we don't need to use pg_add_u64_overflow for this counter.

Will remove this comment in the next version of the patch.

> + /* Convert to numeric. */
> + snprintf(buf, sizeof buf, UINT64_FORMAT, slot_contents.data.inactive_count);
> + values[i++] = DirectFunctionCall3(numeric_in,
> +   CStringGetDatum(buf),
> +   ObjectIdGetDatum(0),
> +   Int32GetDatum(-1));
>
> What is the purpose of doing this? I mean inactive_count is 8 byte
> integer and you can define function outparameter as 'int8' which is 8
> byte integer.  Then you don't need to convert int to string and then
> to numeric?

Nope, it's of type uint64, so reporting it as numeric is a way
typically used elsewhere - see code around /* Convert to numeric. */.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Set log_lock_waits=on by default

2024-02-06 Thread Tom Lane
Nathan Bossart  writes:
> It looks like there are two ideas:

> * Separate log_lock_waits from deadlock_timeout.  It looks like this idea
>   has a decent amount of support, but I didn't see any patch for it so far.

I think the support comes from people who have not actually looked at
the code.  The reason they are not separate is that the same timeout
service routine does both things.  To pull them apart, you would have
to (1) detangle that code and (2) incur the overhead of two timeout
events queued for every lock wait.  It's not clear to me that it's
worth it.  In some sense, deadlock_timeout is exactly the length of
time after which you want to get concerned.

>   IMHO this is arguably a prerequisite for setting log_lock_waits on by
>   default, as we could then easily set it higher by default to help address
>   concerns about introducing too much noise in the logs.

Well, that's the question --- would it be sane to enable
log_lock_waits by default if we don't separate them?

> * Set log_lock_waits on by default.  The folks on this thread seem to
>   support this idea, but given the lively discussion for enabling
>   log_checkpoints by default [0], I'm hesitant to commit something like
>   this without further community discussion.

I was, and remain, of the opinion that that was a bad idea that
we'll eventually revert, just like we previously got rid of most
inessential log chatter in the default configuration.  So I doubt
you'll have much trouble guessing my opinion of this one.  I think
the default logging configuration should be chosen with the
understanding that nobody ever looks at the logs of most
installations, and we should be more worried about their disk space
consumption than anything else.  Admittedly, log_lock_waits is less
bad than log_checkpoints, because no such events will occur in a
well-tuned configuration ... but still, it's going to be useless
chatter in the average installation.

regards, tom lane




Psql meta-command conninfo+

2024-02-06 Thread Maiquel Grassi
Hi,


I'm seeking to improve the \conninfo meta-command in psql. Currently, it 
provides limited information about the current connection. I believe that 
expanding it using the concept of "plus" [+] could ease the work of DBAs, 
SysAdmins, DevOps, etc., who manage a large volume of databases and/or multiple 
PostgreSQL servers. The objective of this enhancement is to obtain quick 
information about the current connection (session). I believe that for a 
PostgreSQL administrator, it is not feasible to write a plpgsql function and 
apply it to all databases (for example, imagine managing over 200 databases). I 
have an example on GitHub 
https://github.com/maiquelgrassi/DBA-toolkit/blob/main/cluster/dba_whoami_function.sql
 of a plpgsql function demonstrating exactly what I believe is impractical for 
the daily routine of a PostgreSQL professional. I see psql's meta-commands as 
significant allies in daily work in productive environments.


Note: As this is a prototype, I will adjust the rest (documentation, tests, 
etc.) once an agreement is reached.

Use cases for both the current and the improved command are below.

Connection 1 ("remote server"):

[postgres@localhost bin]$ ./psql -h 192.168.0.5 -p 5433 -U postgres -d postgres

psql (17devel, server 16.1)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host 
"192.168.0.5" at port "5433".
postgres=# \conninfo+
     Current Connection Information
    Attribute    |     Value
-----------------+----------------
 Database        | postgres
 User            | postgres
 Server Version  | 16.1
 Server Address  | 192.168.0.5/32
 Server Port     | 5433
 Client Address  | 192.168.0.5/32
 Client Port     | 52716
 Session PID     | 21624
(8 rows)


Connection 2 (socket):

[postgres@localhost bin]$ ./psql

psql (17devel)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" via socket in 
"/tmp" at port "5432".
postgres=# \conninfo+
       Current Connection Information
    Attribute     |         Value
------------------+-----------------------
 Info             | Connected via socket!
 Database         | postgres
 User             | postgres
 Socket Directory | /tmp
 Server Version   | 17devel
 Server Port      | 5432
 Session PID      | 27586
(7 rows)

Connection 3 (localhost):
[postgres@localhost bin]$ ./psql -h localhost
psql (17devel)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host "localhost" 
(address "::1") at port "5432".
postgres=# \conninfo+
    Current Connection Information
    Attribute    |   Value
-----------------+------------
 Database        | postgres
 User            | postgres
 Host            | localhost
 Server Version  | 17devel
 Server Address  | ::1/128
 Server Port     | 5432
 Client Address  | ::1/128
 Client Port     | 46824
 Session PID     | 27598
(9 rows)

Connection 4 (127.0.0.1):
[postgres@localhost bin]$ ./psql -h 127.0.0.1
psql (17devel)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host "127.0.0.1" 
at port "5432".
postgres=# \conninfo+
    Current Connection Information
    Attribute    |    Value
-----------------+--------------
 Database        | postgres
 User            | postgres
 Server Version  | 17devel
 Server Address  | 127.0.0.1/32
 Server Port     | 5432
 Client Address  | 127.0.0.1/32
 Client Port     | 34876
 Session PID     | 27624
(8 rows)

Regards,
Maiquel O. Grassi.


v1-0001-psql-meta-command-conninfo-plus.patch
Description: v1-0001-psql-meta-command-conninfo-plus.patch


Re: Remove Start* macros from postmaster.c to ease understanding of code

2024-02-06 Thread Nathan Bossart
On Tue, Feb 06, 2024 at 11:58:50AM -0500, reid.thomp...@crunchydata.com wrote:
> Attached is a small patch implemented as I agree with Andres' comment
> below noted while reviewing the thread
> https://www.postgresql.org/message-id/flat/20240122210740.7vq5fd4woixpwx3f%40awork3.anarazel.de#6eb7595873392621d60e6b5a723941bc
> 
> I agree that it's easier not to have to refer back to the macros only to
> see that they're all invoking StartChildProcess(X). All invocations are
> within postmaster.c.

Seems reasonable to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Remove Start* macros from postmaster.c to ease understanding of code

2024-02-06 Thread reid . thompson
Hi,

Attached is a small patch implemented as I agree with Andres' comment
below noted while reviewing the thread
https://www.postgresql.org/message-id/flat/20240122210740.7vq5fd4woixpwx3f%40awork3.anarazel.de#6eb7595873392621d60e6b5a723941bc

I agree that it's easier not to have to refer back to the macros only to
see that they're all invoking StartChildProcess(X). All invocations are
within postmaster.c.

> @@ -561,13 +561,13 @@ static void ShmemBackendArrayAdd(Backend *bn);
>  static void ShmemBackendArrayRemove(Backend *bn);
>  #endif			/* EXEC_BACKEND */
>  
> -#define StartupDataBase()		StartChildProcess(StartupProcess)
> -#define StartArchiver()			StartChildProcess(ArchiverProcess)
> -#define StartBackgroundWriter()	StartChildProcess(BgWriterProcess)
> -#define StartCheckpointer()		StartChildProcess(CheckpointerProcess)
> -#define StartWalWriter()		StartChildProcess(WalWriterProcess)
> -#define StartWalReceiver()		StartChildProcess(WalReceiverProcess)
> -#define StartWalSummarizer()	StartChildProcess(WalSummarizerProcess)
> +#define StartupDataBase()		StartChildProcess(B_STARTUP)
> +#define StartArchiver()			StartChildProcess(B_ARCHIVER)
> +#define StartBackgroundWriter()	StartChildProcess(B_BG_WRITER)
> +#define StartCheckpointer()		StartChildProcess(B_CHECKPOINTER)
> +#define StartWalWriter()		StartChildProcess(B_WAL_WRITER)
> +#define StartWalReceiver()		StartChildProcess(B_WAL_RECEIVER)
> +#define StartWalSummarizer()	StartChildProcess(B_WAL_SUMMARIZER)

Not for this commit, but we ought to rip out these macros - all they do is to 
make it harder to understand the code.
From 236580a0dd381e245a459d0e8769bd30ba2d79e3 Mon Sep 17 00:00:00 2001
From: Reid Thompson 
Date: Tue, 6 Feb 2024 09:17:28 -0500
Subject: [PATCH] Remove Start* macros in postmaster.c to ease understanding of
 code. Per comment in thread here
 https://www.postgresql.org/message-id/20240122210740.7vq5fd4woixpw...@awork3.anarazel.de
 I agree that it's easier not to have to refer back to the macros only to see
 that they're just invoking StartChildProcess(X). All invocations are
 within postmaster.c.

---
 src/backend/postmaster/postmaster.c | 44 -
 1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index feb471dd1d..f1e60c7fd9 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -561,14 +561,6 @@ static void ShmemBackendArrayAdd(Backend *bn);
 static void ShmemBackendArrayRemove(Backend *bn);
 #endif			/* EXEC_BACKEND */
 
-#define StartupDataBase()		StartChildProcess(StartupProcess)
-#define StartArchiver()			StartChildProcess(ArchiverProcess)
-#define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
-#define StartCheckpointer()		StartChildProcess(CheckpointerProcess)
-#define StartWalWriter()		StartChildProcess(WalWriterProcess)
-#define StartWalReceiver()		StartChildProcess(WalReceiverProcess)
-#define StartWalSummarizer()	StartChildProcess(WalSummarizerProcess)
-
 /* Macros to check exit status of a child process */
 #define EXIT_STATUS_0(st)  ((st) == 0)
 #define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)
@@ -1457,14 +1449,14 @@ PostmasterMain(int argc, char *argv[])
 
 	/* Start bgwriter and checkpointer so they can help with recovery */
 	if (CheckpointerPID == 0)
-		CheckpointerPID = StartCheckpointer();
+		CheckpointerPID = StartChildProcess(CheckpointerProcess);
 	if (BgWriterPID == 0)
-		BgWriterPID = StartBackgroundWriter();
+		BgWriterPID = StartChildProcess(BgWriterProcess);
 
 	/*
 	 * We're ready to rock and roll...
 	 */
-	StartupPID = StartupDataBase();
+	StartupPID = StartChildProcess(StartupProcess);
 	Assert(StartupPID != 0);
 	StartupStatus = STARTUP_RUNNING;
 	pmState = PM_STARTUP;
@@ -1798,9 +1790,9 @@ ServerLoop(void)
 			pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
 		{
 			if (CheckpointerPID == 0)
-CheckpointerPID = StartCheckpointer();
+CheckpointerPID = StartChildProcess(CheckpointerProcess);
 			if (BgWriterPID == 0)
-BgWriterPID = StartBackgroundWriter();
+BgWriterPID = StartChildProcess(BgWriterProcess);
 		}
 
 		/*
@@ -1809,7 +1801,7 @@ ServerLoop(void)
 		 * be writing any new WAL).
 		 */
 		if (WalWriterPID == 0 && pmState == PM_RUN)
-			WalWriterPID = StartWalWriter();
+			WalWriterPID = StartChildProcess(WalWriterProcess);
 
 		/*
 		 * If we have lost the autovacuum launcher, try to start a new one. We
@@ -1828,7 +1820,7 @@ ServerLoop(void)
 
 		/* If we have lost the archiver, try to start a new one. */
 		if (PgArchPID == 0 && PgArchStartupAllowed())
-			PgArchPID = StartArchiver();
+			PgArchPID = StartChildProcess(ArchiverProcess);
 
 		/* If we need to signal the autovacuum launcher, do so now */
 		if 

Re: Reducing connection overhead in pg_upgrade compat check phase

2024-02-06 Thread Nathan Bossart
On Tue, Feb 06, 2024 at 05:47:56PM +0100, Daniel Gustafsson wrote:
>> On 6 Feb 2024, at 17:32, Nathan Bossart  wrote:
>> IMHO this patch is worth trying to get into v17.  I'd be happy to take it
>> forward if Daniel does not intend to work on it.
> 
> I actually had the same thought yesterday and spent some time polishing and
> rebasing it.  I'll post an updated rebase shortly with the hopes of getting it
> committed this week.

Oh, awesome.  Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Set log_lock_waits=on by default

2024-02-06 Thread Nathan Bossart
I saw this was marked ready-for-committer, so I took a look at the thread.
It looks like there are two ideas:

* Separate log_lock_waits from deadlock_timeout.  It looks like this idea
  has a decent amount of support, but I didn't see any patch for it so far.
  IMHO this is arguably a prerequisite for setting log_lock_waits on by
  default, as we could then easily set it higher by default to help address
  concerns about introducing too much noise in the logs.

* Set log_lock_waits on by default.  The folks on this thread seem to
  support this idea, but given the lively discussion for enabling
  log_checkpoints by default [0], I'm hesitant to commit something like
  this without further community discussion.

[0] 
https://postgr.es/m/CALj2ACX-rW_OeDcp4gqrFUAkf1f50Fnh138dmkd0JkvCNQRKGA%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Reducing connection overhead in pg_upgrade compat check phase

2024-02-06 Thread Daniel Gustafsson
> On 6 Feb 2024, at 17:32, Nathan Bossart  wrote:
> 
> On Fri, Feb 02, 2024 at 12:18:25AM +0530, vignesh C wrote:
>> With no update to the thread and the patch still not applying I'm
>> marking this as returned with feedback.  Please feel free to resubmit
>> to the next CF when there is a new version of the patch.
> 
> IMHO this patch is worth trying to get into v17.  I'd be happy to take it
> forward if Daniel does not intend to work on it.

I actually had the same thought yesterday and spent some time polishing and
rebasing it.  I'll post an updated rebase shortly with the hopes of getting it
committed this week.

--
Daniel Gustafsson





Re: Reducing connection overhead in pg_upgrade compat check phase

2024-02-06 Thread Nathan Bossart
On Fri, Feb 02, 2024 at 12:18:25AM +0530, vignesh C wrote:
> With no update to the thread and the patch still not applying I'm
> marking this as returned with feedback.  Please feel free to resubmit
> to the next CF when there is a new version of the patch.

IMHO this patch is worth trying to get into v17.  I'd be happy to take it
forward if Daniel does not intend to work on it.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pg_stat_advisor extension

2024-02-06 Thread Ilia Evdokimov

Dear Team,

Firstly, I would like to extend my sincere apologies for the confusion 
and technical oversights in our previous discussions regarding the 
'pg_stat_advisor' extension. To address this and facilitate a clearer, 
more focused dialogue, I have initiated a new thread to consolidate our 
discussions on this matter.


For context, our previous conversation can be found here: 
https://www.postgresql.org/message-id/flat/4681151706615977%40mail.yandex.ru.


The 'pg_stat_advisor' extension is designed to help optimize query 
plans. It operates by suggesting when to create extended statistics, 
particularly for queries where the current selectivity estimates fall 
short. This is achieved through the GUC parameter 
'pg_stat_advisor.suggest_statistics_threshold', which assesses the ratio 
of total tuples produced compared to the planned rows. This feature is 
instrumental in identifying scenarios where the planner's estimates 
could be improved.


You can install the extension by:

```

LOAD 'pg_stat_advisor'

SET pg_stat_advisor.suggest_statistics_threshold = 1.0;

```


Example:

```

EXPLAIN ANALYZE SELECT * FROM t WHERE i = 100 AND j = 10;

NOTICE: pg_stat_advisor suggestion: CREATE STATISTICS t_i_j ON i, j FROM t

    QUERY PLAN



```

After the EXPLAIN ANALYZE command, you can see the suggestion to create 
statistics named 't_i_j' on columns 'i' and 'j' of table 't'.


Thank you for your understanding, patience, and continued support.



Best regards,
Ilia Evdokimov,
Tantor Labs LLC.




Re: pg_stat_advisor extension

2024-02-06 Thread Ilia Evdokimov

Hi hackers,


I've encountered and addressed errors in the 
"0001-pg_stat_advisor-extension.patch" when applying it to the main 
branch, specifically trailing whitespace issues at lines 117 and 118:


```
0001-pg_stat_advisor-extension.patch:117: trailing whitespace.
   QUERY PLAN
0001-pg_stat_advisor-extension.patch:118: trailing whitespace.

warning: 2 lines add whitespace errors.

```

An updated patch is attached for review


I welcome your insights, feedback, and evaluations regarding the 
necessity of integrating this new extension into PostgreSQL.



Kind regards,

Ilia Evdokimov,

Tantor Labs LLC.
From 6316706c42996219e507bb6ded9dd1e872180e38 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Tue, 6 Feb 2024 18:11:04 +0300
Subject: [PATCH] pg_stat_advisor extension

---
 contrib/Makefile  |   1 +
 contrib/meson.build   |   1 +
 contrib/pg_stat_advisor/.gitignore|   3 +
 contrib/pg_stat_advisor/Makefile  |  20 +
 contrib/pg_stat_advisor/README.md |  85 
 .../expected/pg_stat_advisor.out  |  96 
 contrib/pg_stat_advisor/meson.build   |  30 ++
 contrib/pg_stat_advisor/pg_stat_advisor.c | 477 ++
 .../pg_stat_advisor/sql/pg_stat_advisor.sql   |  50 ++
 9 files changed, 763 insertions(+)
 create mode 100644 contrib/pg_stat_advisor/.gitignore
 create mode 100644 contrib/pg_stat_advisor/Makefile
 create mode 100644 contrib/pg_stat_advisor/README.md
 create mode 100644 contrib/pg_stat_advisor/expected/pg_stat_advisor.out
 create mode 100644 contrib/pg_stat_advisor/meson.build
 create mode 100644 contrib/pg_stat_advisor/pg_stat_advisor.c
 create mode 100644 contrib/pg_stat_advisor/sql/pg_stat_advisor.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index da4e2316a3..da9a4ceeaa 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -34,6 +34,7 @@ SUBDIRS = \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
+		pg_stat_advisor \
 		pg_stat_statements \
 		pg_surgery	\
 		pg_trgm		\
diff --git a/contrib/meson.build b/contrib/meson.build
index c12dc906ca..a20d99443b 100644
--- a/contrib/meson.build
+++ b/contrib/meson.build
@@ -49,6 +49,7 @@ subdir('pgcrypto')
 subdir('pg_freespacemap')
 subdir('pg_prewarm')
 subdir('pgrowlocks')
+subdir('pg_stat_advisor')
 subdir('pg_stat_statements')
 subdir('pgstattuple')
 subdir('pg_surgery')
diff --git a/contrib/pg_stat_advisor/.gitignore b/contrib/pg_stat_advisor/.gitignore
new file mode 100644
index 00..913175ff6e
--- /dev/null
+++ b/contrib/pg_stat_advisor/.gitignore
@@ -0,0 +1,3 @@
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/pg_stat_advisor/Makefile b/contrib/pg_stat_advisor/Makefile
new file mode 100644
index 00..f31b939e8a
--- /dev/null
+++ b/contrib/pg_stat_advisor/Makefile
@@ -0,0 +1,20 @@
+# contrib/pg_stat_advisor/Makefile
+
+MODULE_big = pg_stat_advisor
+OBJS = \
+	$(WIN32RES) \
+	pg_stat_advisor.o
+PGFILEDESC = "pg_stat_advisor - analyze query performance and recommend the creation of additional statistics"
+
+REGRESS = pg_stat_advisor
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_stat_advisor
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_stat_advisor/README.md b/contrib/pg_stat_advisor/README.md
new file mode 100644
index 00..f9610f2ed5
--- /dev/null
+++ b/contrib/pg_stat_advisor/README.md
@@ -0,0 +1,85 @@
+## pg_stat_advisor - PostgreSQL advisor to create statistics
+
+pg_stat_advisor is a PostgreSQL extension designed to analyze query performance and recommend the creation of additional statistics to improve query plans.
+
+Append pg_stat_advisor to the shared_preload_libraries configuration parameter in your postgresql.conf file, then restart the PostgreSQL server to apply the changes. Alternatively, load it into a single session with the LOAD command:
+```
+LOAD 'pg_stat_advisor';
+```
+
+The pg_stat_advisor.suggest_statistics_threshold GUC sets the suggestion threshold: the ratio of the number of tuples actually produced to the planner's estimated rows. If the parameter is set to 0, suggestions are switched off.
+
+For example:
+```
+SET pg_stat_advisor.suggest_statistics_threshold = 1.0;
+```
+
+Examples:
+
+
+```
+postgres=# create table t (i int, j int);
+CREATE TABLE
+postgres=# insert into t select i/10, i/100 from generate_series(1, 100) i;
+INSERT 0 100
+postgres=# analyze t;
+ANALYZE
+postgres=# explain analyze select * from t where i = 100 and j = 10;
+   QUERY PLAN
+
+--
+--
+ Gather  (cost=1000.00..11675.10 rows=1 width=8) (actual time=0.526..61.564 rows=10 loops=1)
+   Workers Planned: 2

Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-06 Thread Alvaro Herrera
Here's the rest of it rebased on top of current master.  I think it
makes sense to have this as one individual commit.

I made CLOGShmemBuffers, CommitTsShmemBuffers and SUBTRANSShmemBuffers
compute a number that's multiple of SLRU_BANK_SIZE.  But it's a crock,
because we don't have that macro at that point, so I just used constant
16.  Obviously need a better solution for this.
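
For illustration, what I have in mind is roughly the following (a sketch
only; the real code should use the SLRU_BANK_SIZE macro once it is visible
at that point, rather than the hardcoded 16):

static int
round_up_to_bank_multiple(int nbuffers)
{
	/* round up to the next multiple of the bank size (16 for now) */
	return ((nbuffers + 16 - 1) / 16) * 16;
}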

I also changed the location of bank_mask in SlruCtlData for better
packing, as advised by pahole; and renamed SLRU_SLOTNO_GET_BANKLOCKNO()
to SlotGetBankNumber().

Some very critical comments still need to be updated to the new design,
particularly anything that mentions "control lock"; but also the overall
model needs to be explained in some central location, rather than
incongruently some pieces here and other pieces there.  I'll see about
this later.  But at least this is code you should be able to play with.


I've been wondering whether we should add a "slru" to the name of the
GUCs:

commit_timestamp_slru_buffers
transaction_slru_buffers
etc

-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 61038472c5..3e3119865a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,145 @@ include_dir 'conf.d'
   
  
 
+ 
+  commit_timestamp_buffers (integer)
+  
+   commit_timestamp_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of memory to use to cache the contents of
+pg_commit_ts (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 0, which requests
+shared_buffers/512 up to 1024 blocks,
+but not fewer than 16 blocks.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  multixact_members_buffers (integer)
+  
+   multixact_members_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_multixact/members (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 32.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  multixact_offsets_buffers (integer)
+  
+   multixact_offsets_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_multixact/offsets (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 16.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  notify_buffers (integer)
+  
+   notify_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_notify (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 16.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  serializable_buffers (integer)
+  
+   serializable_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_serial (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 32.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  subtransaction_buffers (integer)
+  
+   subtransaction_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_subtrans (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 0, which requests
+shared_buffers/512 up to 1024 blocks,
+but not fewer than 16 blocks.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  transaction_buffers (integer)
+  
+   transaction_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_xact (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default 

Re: Possibility to disable `ALTER SYSTEM`

2024-02-06 Thread Jelte Fennema-Nio
On Tue, 30 Jan 2024 at 18:49, Robert Haas  wrote:
> I also think that using the GUC system to manage itself is a little
> bit suspect. I wonder if it would be better to do this some other way,
> e.g. a sentinel file in the data directory. For example, suppose we
> refuse ALTER SYSTEM if $PGDATA/disable_alter_system exists, or
> something like that.

On Tue, 6 Feb 2024 at 15:10, Peter Eisentraut  wrote:
> How about ALTER SYSTEM is disabled if the file
> postgresql.auto.conf.disabled exists? This is somewhat similar to making
> the file read-only, but doesn't risk other tools breaking when they
> encounter such a file.  And it's more obvious and self-explaining.

I'm not convinced we need a new file to disable ALTER SYSTEM. I feel
like an "enable_alter_system" GUC that defaults to ON would work fine
for this. If we make that GUC be PGC_POSTMASTER then an operator can
disable ALTER SYSTEM either with a command line argument or by
changing the main config file. Since this feature is mostly useful
when the config file is managed by an external system, it seems rather
simple for that system to configure one extra GUC in the config file.
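
To spell that out (the enable_alter_system GUC is hypothetical at this
point, but both mechanisms shown are the standard ones):

# in postgresql.conf
enable_alter_system = off

# or as a command-line option at server start
postgres -c enable_alter_system=off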




Re: clarify equalTupleDescs()

2024-02-06 Thread Tom Lane
Peter Eisentraut  writes:
> The first class wants to compare what I call row-type equality, which means
> they want to check specifically for an equal number of attributes, and the
> same attribute names, types, and typmods for each attribute.  Most
> callers actually want that behavior.

Should compare attcollation too, no?

+1 for the general idea, but it seems like "row type equality"
might still be a slightly fuzzy concept.

regards, tom lane




Re: glibc qsort() vulnerability

2024-02-06 Thread Tom Lane
Mats Kindahl  writes:
> There is a bug in glibc's qsort() algorithm that risks an out-of-bounds
> read or write if the comparison function is not transitive, for example,
> when subtraction is used in a way that can overflow.

We don't use glibc's qsort.  Have you checked whether there's a
problem with the code we do use?

regards, tom lane




Re: Why is subscription/t/031_column_list.pl failing so much?

2024-02-06 Thread Tom Lane
Amit Kapila  writes:
> Yeah, I was worried about that. The other idea I have previously
> thought was to change Alter Subscription to Drop+Create Subscription.
> That should also help in bringing stability without losing any
> functionality.

Hm, why would that fix it?

More to the point, aren't these proposals just band-aids that
would stabilize the test without fixing the actual problem?
The same thing is likely to happen to people in the field,
unless we do something drastic like removing ALTER SUBSCRIPTION.

regards, tom lane




Re: Possibility to disable `ALTER SYSTEM`

2024-02-06 Thread David G. Johnston
On Tue, Feb 6, 2024 at 7:10 AM Peter Eisentraut 
wrote:

>
> How about ALTER SYSTEM is disabled if the file
> postgresql.auto.conf.disabled exists? This is somewhat similar to making
> the file read-only, but doesn't risk other tools breaking when they
> encounter such a file.  And it's more obvious and self-explaining.
>

A separate configuration file would be self-documenting and able to always
exist; the same properties as postgresql.conf.

ISTM the main requirement regardless of how the file system API is designed
- assuming there is a filesystem API - is that the running postgres process
be unable to write to the file.  It seems immaterial how the OS admin
accomplishes that goal.

The command line argument method seems appealing, but it seems harder in
that case to ensure that the postgres process is disallowed from modifying
whatever file defines what should be run.

One concern with a file configuration is that if we require it to be
present in the data directory that goes somewhat against the design of
allowing configuration files to be placed anywhere by changing the
config_file guc.

Any design should factor in the almost immediate need to be extended to
prevent copy variants that touch the local filesystem or shell directly.

I was pondering a directory in pgdata where you could add *.disabled files
indicating which features to disable.  This is a bit more pluggable than a
single configuration file, but the latter still seems better to me.

David J.


Re: Possibility to disable `ALTER SYSTEM`

2024-02-06 Thread Peter Eisentraut

On 31.01.24 11:16, Gabriele Bartolini wrote:
> I very much like the idea of a file in the data directory that also
> controls the copy operations.
>
> Just wanted to highlight though that in our operator we have already
> applied the read-only postgresql.auto.conf trick to disable the system
> (see
> https://cloudnative-pg.io/documentation/current/postgresql_conf/#enabling-alter-system ).
> However, having that file read-only triggered an issue when using pg_rewind
> to resync a former primary, as pg_rewind immediately bails out when a
> read-only file is encountered in the PGDATA (see
> https://github.com/cloudnative-pg/cloudnative-pg/issues/3698 ).
>
> We might keep this in mind if we go down the path of the separate file.

How about ALTER SYSTEM is disabled if the file
postgresql.auto.conf.disabled exists?  This is somewhat similar to making
the file read-only, but doesn't risk other tools breaking when they
encounter such a file.  And it's more obvious and self-explaining.







glibc qsort() vulnerability

2024-02-06 Thread Mats Kindahl
Hi hackers,

There is a bug in glibc's qsort() algorithm that risks an out-of-bounds
read or write if the comparison function is not transitive, for example,
when subtraction is used in a way that can overflow.

See
https://packetstormsecurity.com/files/176931/glibc-qsort-Out-Of-Bounds-Read-Write.html

I checked the existing functions in the latest version of Postgres source
code and most are safe, but there were a few ones that could lead to
overflow. I do not know if these can actually lead to problems, but better
safe than sorry, so I created a patch to fix those few cases and add a
comment to one case that was not clear that it could not overflow.
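
For anyone who wants to see the failure mode in isolation, here is a
standalone illustration (not PostgreSQL code) of why subtraction-based
comparators are unsafe:

#include <limits.h>
#include <stdio.h>

/* subtraction-based comparator; the subtraction can overflow */
static int
bad_cmp(int a, int b)
{
	return a - b;
}

int
main(void)
{
	/*
	 * INT_MIN - 1 overflows (undefined behavior); on typical
	 * two's-complement platforms it wraps around to INT_MAX, so the
	 * comparator claims INT_MIN > 1 and breaks the ordering that
	 * qsort() relies on.
	 */
	printf("%d\n", bad_cmp(INT_MIN, 1));
	return 0;
}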

Best wishes,
Mats Kindahl, Timescale
From 83e2f14ab259f568034b07c2f99e34c6dff2c7b5 Mon Sep 17 00:00:00 2001
From: Mats Kindahl 
Date: Tue, 6 Feb 2024 14:53:53 +0100
Subject: Ensure comparison functions are transitive

There are a few comparison functions to qsort() that are non-transitive
because they can cause an overflow. Fix these functions to instead use
normal comparisons and return -1, 0, or 1 explicitly.
---
 src/backend/access/spgist/spgtextproc.c |  6 +-
 src/backend/utils/cache/relcache.c  |  2 ++
 src/bin/pg_upgrade/function.c   | 10 +++---
 src/bin/psql/crosstabview.c |  8 +++-
 4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/spgist/spgtextproc.c b/src/backend/access/spgist/spgtextproc.c
index b8fd0c2ad8..d0a2b4e6e1 100644
--- a/src/backend/access/spgist/spgtextproc.c
+++ b/src/backend/access/spgist/spgtextproc.c
@@ -325,7 +325,11 @@ cmpNodePtr(const void *a, const void *b)
 	const spgNodePtr *aa = (const spgNodePtr *) a;
 	const spgNodePtr *bb = (const spgNodePtr *) b;
 
-	return aa->c - bb->c;
+	if (aa->c > bb->c)
+		return 1;
+	if (aa->c < bb->c)
+		return -1;
+	return 0;
 }
 
 Datum
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index ac106b40e3..42e4359c7b 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4520,6 +4520,8 @@ AttrDefaultCmp(const void *a, const void *b)
 	const AttrDefault *ada = (const AttrDefault *) a;
 	const AttrDefault *adb = (const AttrDefault *) b;
 
+	/* Cannot overflow because standard conversions will convert to int and
+	 * sizeof(int16) < sizeof(int) */
 	return ada->adnum - adb->adnum;
 }
 
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index af998f74d3..c82ae11016 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -32,14 +32,18 @@ library_name_compare(const void *p1, const void *p2)
 	int			slen1 = strlen(str1);
 	int			slen2 = strlen(str2);
 	int			cmp = strcmp(str1, str2);
+	int lhsdb = ((const LibraryInfo *) p1)->dbnum;
+	int rhsdb = ((const LibraryInfo *) p2)->dbnum;
 
 	if (slen1 != slen2)
 		return slen1 - slen2;
 	if (cmp != 0)
 		return cmp;
-	else
-		return ((const LibraryInfo *) p1)->dbnum -
-			((const LibraryInfo *) p2)->dbnum;
+	if (lhsdb < rhsdb)
+		return -1;
+	if (lhsdb > rhsdb)
+		return 1;
+	return 0;
 }
 
 
diff --git a/src/bin/psql/crosstabview.c b/src/bin/psql/crosstabview.c
index c6116b7238..948ed9b1fe 100644
--- a/src/bin/psql/crosstabview.c
+++ b/src/bin/psql/crosstabview.c
@@ -709,5 +709,11 @@ pivotFieldCompare(const void *a, const void *b)
 static int
 rankCompare(const void *a, const void *b)
 {
-	return *((const int *) a) - *((const int *) b);
+	const int lhs = *(const int *) a;
+	const int rhs = *(const int *) b;
+	if (lhs < rhs)
+		return -1;
+	if (lhs > rhs)
+		return 1;
+	return 0;
 }
-- 
2.34.1



RE: Synchronizing slots from primary to standby

2024-02-06 Thread Zhijie Hou (Fujitsu)
On Monday, February 5, 2024 10:25 PM Masahiko Sawada  
wrote:
> Subject: Re: Synchronizing slots from primary to standby
> 
> On Mon, Feb 5, 2024 at 8:26 PM shveta malik 
> wrote:
> >
> > On Mon, Feb 5, 2024 at 10:57 AM Ajin Cherian  wrote:
> > >
> > > Just noticed that doc/src/sgml/config.sgml still refers to enable_syncslot
> instead of sync_replication_slots:
> > >
> > > The standbys corresponding to the physical replication slots in
> > > standby_slot_names must configure
> > > enable_syncslot = true so they can receive
> > > failover logical slots changes from the primary.
> >
> > Thanks Ajin for pointing this out. Here are v78 patches, corrected there.
> >
> > Other changes are:
> >
> > 1)  Rebased the patches as the v77-001 is now pushed.
> > 2)  Enabled executing pg_sync_replication_slots() on cascading-standby.
> > 3)  Rearranged the code around parameter validity checks. Changed
> > function names and changed the way how dbname is extracted as
> > suggested by Amit offlist.
> > 4)  Rearranged the code around check_primary_info(). Removed output
> args.
> > 5)  Few other trivial changes.
> >
> 
> Thank you for updating the patch! Here are some comments:
> 
> ---
> Since two processes (e.g. the slotsync worker and
> pg_sync_replication_slots()) concurrently fetch and update the slot
> information, there is a race condition where the slot's
> confirmed_flush_lsn goes backward. We have the following check, but it
> doesn't prevent the slot's confirmed_flush_lsn from moving backward if
> the restart_lsn doesn't change:
> 
> /*
>  * Sanity check: As long as the invalidations are handled
>  * appropriately as above, this should never happen.
>  */
> if (remote_slot->restart_lsn < slot->data.restart_lsn)
> elog(ERROR,
>  "cannot synchronize local slot \"%s\" LSN(%X/%X)"
>  " to remote slot's LSN(%X/%X) as synchronization"
>  " would move it backwards", remote_slot->name,
>  LSN_FORMAT_ARGS(slot->data.restart_lsn),
>  LSN_FORMAT_ARGS(remote_slot->restart_lsn));
> 

As discussed, I added a flag in shared memory to control the concurrent slot 
sync.
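
The guard is shaped roughly like this (a sketch; the struct and field
names are illustrative, not necessarily what the patch uses):

/* Sketch: allow only one process to synchronize slots at a time. */
static void
claim_slot_sync(SlotSyncCtxStruct *ctx)
{
	SpinLockAcquire(&ctx->mutex);
	if (ctx->syncing)
	{
		SpinLockRelease(&ctx->mutex);
		ereport(ERROR,
				(errmsg("cannot synchronize replication slots concurrently")));
	}
	ctx->syncing = true;
	SpinLockRelease(&ctx->mutex);
}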

> ---
> + It is recommended that subscriptions are first disabled before
> + promoting
> + the standby and are enabled back after altering the connection string.
> 
> I think it's better to describe the reason why it's recommended to disable
> subscriptions before the standby promotion.

Added.

> 
> ---
> +/* Slot sync worker objects */
> +extern PGDLLIMPORT char *PrimaryConnInfo; extern PGDLLIMPORT char
> +*PrimarySlotName;
> 
> These two variables are declared also in xlogrecovery.h. Is it intentional? 
> If so, I
> think it's better to write comments.

Will address.

> 
> ---
> Global functions and variables used by the slotsync worker are declared in
> logicalworker.h and worker_internal.h. But is it really okay to make a
> dependency between the slotsync worker and logical replication workers? IIUC
> the slotsync worker is conceptually a separate feature from the logical
> replication. I think the slotsync worker can have its own header file.

Added.

> 
> ---
> +   SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid ||
> '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname
> 
> and
> 
> +   SELECT (CASE WHEN r.srsubstate = 'f' THEN
> pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || 
> r.srrelid), false)
> 
> If we use CONCAT function, we can replace '||' with ','.
> 

Will address.
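
For reference, the two spellings are equivalent; with hypothetical OID values:

SELECT CONCAT('pg_' || 16384 || '_sync_' || 16385);  -- current spelling
SELECT CONCAT('pg_', 16384, '_sync_', 16385);        -- suggested spelling

Both return 'pg_16384_sync_16385'.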

> ---
> + Confirm that the standby server is not lagging behind the subscribers.
> + This step can be skipped if
> + <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link>
> + has been correctly configured.
> 
> How can the user confirm if standby_slot_names is correctly configured?

Will address after concluding.

Thanks Shveta for helping address the comments.

Best Regards,
Hou zj


RE: Synchronizing slots from primary to standby

2024-02-06 Thread Zhijie Hou (Fujitsu)
On Tuesday, February 6, 2024 3:39 PM Masahiko Sawada  
wrote:
> 
> On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila  wrote:
> >
> > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada
>  wrote:
> > >
> > > ---
> > > Since Two processes (e.g. the slotsync worker and
> > > pg_sync_replication_slots()) concurrently fetch and update the slot
> > > information, there is a race condition where slot's
> > > confirmed_flush_lsn goes backward.
> > >
> >
> > Right, this is possible, though there shouldn't be a problem because
> > anyway, slotsync is an async process. Till we hold restart_lsn, the
> > required WAL won't be removed. Having said that, I can think of two
> > ways to avoid it: (a) We can have some flag in shared memory using
> > which we can detect whether any other process is doing slot
> > syncronization and then either error out at that time or simply wait
> > or may take nowait kind of parameter from user to decide what to do?
> > If this is feasible, we can simply error out for the first version and
> > extend it later if we see any use cases for the same (b) similar to
> > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an
> > error, this is good for now but in future we may still have another
> > similar issue, so I would prefer (a) among these but I am fine if you
> > prefer (b) or have some other ideas like just note down in comments
> > that this is a harmless case and can happen only very rarely.
> 
> Thank you for sharing the ideas. I would prefer (a). For (b), the same issue 
> still
> happens for other fields.

Attached is the V79 patch, which includes the following changes. (Note that only
0001 is sent in this version; we will send the later patches after rebasing.)

1. Address all the comments from Amit[1], all the comments from Peter[2], and
   some of the comments from Sawada-san[3].
2. Use a flag in shared memory to restrict concurrent slot sync.
3. Add more TAP tests for the pg_sync_replication_slots function.

[1] 
https://www.postgresql.org/message-id/CAA4eK1KGHT9S-Bst_G1CUNQvRep%3DipMs5aTBNRQFVi6TogbJ9w%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAHut%2BPtyoRf3adoLoTrbL6momzkhXAFKz656Vv9YRu4cp%3D6Yig%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/CAD21AoCEkcTaPb%2BGdOhSQE49_mKJG6D64quHcioJGx6RCqMv%2BQ%40mail.gmail.com

Best Regards,
Hou zj


v79-0001-Add-a-slot-synchronization-function.patch
Description: v79-0001-Add-a-slot-synchronization-function.patch


Re: Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)

2024-02-06 Thread Ashutosh Bapat
On Mon, Nov 6, 2023 at 2:48 PM Ashutosh Bapat
 wrote:
>
> explain.out in 0001 needed some adjustments. Without those CIbot shows
> failures. Fixed in the attached patchset. 0001 is just for diagnosis,
> not for actual commit. 0002 which is the actual patch has no changes
> wrt to the previous version.
>

First patch is no longer required. PFA rebased second patch.

-- 
Best Wishes,
Ashutosh Bapat
From 7cf3ad0fd9a7cfffa52a8faad759f16988427a70 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Wed, 26 Jul 2023 12:08:55 +0530
Subject: [PATCH] Reuse child_relids bitmapset in partitionwise joinrels

Bitmapsets containing child relids consume large memory when thousands of
partitions are involved. Create them once, use multiple times and free them up
once their use is over.

Ashutosh Bapat
---
 src/backend/optimizer/path/joinrels.c | 10 ++
 src/backend/optimizer/util/relnode.c  | 18 +++---
 src/include/optimizer/pathnode.h  |  3 ++-
 3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4750579b0a..d9dbe0e930 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1526,6 +1526,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		RelOptInfo *child_joinrel;
 		AppendRelInfo **appinfos;
 		int			nappinfos;
+		Bitmapset  *child_relids = NULL;
 
 		if (joinrel->partbounds_merged)
 		{
@@ -1621,9 +1622,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 			   child_rel2->relids);
 
 		/* Find the AppendRelInfo structures */
-		appinfos = find_appinfos_by_relids(root,
-		   bms_union(child_rel1->relids,
-	 child_rel2->relids),
+		child_relids = bms_union(child_rel1->relids, child_rel2->relids);
+		appinfos = find_appinfos_by_relids(root, child_relids,
 		   &nappinfos);
 
 		/*
@@ -1641,7 +1641,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		{
 			child_joinrel = build_child_join_rel(root, child_rel1, child_rel2,
  joinrel, child_restrictlist,
- child_sjinfo);
+ child_sjinfo, appinfos,
+ nappinfos);
 			joinrel->part_rels[cnt_parts] = child_joinrel;
 			joinrel->live_parts = bms_add_member(joinrel->live_parts, cnt_parts);
 			joinrel->all_partrels = bms_add_members(joinrel->all_partrels,
@@ -1659,6 +1660,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 	child_restrictlist);
 
 		pfree(appinfos);
+		bms_free(child_relids);
 	}
 }
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e5f4062bfb..98558f4661 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -869,15 +869,15 @@ build_join_rel(PlannerInfo *root,
  * 'restrictlist': list of RestrictInfo nodes that apply to this particular
  *		pair of joinable relations
  * 'sjinfo': child join's join-type details
+ * 'appinfos' and 'nappinfos': AppendRelInfo array for child relids
  */
 RelOptInfo *
 build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	 RelOptInfo *inner_rel, RelOptInfo *parent_joinrel,
-	 List *restrictlist, SpecialJoinInfo *sjinfo)
+	 List *restrictlist, SpecialJoinInfo *sjinfo,
+	 AppendRelInfo **appinfos, int nappinfos)
 {
 	RelOptInfo *joinrel = makeNode(RelOptInfo);
-	AppendRelInfo **appinfos;
-	int			nappinfos;
 
 	/* Only joins between "other" relations land here. */
 	Assert(IS_OTHER_REL(outer_rel) && IS_OTHER_REL(inner_rel));
@@ -885,16 +885,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	/* The parent joinrel should have consider_partitionwise_join set. */
 	Assert(parent_joinrel->consider_partitionwise_join);
 
-	/*
-	 * Find the AppendRelInfo structures for the child baserels.  We'll need
-	 * these for computing the child join's relid set, and later for mapping
-	 * Vars to the child rel.
-	 */
-	appinfos = find_appinfos_by_relids(root,
-	   bms_union(outer_rel->relids,
- inner_rel->relids),
-	   &nappinfos);
-
 	joinrel->reloptkind = RELOPT_OTHER_JOINREL;
 	joinrel->relids = adjust_child_relids(parent_joinrel->relids,
 		  nappinfos, appinfos);
@@ -1010,8 +1000,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 		nappinfos, appinfos,
 		parent_joinrel, joinrel);
 
-	pfree(appinfos);
-
 	return joinrel;
 }
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index c43d97b48a..90f062ce96 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -342,6 +342,7 @@ extern Bitmapset *get_param_path_clause_serials(Path *path);
 extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
 		RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 		RelOptInfo *parent_joinrel, List *restrictlist,
-		

Re: Race condition in FetchTableStates() breaks synchronization of subscription tables

2024-02-06 Thread Alexander Lakhin

05.02.2024 13:13, vignesh C wrote:
> Thanks for the steps for the issue, I was able to reproduce this issue
> in my environment with the steps provided. The attached patch has a
> proposed fix where the latch will not be set in case of the apply
> worker exiting immediately after starting.


It looks like the proposed fix doesn't help when ApplyLauncherWakeup() is
called by a backend executing a CREATE SUBSCRIPTION command.
That is, with the v4-0002 patch applied and pg_usleep(30L); added
just below
    if (!worker_in_use)
    return worker_in_use;
I still observe the test 027_nosuperuser running for 3+ minutes:
t/027_nosuperuser.pl .. ok
All tests successful.
Files=1, Tests=19, 187 wallclock secs ( 0.01 usr  0.00 sys +  4.82 cusr  4.47 
csys =  9.30 CPU)

IIUC, it's because a launcher wakeup call, sent by "CREATE SUBSCRIPTION
regression_sub ...", gets missed when the launcher waits for the start of
another worker (the logical replication worker for subscription "admin_sub"),
launched just before that command.

Best regards,
Alexander




clarify equalTupleDescs()

2024-02-06 Thread Peter Eisentraut
In a recent patch thread it was discussed[0] which fields should be 
compared by equalTupleDescs() and whether it is ok to remove a field 
from tuple descriptors and how that should affect their equality 
(attstattarget in that case).


After analyzing all the callers, I have noticed that there are two 
classes of callers of equalTupleDescs():


The first class wants to compare what I call row-type equality, which means 
they want to check specifically for an equal number of attributes, and the 
same attribute names, types, and typmods for each attribute.  Most 
callers actually want that behavior.  The remaining callers just want to 
compare the tuple descriptors as they are; they don't care why the 
fields are in there, they just want to compare all of them.


In the attached patch, I add a new function equalRowTypes() that is 
effectively a subset of equalTupleDescs() and switch most callers to that.


The purpose of this patch is to make the semantics less uncertain. 
Questions like the one in [0] about attstattarget now have a clear 
answer for both functions.  I think this would be useful to have, as we 
are thinking about more changes in pg_attribute and tuple descriptors.
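
In other words, the new function reduces to roughly the following (a 
simplified sketch of the proposed semantics, not the exact patch text):

bool
equalRowTypes(TupleDesc tupdesc1, TupleDesc tupdesc2)
{
	/* row types must have the same number of attributes */
	if (tupdesc1->natts != tupdesc2->natts)
		return false;
	/* and the same composite type ID (both may be zero) */
	if (tupdesc1->tdtypeid != tupdesc2->tdtypeid)
		return false;

	for (int i = 0; i < tupdesc1->natts; i++)
	{
		Form_pg_attribute attr1 = TupleDescAttr(tupdesc1, i);
		Form_pg_attribute attr2 = TupleDescAttr(tupdesc2, i);

		if (strcmp(NameStr(attr1->attname), NameStr(attr2->attname)) != 0)
			return false;
		if (attr1->atttypid != attr2->atttypid)
			return false;
		if (attr1->atttypmod != attr2->atttypmod)
			return false;
	}

	return true;
}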



[0]: 
https://www.postgresql.org/message-id/202401101316.k4s3fomwjx52@alvherre.pgsqlFrom 7212a249b56eb5ac339dcd029526e7fb380932fe Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 6 Feb 2024 13:02:41 +0100
Subject: [PATCH v0] Separate equalRowTypes() from equalTupleDescs()

This introduces a new function equalRowTypes() that is effectively a
subset of equalTupleDescs() but only compares the number of attributes
and attribute name, type, and typmod.  This is enough for most
existing uses of equalTupleDescs(), which are changed to use the new
function.  The only remaining callers of equalTupleDescs() are those
that really want to check the tuple descriptor as such, without
concern about record to row or record type semantics.

The existing function hashTupleDesc() is renamed to hashRowType(),
because it now corresponds more to equalRowTypes().

The purpose of this change is to be clearer about the semantics of
equality asked for by each caller.  (At least one caller had a comment
that questioned whether equalTupleDescs() was too restrictive.)  For
example, 4f622503d6d removed attstattarget from the tuple descriptor
structure.  It was not fully clear at the time how this should affect
equalTupleDescs().  Now the answer is clear: By their own definitions,
equalRowTypes() does not care, and equalTupleDescs() just compares
whatever is in the tuple descriptor but does not care why it is in
there.
---
 src/backend/access/common/tupdesc.c | 58 +++--
 src/backend/catalog/pg_proc.c   |  2 +-
 src/backend/commands/analyze.c  |  4 +-
 src/backend/commands/view.c | 13 +++
 src/backend/utils/cache/plancache.c |  5 +--
 src/backend/utils/cache/typcache.c  | 10 ++---
 src/include/access/tupdesc.h|  4 +-
 7 files changed, 64 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/common/tupdesc.c 
b/src/backend/access/common/tupdesc.c
index bc80caee117..22d6708708e 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -414,11 +414,6 @@ DecrTupleDescRefCount(TupleDesc tupdesc)
 
 /*
  * Compare two TupleDesc structures for logical equality
- *
- * Note: we deliberately do not check the attrelid and tdtypmod fields.
- * This allows typcache.c to use this routine to see if a cached record type
- * matches a requested type, and is harmless for relcache.c's uses.
- * We don't compare tdrefcount, either.
  */
 bool
 equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
@@ -431,6 +426,8 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
if (tupdesc1->tdtypeid != tupdesc2->tdtypeid)
return false;
 
+   /* tdtypmod and tdrefcount are not checked */
+
for (i = 0; i < tupdesc1->natts; i++)
{
Form_pg_attribute attr1 = TupleDescAttr(tupdesc1, i);
@@ -561,17 +558,54 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 }
 
 /*
- * hashTupleDesc
- * Compute a hash value for a tuple descriptor.
+ * equalRowTypes
  *
- * If two tuple descriptors would be considered equal by equalTupleDescs()
- * then their hash value will be equal according to this function.
+ * This determines whether two tuple descriptors have equal row types.  This
+ * means specifically:
+ *
+ * - same number of attributes
+ * - same composite type ID (but could both be zero)
+ * - corresponding attributes (in order) have the same name, type, and typmod
  *
- * Note that currently contents of constraint are not hashed - it'd be a bit
- * painful to do so, and conflicts just due to constraints are unlikely.
+ * This is used to check whether two record types are compatible, whether
+ * function return row types are the same, and other similar situations.
+ *
+ * Note: We deliberately do not check the 

Re: Memory consumed by paths during partitionwise join planning

2024-02-06 Thread Ashutosh Bapat
On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
 wrote:

> >
> > That looks pretty small considering the benefits. What do you think?
> >
> > [1] 
> > https://www.postgresql.org/message-id/caexhw5stmouobe55pmt83r8uxvfcph+pvo5dnpdrvcsbgxe...@mail.gmail.com
>
> If you want to experiment, please use attached patches. There's a fix
> for segfault during initdb in them. The patches are still raw.

First patch is no longer required. Here's rebased set

The patches are raw. make check has some crashes that I need to fix. I
am waiting to hear whether this is useful and whether the design is on
the right track.

-- 
Best Wishes,
Ashutosh Bapat
From 0a80eea476b696ad980a0a63858e2658a1b5b0de Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Mon, 17 Jul 2023 10:44:18 +0530
Subject: [PATCH 1/3] Basic infrastructure to link, unlink and free pathnodes

add_path and add_partial_path free path nodes which they deem sub-optimal. But
they just free the uppermost pathnode in the path. The subpaths continue to
occupy memory even if they are not used anywhere. The subpath nodes are not
freed because they may be referenced by other paths. This commit introduces the
infrastructure to track references to paths.

As the path nodes are referenced they are "linked" using link_path() to
referencing objects. When the path nodes are no longer useful they are
"unlinked" using unlink_path() from the referencing objects. The paths whose
references reach 0 during unlinking are freed automatically using
free_path(). The function unlinks the sub path nodes and can be called
when freeing any path node.

When the final path for join tree is chosen the paths linked to each
participating relation are unlinked, thus ultimately getting freed if not used.

TODO
The functions free_path(), unlink_path() and link_path() have ereports
to catch code paths which do not use these functions correctly. They call
errbacktrace() which is not used anywhere else. These ereport calls will
need to be fixed for productization.
---
 src/backend/optimizer/path/allpaths.c |  77 ++
 src/backend/optimizer/util/pathnode.c | 142 ++
 src/include/nodes/pathnodes.h |   1 +
 src/include/optimizer/pathnode.h  |  38 +++
 4 files changed, 258 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 84c4de488a..9c02f8adb5 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3388,6 +3388,81 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
 	}
 }
 
+/*
+ * TODO: Need similar code to free paths in upper relations.
+ */
+static void
+free_unused_paths_from_rel(RelOptInfo *rel)
+{
+	ListCell   *lc_path;
+	int			cnt_part;
+
+	foreach(lc_path, rel->pathlist)
+	{
+		Path	   *path = (Path *) lfirst(lc_path);
+
+		/* Free the path if none references it. */
+		if (path->ref_count == 1)
+		{
+			/* TODO: use routine to unlink path from list */
+			rel->pathlist = foreach_delete_current(rel->pathlist, lc_path);
			unlink_path(&path);
+		}
+	}
+
+	/* Do the same for partial pathlist. */
+	foreach(lc_path, rel->partial_pathlist)
+	{
+		Path	   *path = (Path *) lfirst(lc_path);
+
+		/* Free the path if none references it. */
+		if (path->ref_count == 1)
+		{
+			rel->partial_pathlist = foreach_delete_current(rel->partial_pathlist, lc_path);
			unlink_path(&path);
+		}
+	}
+
+	/*
+	 * TODO: We can perform this in generate_partitionwise_paths as well since
+	 * by that time all the paths from partitions will be linked if used.
+	 */
+
+	/* Free unused paths from the partition relations */
+	for (cnt_part = 0; cnt_part < rel->nparts; cnt_part++)
+	{
+		free_unused_paths_from_rel(rel->part_rels[cnt_part]);
+	}
+}
+
+/*
+ * Free unused paths from all the relations created while building the join tree.
+ */
+static void
+free_unused_paths(PlannerInfo *root, int levels_needed)
+{
+	int			cnt;
+
+	for (cnt = levels_needed - 1; cnt >= 1; cnt--)
+	{
+		ListCell   *lc;
+
+		foreach(lc, root->join_rel_level[cnt])
+		{
+			free_unused_paths_from_rel(lfirst(lc));
+		}
+	}
+
+	for (cnt = 0; cnt < root->simple_rel_array_size; cnt++)
+	{
+		RelOptInfo *rel = root->simple_rel_array[cnt];
+
+		/* Skip empty slots */
+		if (rel != NULL)
+			free_unused_paths_from_rel(rel);
+	}
+}
+
 /*
  * standard_join_search
  *	  Find possible joinpaths for a query by successively finding ways
@@ -3490,6 +3565,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 		}
 	}
 
+	free_unused_paths(root, levels_needed);
+
 	/*
 	 * We should have a single rel at the final level.
 	 */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 2e1ec41a54..084d48b66b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -915,6 +915,148 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
 	return true;
 }
 
+void
+free_path(Path *path)
+{
+	/*
+	 * If 

Re: Why is subscription/t/031_column_list.pl failing so much?

2024-02-06 Thread Amit Kapila
On Tue, Feb 6, 2024 at 2:30 PM Alexander Lakhin  wrote:
>
> 06.02.2024 09:48, Amit Kapila wrote:
> > cool, is it possible to see whether this patch changes the runtime of
> > this test in any noticeable way?
> >
>
> Yes, unfortunately it does.
> I've measured duration of 100 tests runs without the patch (with pristine
> bgwriter and with NO_TEMP_INSTALL):
> real6m46,031s
> real6m52,406s
> real6m51,014s
>
> But with the patched test, I've got:
> real9m39,872s
> real9m40,044s
> real9m38,236s
> (nearly 2 seconds increase per one test run)
>
> Under Valgrind, the original test run takes:
> Files=1, Tests=36, 334 wallclock secs ( 0.02 usr  0.00 sys + 163.14 cusr  
> 7.98 csys = 171.14 CPU)
>
> But the patched one:
> Files=1, Tests=36, 368 wallclock secs ( 0.02 usr  0.00 sys + 182.16 cusr  
> 8.90 csys = 191.08 CPU)
> (30 seconds increase)
>

Yeah, I was worried about that. The other idea I have previously
thought was to change Alter Subscription to Drop+Create Subscription.
That should also help in bringing stability without losing any
functionality.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-02-06 Thread Amit Kapila
On Tue, Feb 6, 2024 at 12:08 PM Peter Smith  wrote:
>
> Hi, I took another high-level look at all the funtion names of the
> slotsync.c file.
>
>
> Below are some suggestions (some are unchanged); probably there are
> better ideas for names but my point is that the current names could be
> improved:
>
> CURRENT SUGGESTION
...
> check_sync_slot_on_remote check_local_synced_slot_exists_on_remote
>

I think none of these names states the purpose of the function. I
suggest changing it to local_sync_slot_required() and returning false
either if the local_slot doesn't exist in remote_slot_list or is
invalidated.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Add native windows on arm64 support

2024-02-06 Thread Dave Cramer
On Wed, 31 Jan 2024 at 10:21, Andrew Dunstan  wrote:

>
> On 2024-01-30 Tu 17:54, Dave Cramer wrote:
>
>
>
>
> On Tue, Jan 30, 2024 at 4:56 PM Andrew Dunstan 
> wrote:
>
>>
>> On 2024-01-30 Tu 09:50, Dave Cramer wrote:
>>
>>
>>
>> On Tue, 30 Jan 2024 at 08:38, Andrew Dunstan  wrote:
>>
>>>
>>> On 2024-01-29 Mo 11:20, Dave Cramer wrote:
>>>
>>>
>>> Dave Cramer
>>> www.postgres.rocks
>>>
>>>
>>> On Mon, 29 Jan 2024 at 11:16, Andrew Dunstan 
>>> wrote:
>>>

 On 2024-01-26 Fr 09:18, Dave Cramer wrote:



 On Fri, 26 Jan 2024 at 07:36, Andrew Dunstan 
 wrote:

>
> On 2024-01-25 Th 20:32, Michael Paquier wrote:
> > On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
> >> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan 
> wrote:
> >>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
> >>> Yeah, I think the default Developer Command Prompt for VS2022 is
> set up
> >>> for x86 builds. AIUI you should start by executing "vcvarsall
> x64_arm64".
> >> Yup, now I'm in the same state you are
> > Wait a minute here.  Based on [1], x64_arm64 means you can use a x64
> > host and you'll be able to produce ARM64 builds, still these will not
> > be able to run on the host where they were built.  How much of the
> > patch posted upthread is required to produce such builds?  Basically
> > everything from it, I guess, so as build dependencies can be
> > satisfied?
> >
> > [1]:
> https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
>
>
> If you look at the table here x86 and x64 are the only supported host
> architectures. But that's OK, the x64 binaries will run on arm64 (W11
> ARM64 has x64 emulation builtin). If that didn't work Dave and I would
> not have got as far as we have. But you want the x64_arm64 argument to
> vcvarsall so you will get ARM64 output.
>

 I've rebuilt it using  x64_arm64 and with the attached (very naive
 patch) and I still get an x64 binary :(


 With this patch I still get a build error, but it's different :-)


 [1406/2088] "link" @src/backend/postgres.exe.rsp
 FAILED: src/backend/postgres.exe src/backend/postgres.pdb
 "link" @src/backend/postgres.exe.rsp
Creating library src\backend\postgres.exe.lib

 storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol
 spin_delay referenced in function perform_spin_delay

 src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals

>>>
>>> Did you add the latest lock.patch ?
>>>
>>>
>>>
>>>
>>> I'm a bit confused about exactly what needs to be applied. Can you
>>> supply a complete patch to be applied to a pristine checkout that will let
>>> me build?
>>>
>>>
>>> cheers
>>>
>>
>> See attached.
>>
>>
>>
>> No, that is what is giving me the error shown above (just tried again to
>> be certain). And it's not surprising, as patch 2 #ifdef's out the
>> definition of spin_delay().
>>
>> If you can get a complete build with these patches then I suspect you're
>> not doing a proper ARM64 build.
>>
>
> Okay I will look when I get home in a week
>
>
> I made some progress. The attached is mostly taken from
> 
> 
>
> With it applied I was able to get a successful build using the buildfarm
> client. However, there are access violations when running some tests, so
> there is still some work to do, apparently.
>
>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>

Thanks, this patch works and
testing with meson passes.

I'll try the buildfarm next.

Dave


Re: Memory leak fix in rmtree.c

2024-02-06 Thread Daniel Gustafsson
> On 6 Feb 2024, at 12:45, Ильясов Ян  wrote:
> 
> I agree with your argument.
> Thank you for your time.

Thank you for your report!

--
Daniel Gustafsson





Re: recently added jsonpath method change jsonb_path_query, jsonb_path_query_first immutability

2024-02-06 Thread Andrew Dunstan


On 2024-02-05 Mo 22:06, jian he wrote:


Hi.
this commit [0] seems to change the immutability of jsonb_path_query and 
jsonb_path_query_first. If so, it may change other functions as well.


demo:

begin;

SET LOCAL TIME ZONE 10.5;
with cte(s) as (select jsonb '"2023-08-15 12:34:56 +05:30"')
select jsonb_path_query(s, '$.timestamp_tz()')::text, '+10.5'::text, 'timestamp_tz'::text from cte
union all
select jsonb_path_query(s, '$.time()')::text, '+10.5'::text, 'time'::text from cte
union all
select jsonb_path_query(s, '$.timestamp()')::text, '+10.5'::text, 'timestamp'::text from cte
union all
select jsonb_path_query(s, '$.date()')::text, '+10.5'::text, 'date'::text from cte
union all
select jsonb_path_query(s, '$.time_tz()')::text, '+10.5'::text, 'time_tz'::text from cte;

SET LOCAL TIME ZONE -8;
with cte(s) as (select jsonb '"2023-08-15 12:34:56 +05:30"')
select jsonb_path_query(s, '$.timestamp_tz()')::text, '-8'::text, 'timestamp_tz'::text from cte
union all
select jsonb_path_query(s, '$.time()')::text, '-8'::text, 'time'::text from cte
union all
select jsonb_path_query(s, '$.timestamp()')::text, '-8'::text, 'timestamp'::text from cte
union all
select jsonb_path_query(s, '$.date()')::text, '-8'::text, 'date'::text from cte
union all
select jsonb_path_query(s, '$.time_tz()')::text, '-8'::text, 'time_tz'::text from cte;

commit;


[0] 
https://git.postgresql.org/cgit/postgresql.git/commit/?id=66ea94e8e606529bb334515f388c62314956739e



ouch. Good catch. Clearly we need to filter these like we do for the 
.datetime() method.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


RE: Memory leak fix in rmtree.c

2024-02-06 Thread Ильясов Ян
I agree with your argument.
Thank you for your time.


Kind regards,
Ian Ilyasov

Junior Software Developer at Postgres Professional



Re: Synchronizing slots from primary to standby

2024-02-06 Thread Amit Kapila
On Tue, Feb 6, 2024 at 3:57 PM Dilip Kumar  wrote:
>
> On Tue, Feb 6, 2024 at 3:41 PM Amit Kapila  wrote:
> >
> > On Tue, Feb 6, 2024 at 3:23 PM Dilip Kumar  wrote:
> > >
> > > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > > ---
> > > > > > Since Two processes (e.g. the slotsync worker and
> > > > > > pg_sync_replication_slots()) concurrently fetch and update the slot
> > > > > > information, there is a race condition where slot's
> > > > > > confirmed_flush_lsn goes backward.
> > > > > >
> > > > >
> > > > > Right, this is possible, though there shouldn't be a problem because
> > > > > anyway, slotsync is an async process. Till we hold restart_lsn, the
> > > > > required WAL won't be removed. Having said that, I can think of two
> > > > > ways to avoid it: (a) We can have some flag in shared memory using
> > > > > which we can detect whether any other process is doing slot
> > > > > syncronization and then either error out at that time or simply wait
> > > > > or may take nowait kind of parameter from user to decide what to do?
> > > > > If this is feasible, we can simply error out for the first version and
> > > > > extend it later if we see any use cases for the same (b) similar to
> > > > > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an
> > > > > error, this is good for now but in future we may still have another
> > > > > similar issue, so I would prefer (a) among these but I am fine if you
> > > > > prefer (b) or have some other ideas like just note down in comments
> > > > > that this is a harmless case and can happen only very rarely.
> > > >
> > > > Thank you for sharing the ideas. I would prefer (a). For (b), the same
> > > > issue still happens for other fields.
> > >
> > > I agree that (a) looks better.  On a separate note, while looking at
> > > this API pg_sync_replication_slots(PG_FUNCTION_ARGS) shouldn't there
> > > be an optional parameter to give one slot or multiple slots or all
> > > slots as default, that will give better control to the user no?
> > >
> >
> > As of now, we want to give functionality similar to slotsync worker
> > with a difference that users can use this new function for planned
> > switchovers. So, syncing all failover slots by default. I think if
> > there is a use case to selectively sync some of the failover slots
> > then we can probably extend this function and slotsync worker as well.
> > Normally, if the primary goes down due to whatever reason users would
> > want to restart the replication for all the defined publications via
> > existing failover slots. Why would anyone want to do it partially?
>
> If we consider the usability of such a function (I mean as it is
> implemented now, without any argument), one use case could be that if
> the slot sync worker is not keeping up, or at some point in time the
> user doesn't want to wait for the worker to do this, the user can do
> it by himself.
>

Possibly, but I was imagining that it would be used for planned
switchover cases and also for testing the core sync slot functionality
in our TAP tests.

> So now if we have such functionality then it would be even better to
> extend it to selectively sync the slots.  For example, if there is some
> issue in syncing all slots, maybe some bug or taking a long time to
> sync because there are a lot of slots, but the user needs to quickly
> fail over and he/she is interested in only a couple of slots, then such
> an option could be helpful, no?
>

I see your point, but I am not sure how useful it is in the field. I am
fine if others also think such a parameter will be useful, and anyway I
think we can even extend it after v1 is done.

-- 
With Regards,
Amit Kapila.




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-06 Thread Dilip Kumar
On Tue, Feb 6, 2024 at 4:23 PM Alvaro Herrera  wrote:
>
> > > (We also have SimpleLruTruncate, but I think it's not as critical to
> > > have a barrier there anyhow: accessing a slightly outdated page number
> > > could only be a problem if a bug elsewhere causes us to try to truncate
> > > in the current page.  I think we only have this code there because we
> > > did have such bugs in the past, but IIUC this shouldn't happen anymore.)
> >
> > +1, I agree with this theory in general.  But the below comment in
> > SimpleLruTrucate in your v3 patch doesn't seem correct, because here
> > we are checking if the latest_page_number is smaller than the cutoff
> > if so we log it as wraparound and skip the whole thing and that is
> > fine even if we are reading with atomic variable and slightly outdated
> > value should not be a problem but the comment claim that this safe
> > because we have the same bank lock as SimpleLruZeroPage(), but that's
> > not true here we will be acquiring different bank locks one by one
> > based on which slotno we are checking.  Am I missing something?
>
> I think you're correct.  I reworded this comment, so now it says this:
>
> /*
>  * An important safety check: the current endpoint page must not be
>  * eligible for removal.  This check is just a backstop against wraparound
>  * bugs elsewhere in SLRU handling, so we don't care if we read a slightly
>  * outdated value; therefore we don't add a memory barrier.
>  */
>
> Pushed with those changes.  Thank you!

Yeah, this looks perfect, thanks.

> Now I'll go rebase the rest of the patch on top.

Okay, I will review and test after that.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Synchronizing slots from primary to standby

2024-02-06 Thread Amit Kapila
On Tue, Feb 6, 2024 at 3:33 PM Amit Kapila  wrote:
>
> On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada  wrote:
> >
> > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila  wrote:
> > >
> > > I think users can refer to LOGs to see if it has changed since the
> > > first time it was configured. I tried by existing parameter and see
> > > the following in LOG:
> > > LOG:  received SIGHUP, reloading configuration files
> > > 2024-02-06 11:38:59.069 IST [9240] LOG:  parameter "autovacuum" changed 
> > > to "on"
> > >
> > > If the user can't confirm then it is better to follow the steps
> > > mentioned in the patch. Do you want something else to be written in
> > > docs for this? If so, what?
> >
> > IIUC even if a wrong slot name is specified to standby_slot_names or
> > even standby_slot_names is empty, the standby server might not be
> > lagging behind the subscribers depending on the timing. But when
> > checking it the next time, the standby server might lag behind the
> > subscribers. So what I wanted to know is how the user can confirm if a
> > failover-enabled subscription is ensured not to go in front of
> > failover-candidate standbys (i.e., standbys using the slots listed in
> > standby_slot_names).
> >
>
> But isn't the same explained by two steps ((a) Firstly, on the
> subscriber node check the last replayed WAL. (b) Next, on the standby
> server check that the last-received WAL location is ahead of the
> replayed WAL location on the subscriber identified above.) in the
> latest *_0004 patch.
>

Additionally, I would like to add that the users can use the queries
mentioned in the doc after the primary has failed and before promoting
the standby. If she wants to do that when both primary and standby are
available, the value of 'standby_slot_names' on primary should be
referred to. Aren't those two sufficient to ensure there won't be false
positives?
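
Concretely, both steps can be done with existing functions, along these
lines (whether the docs should recommend exactly these queries is a
separate question):

-- on the subscriber: last replayed WAL location
SELECT remote_lsn FROM pg_catalog.pg_replication_origin_status;

-- on the standby: last received WAL location; it should be ahead of
-- the value observed above
SELECT pg_last_wal_receive_lsn();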

-- 
With Regards,
Amit Kapila.




Allow passing extra options to initdb for tests

2024-02-06 Thread Peter Eisentraut
I'm proposing here a way to pass extra options to initdb when run 
internally during test setup in pg_regress or 
PostgreSQL::Test::Cluster's init (which covers just about all test 
suites other than initdb's own tests).


For example:

make check PG_TEST_INITDB_EXTRA_OPTS='-k -c work_mem=50MB'

We currently document at [0] a way to pass custom server settings to the 
tests via PGOPTIONS.  But this only works for pg_regress and only for 
options that can be changed at run time.  My proposal can set initdb 
options, and since initdb has the -c option now, it can set any GUC 
parameter as well.


I think this can be useful for a wide variety of uses, like running all 
tests with checksums enabled, or with JIT enabled, or with different GUC 
settings, or with different locale settings.  (The existing pg_regress 
--no-locale option is essentially a special case of this, but it only 
provides one particular locale setting, not things like changing the 
default provider etc.)
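
For instance (the option spellings are the ones current initdb accepts):

make check PG_TEST_INITDB_EXTRA_OPTS='-k'
make check PG_TEST_INITDB_EXTRA_OPTS='-c jit=on'
make check PG_TEST_INITDB_EXTRA_OPTS='--locale-provider=icu --icu-locale=en'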


Of course, not all tests are going to pass with arbitrary options, but 
it is useful to run this against specific test suites.


This patch also updates the documentation at [0] to explain the new 
method and distinguish it from the previously documented methods.



[0]: 
https://www.postgresql.org/docs/devel/regress-run.html#REGRESS-RUN-CUSTOM-SETTINGS
From f2dad8de9af12852ac022e7b43c2996b343aeb23 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 6 Feb 2024 10:44:55 +0100
Subject: [PATCH v0] Allow passing extra options to initdb for tests

Setting the environment variable PG_TEST_INITDB_EXTRA_OPTS passes
extra options to initdb run by pg_regress or
PostgreSQL::Test::Cluster's init.
---
 doc/src/sgml/regress.sgml| 34 +++-
 src/test/perl/PostgreSQL/Test/Cluster.pm |  7 +
 src/test/regress/pg_regress.c|  7 -
 3 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 70d9bdefe1b..6a27aae3195 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -390,12 +390,37 @@ Locale and Encoding
Custom Server Settings
 

-Custom server settings to use when running a regression test suite can be
+There are several ways to use custom server settings when running a test
+suite.  This can be useful to enable additional logging, adjust resource
+limits, or enable extra run-time checks such as <xref linkend="guc-debug-discard-caches"/>.  But note that not all tests can be
+expected to pass cleanly with arbitrary settings.
+   
+
+   
+Extra options can be passed to the various initdb
+commands that are run internally during test setup using the environment
+variable PG_TEST_INITDB_EXTRA_OPTS.  For example, to run a
+test with checksums enabled and a custom WAL segment size and
+work_mem setting, use:
+
+make check PG_TEST_INITDB_EXTRA_OPTS='-k --wal-segsize=4 -c work_mem=50MB'
+
+   
+
+   
+For the core regression test suite and other tests driven by
+pg_regress, custom run-time server settings can also be
 set in the PGOPTIONS environment variable (for settings
-that allow this):
+that allow this), for example:
 
 make check PGOPTIONS="-c debug_parallel_query=regress -c work_mem=50MB"
 
+(This makes use of functionality provided by libpq; see <xref linkend="libpq-envars"/> for details.)
+   
+
+   
 When running against a temporary installation, custom settings can also be
 set by supplying a pre-written postgresql.conf:
 
@@ -405,11 +430,6 @@ Custom Server Settings
 

 
-   
-This can be useful to enable additional logging, adjust resource limits,
-or enable extra run-time checks such as .
-   
   
 
   
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm 
b/src/test/perl/PostgreSQL/Test/Cluster.pm
index e2e70d0dbf9..07da74cf562 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -114,6 +114,7 @@ use Socket;
 use Test::More;
 use PostgreSQL::Test::Utils  ();
 use PostgreSQL::Test::BackgroundPsql ();
+use Text::ParseWords qw(shellwords);
 use Time::HiRes  qw(usleep);
 use Scalar::Util qw(blessed);
 
@@ -519,6 +520,12 @@ sub init
$params{allows_streaming} = 0 unless defined $params{allows_streaming};
$params{has_archiving} = 0 unless defined $params{has_archiving};
 
+   my $initdb_extra_opts_env = $ENV{PG_TEST_INITDB_EXTRA_OPTS};
+   if (defined $initdb_extra_opts_env)
+   {
+   push @{ $params{extra} }, shellwords($initdb_extra_opts_env);
+   }
+
mkdir $self->backup_dir;
mkdir $self->archive_dir;
 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index c894005dac0..f1f6011ae0a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2306,6 +2306,7 @@ regression_main(int argc, char *argv[],
const char *keywords[4];
const char 

Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-06 Thread Alvaro Herrera
On 2024-Feb-04, Andrey M. Borodin wrote:

> This patch uses the wording "banks" in comments before banks start to
> exist. But as far as I understand, it is expected to be committed
> before the "banks" patch.

True -- changed that to use ControlLock.

> Besides this patch looks good to me.

Many thanks for reviewing.

On 2024-Feb-05, Dilip Kumar wrote:

> > (We also have SimpleLruTruncate, but I think it's not as critical to
> > have a barrier there anyhow: accessing a slightly outdated page number
> > could only be a problem if a bug elsewhere causes us to try to truncate
> > in the current page.  I think we only have this code there because we
> > did have such bugs in the past, but IIUC this shouldn't happen anymore.)
> 
> +1, I agree with this theory in general.  But the below comment in
> SimpleLruTruncate in your v3 patch doesn't seem correct: here we are
> checking whether the latest_page_number is smaller than the cutoff,
> and if so we log it as wraparound and skip the whole thing.  That is
> fine even if we are reading an atomic variable and a slightly outdated
> value should not be a problem.  But the comment claims this is safe
> because we have the same bank lock as SimpleLruZeroPage(), and that's
> not true here: we will be acquiring different bank locks one by one
> based on which slotno we are checking.  Am I missing something?

I think you're correct.  I reworded this comment, so now it says this:

/*
 * An important safety check: the current endpoint page must not be
 * eligible for removal.  This check is just a backstop against wraparound
 * bugs elsewhere in SLRU handling, so we don't care if we read a slightly
 * outdated value; therefore we don't add a memory barrier.
 */
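
(For context, the check this comment guards looks roughly like this -- a
simplified sketch of the spot in SimpleLruTruncate(), assuming
latest_page_number is a pg_atomic_uint64 as in this patch series; ctl,
shared and cutoffPage are the function's usual locals:)

if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
                      cutoffPage))
{
    /* A slightly stale read is acceptable for this backstop. */
    ereport(LOG,
            (errmsg("could not truncate directory \"%s\": apparent wraparound",
                    ctl->Dir)));
    return;
}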

Pushed with those changes.  Thank you!

Now I'll go rebase the rest of the patch on top.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)




Re: RFC: Logging plan of the running query

2024-02-06 Thread Ashutosh Bapat
Hi Atsushi,


On Mon, Jan 29, 2024 at 6:32 PM torikoshia  wrote:
>
> Hi,
>
> Updated the patch to fix typos and move
> ProcessLogQueryPlanInterruptActive from errfinish() to AbortTransaction.
>
>
> BTW since the thread is getting long, I list the some points of the
> discussion so far:
>
> # Safety concern
> ## Catalog access inside CFI
> - it seems safe if the CFI call is inside an existing valid
> transaction/query state[1]
>
> - We did some tests, for example calling ProcessLogQueryPlanInterrupt()
> in every single CHECK_FOR_INTERRUPTS()[2]. This test passed on my env
> but got stuck on James's env, so I modified it to exit
> ProcessLogQueryPlanInterrupt() when the target process is inside lock
> acquisition code[3]
>
> ## Risk of calling EXPLAIN code in CFI
> - EXPLAIN is not simple logic code, and there is a risk in calling it
> from CFI. For example, if there is a bug, we may find ourselves in a
> situation where we can't cancel the query
>
> - it's a trade-off that's worth making for the introspection benefits
> this patch would provide?[4]
>
> # Design
> - Although some suggested it should be in auto_explain, the current
> patch introduces this feature into core[5]
>
> - When the target query is nested, only the most inner query's plan is
> explained. In future, all the nested queries' plans are expected to be
> explained optionally, like auto_explain.log_nested_statements[6]
>
> - When the target process is a parallel worker, the plan is not shown[6]
>
> - When the target query is nested and its subtransaction is aborted,
> pg_log_query_plan cannot log the parent query's plan after the abort
> even though the parent query is still running[7]
>
> - The output corresponds to EXPLAIN with VERBOSE, COST, SETTINGS and
> FORMAT text. It doesn't do ANALYZE or show the progress of the query
> execution. Future work proposed by Rafael Thofehrn Castro may realize
> this[8]
>
> - To prevent an assertion error, this patch ensures no page lock is held by
> checking all the LocalLock entries before running explain code, but
> there is a discussion that ginInsertCleanup() should be modified[9]
>
>
> It may not be so difficult to improve some of the restrictions in
> "Design", but I'd like to limit the scope of the 1st patch to make it
> simpler.

Thanks for the summary. It is helpful. I think the patch is also getting
better.

I have a few questions and suggestions
1. Prologue of GetLockMethodLocalHash() mentions
 * NOTE: When there are many entries in LockMethodLocalHash, calling this
 * function and looking into all of them can lead to performance problems.
 */
How bad could this performance hit be? Let's assume that a query is
taking a long time and pg_log_query_plan() is invoked to examine the plan
of this query. Is it possible that looping over all the locks itself
takes a lot of time, delaying the query execution further? (A sketch of
the scan in question follows these questions.)

2. What happens if auto_explain is enabled in the backend and
pg_log_query_plan() is called on the same backend? Will they conflict?
I think we should add a test for the same.
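
(For reference, the scan in question is a plain dynahash sequential scan
over the backend's local lock table -- a minimal sketch, with names per
lock.h; the cost is linear in the number of LOCALLOCK entries:)

HASH_SEQ_STATUS status;
LOCALLOCK  *locallock;
bool        page_lock_held = false;

hash_seq_init(&status, GetLockMethodLocalHash());
while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
{
    if (locallock->tag.lock.locktag_type == LOCKTAG_PAGE)
    {
        page_lock_held = true;
        hash_seq_term(&status);
        break;
    }
}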

Using injection point support we should be able to add tests for
testing pg_log_query_plan behaviour when there are page locks held or
when auto_explain (with instrumentation) and pg_log_query_plan() work
on the same query plan. Use an injection point to make the backend
running the query wait at a suitable point to delay its execution, and
fire pg_log_query_plan() from another backend. Maybe the same test could
examine the server log file to see if the plan is indeed output there.

Given that the feature will be used when things have already gone
wrong, it should not make them worse. So more testing, especially
automated testing, would help.

-- 
Best Wishes,
Ashutosh Bapat




Re: Printing backtrace of postgres processes

2024-02-06 Thread Michael Paquier
On Fri, Jan 26, 2024 at 11:58:00PM +0530, Bharath Rupireddy wrote:
> You probably were looking at v21, the above change isn't there in
> versions after that. Can you please review the latest version v26
> attached here?
> 
> We might want this patch extended to the newly added walsummarizer
> process which I'll do so in the next version.

--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
+bool
+SendProcSignalBackendOrAuxproc(int pid, ProcSignalReason reason)
+{

Looking at 0001.  This API is a thin wrapper around SendProcSignal() that
just checks that we actually have a process before using it.
Shouldn't it be in procsignal.c?

Now looking at 0002, this new routine is used in one place.  Seeing
that we have something similar in pgstatfuncs.c, wouldn't it be
better, instead of wrapping SendProcSignal(), to have one routine
that's able to look up a PostgreSQL process by PID?
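
(Something along these lines, as a hypothetical sketch -- BackendPidGetProc()
and AuxiliaryPidGetProc() already exist; the wrapper name is invented here:)

PGPROC *
LookupProcByPid(int pid)
{
    PGPROC     *proc = BackendPidGetProc(pid);

    if (proc == NULL)
        proc = AuxiliaryPidGetProc(pid);
    return proc;            /* NULL if pid is not a PostgreSQL process */
}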

All the backtrace-related handling is stored in procsignal.c, could it
be cleaner to move the internals into a separate, new file, like
procbacktrace.c or something similar?

LoadBacktraceFunctions() is one more thing we need to set up in all
auxiliary processes.  That's a bit sad to require that in all these
places, and we may forget to add it.  Could we put more effort into
centralizing these initializations?  The auxiliary processes are one
portion of the problem, and getting stack traces for backend processes
would be useful on its own.  Another suggestion that I'd propose to
simplify the patch would be to focus solely on backends for now, and
do something for auxiliary process later on.  If you do that, the
strange layer with BackendOrAuxproc() is not a problem anymore, as it
would be left out for now.

+
+logging current backtrace of process with PID 3499242:
+postgres: ubuntu postgres [local] INSERT(+0x61a355)[0x5559b94de355]
+postgres: ubuntu postgres [local] 
INSERT(procsignal_sigusr1_handler+0x9e)[0x5559b94de4ef]

This is IMO too much detail for the documentation, and could be
confusing depending on how the code evolves in the future.  I'd
suggest keeping it minimal, cutting it down to a few lines.  I don't
see a need to mention ubuntu, either.
--
Michael




Re: Memory leak fix in rmtree.c

2024-02-06 Thread Daniel Gustafsson
> On 6 Feb 2024, at 11:21, Ильясов Ян  wrote:
> 
> > dirnames isn't allocated at this point, it's palloc'd after this return
> > statement on line 67.
> 
> I am sorry, I pointed at the wrong branch. I see that in master
> it is indeed on line 67, and the allocation goes well. But in
> REL_16_STABLE the allocation is on line 58, and my patch is for this
> branch only.

Ok, that makes more sense.  That was changed in commit f1e9f6bbfa53, and in the
discussion for that patch it was deemed that no callers actually suffered from
the leak as they were exiting (or erroring out) on error, so it was
never backpatched.

  
https://www.postgresql.org/message-id/flat/CAEudQAoN3-2ZKBALThnEk_q2hu8En5A0WG9O%2B5siJTQKVZzoWQ%40mail.gmail.com

That still holds true today, so I don't see a big incentive to spend energy on
backpatching that since it mainly serves to silence analyzers.  Grepping for
the pattern of allocating in the declaration doesn't reveal any other codepaths
where the allocation leaks like this.

--
Daniel Gustafsson





Re: Synchronizing slots from primary to standby

2024-02-06 Thread Amit Kapila
On Tue, Feb 6, 2024 at 9:35 AM Peter Smith  wrote:
>
> ==
> GENERAL
>
> 1.
> Should the "Chapter 30 Logical Replication" at least have another
> section that mentions the feature of slot synchronization so the
> information about it is easier to find? It doesn't need to say much --
> just give a reference to the other sections where it is explained
> already.
>

We can think of something like a Failover/Switchover section, but we can
do that at the end once we get the worker patch and other work in, not
with the first patch.

>
> 6.
> +  
> +role="func_table_entry">
> +
> + pg_sync_replication_slots
>
> Currently, this is in section "9.27.6 Replication Management
> Functions", but I wondered if it should also have some mention in the
> "9.27.4. Recovery Control Functions" section.
>

I feel this is more suited to "Replication Management Functions"
because the other section talks about functions used during recovery
whereas we won't do anything for slotsync during recovery.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-02-06 Thread Dilip Kumar
On Tue, Feb 6, 2024 at 3:41 PM Amit Kapila  wrote:
>
> On Tue, Feb 6, 2024 at 3:23 PM Dilip Kumar  wrote:
> >
> > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > ---
> > > > > Since two processes (e.g. the slotsync worker and
> > > > > pg_sync_replication_slots()) concurrently fetch and update the slot
> > > > > information, there is a race condition where slot's
> > > > > confirmed_flush_lsn goes backward.
> > > > >
> > > >
> > > > Right, this is possible, though there shouldn't be a problem because
> > > > anyway, slotsync is an async process. Till we hold restart_lsn, the
> > > > required WAL won't be removed. Having said that, I can think of two
> > > > ways to avoid it: (a) We can have some flag in shared memory using
> > > > which we can detect whether any other process is doing slot
> > > > syncronization and then either error out at that time or simply wait
> > > > or may take nowait kind of parameter from user to decide what to do?
> > > > If this is feasible, we can simply error out for the first version and
> > > > extend it later if we see any use cases for the same (b) similar to
> > > > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an
> > > > error, this is good for now but in future we may still have another
> > > > similar issue, so I would prefer (a) among these but I am fine if you
> > > > prefer (b) or have some other ideas like just note down in comments
> > > > that this is a harmless case and can happen only very rarely.
> > >
> > > Thank you for sharing the ideas. I would prefer (a). For (b), the same
> > > issue still happens for other fields.
> >
> > I agree that (a) looks better.  On a separate note, while looking at
> > this API pg_sync_replication_slots(PG_FUNCTION_ARGS) shouldn't there
> > be an optional parameter to give one slot or multiple slots or all
> > slots as default, that will give better control to the user no?
> >
>
> As of now, we want to provide functionality similar to the slotsync
> worker, with the difference that users can use this new function for
> planned switchovers. So, it syncs all failover slots by default. I
> think if there is a use case to selectively sync some of the failover
> slots, then we can probably extend this function, and the slotsync
> worker as well. Normally, if the primary goes down for whatever reason,
> users would want to restart replication for all the defined
> publications via the existing failover slots. Why would anyone want to
> do it partially?

If we consider the usability of such a function (I mean as it is
implemented now, without any argument), one use case could be that the
slot sync worker is not keeping up, or at some point in time the user
doesn't want to wait for the worker to do this and can instead do it
himself.

So if we have such functionality, it would be even better to extend it
to selectively sync slots.  For example, if there is some issue in
syncing all slots -- maybe some bug, or it taking a long time because
there are a lot of slots -- but the user needs to fail over quickly and
is interested in only a couple of slots, then such an option could be
helpful, no?
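
(A hypothetical sketch of what such a variant could look like -- nothing
like this exists in the posted patch, and the helper name and the
NULL-means-all convention are invented here:)

Datum
pg_sync_replication_slots(PG_FUNCTION_ARGS)
{
    /* A NULL argument (the default) would mean "sync all failover slots". */
    char       *slotname = PG_ARGISNULL(0) ? NULL :
        NameStr(*PG_GETARG_NAME(0));

    SyncReplicationSlots(slotname);     /* invented helper */

    PG_RETURN_VOID();
}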

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: Memory leak fix in rmtree.c

2024-02-06 Thread Ильясов Ян
> dirnames isn't allocated at this point, it's palloc'd after this return
> statement on line 67.
>
> --
> Daniel Gustafsson

I am sorry, I pointed at the wrong branch. I see that in master
it is indeed on line 67, and the allocation goes well. But in
REL_16_STABLE the allocation is on line 58, and my patch is for this
branch only.

Kind regards,
Ian Ilyasov.






Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-06 Thread Yugo NAGATA
On Mon, 5 Feb 2024 14:26:46 +0800
jian he  wrote:

> On Mon, Feb 5, 2024 at 10:29 AM torikoshia  wrote:
> >
> > Hi,
> >
> > On 2024-02-03 15:22, jian he wrote:
> > > The idea of on_error is to tolerate errors, I think.
> > > if a column has a not-null constraint, it cannot be used with
> > > (on_error 'null')
> >
> > > +   /*
> > > +* we can specify on_error 'null', but it can only apply to
> > > columns
> > > +* don't have not null constraint.
> > > +   */
> > > +   if (att->attnotnull && cstate->opts.on_error ==
> > > COPY_ON_ERROR_NULL)
> > > +   ereport(ERROR,
> > > +   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> > > +errmsg("copy on_error 'null' cannot be used with
> > > not null constraint column")));
> >
> > This means we cannot use ON_ERROR 'null' even when there is one column
> > which has a NOT NULL constraint, i.e. a primary key, right?
> > IMHO this is a strong constraint and will decrease the opportunity to
> > use this feature.
> 
> I don't want to fail in the middle of bulk inserts,
> so I thought immediately erroring out would be a great idea.
> Let's see what other people think.

I also think this restriction is too strong, because it is very common
for a table to have a primary key, unless there is some way to specify
which columns can be set to NULL. Even when ON_ERROR is specified,
constraint violation errors generally cannot be ignored, so we cannot
eliminate the possibility that COPY FROM fails in the middle due to
invalid data anyway.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: doc patch: Spell I/O consistently

2024-02-06 Thread Dagfinn Ilmari Mannsåker
Michael Paquier  writes:

> On Mon, Feb 05, 2024 at 03:59:27PM +, Dagfinn Ilmari Mannsåker wrote:
>> I just noticed that a couple of places in the docs spell I/O as IO or
>> even io when not referring to literal table/view/column names or values
>> therein.  Here's a patch to fix them.
>
> Makes sense to me to be consistent in these sections of the docs, so
> applied.  Thanks!

Thanks!

- ilmari




Re: Question on LWLockMode in dsa.c

2024-02-06 Thread Dilip Kumar
On Tue, Jan 30, 2024 at 6:24 AM Masahiko Sawada  wrote:
>
> Hi,
>
> While working on the radix tree patch[1], John Naylor found that dsa.c
> doesn't use shared locks at all, even in dsa_dump(). dsa_dump() seems to
> be a purely read-only function, so I thought we could use shared lock
> mode there. Is there any reason to use exclusive mode even in dsa_dump()?
>
> Ultimately, since we're trying to add a new function
> dsa_get_total_size() that just returns
> dsa_area_control.total_segment_size and therefore would also be a
> read-only function, I'd like to find out the correct lock mode there.
>

Doesn't seem like there is any reason for this to be an exclusive lock.
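
(The change under discussion would be small -- a sketch, assuming
DSA_AREA_LOCK maps to the area's LWLock as it does in dsa.c:)

LWLockAcquire(DSA_AREA_LOCK(area), LW_SHARED);      /* was LW_EXCLUSIVE */
/* ... read-only walk of the area's control data, as dsa_dump() does ... */
LWLockRelease(DSA_AREA_LOCK(area));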

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: POC: Extension for adding distributed tracing - pg_tracing

2024-02-06 Thread Anthonin Bonnefoy
Hi!

On Fri, Jan 26, 2024 at 1:43 PM Nikita Malakhov  wrote:
> It's a good idea to split a big patch into several smaller ones.
> But you've already implemented these features - you could provide them
> as sequential small patches (i.e. v13-0002-guc-context-propagation.patch and 
> so on)

Keeping a consistent set of patches is going to be hard, as any change
in the first patch would require rebases that are likely to conflict. I
haven't discarded the features, but keeping them separate could help
with the discussion.

On Fri, Feb 2, 2024 at 1:41 PM Heikki Linnakangas  wrote:
>
> Hi Anthonin,
>
> I'm only now catching up on this thread. Very exciting feature!
Thanks! And thanks for the comments.

> - You're relying on various existing hooks, but it might be better to
> have the 'begin_span'/'end_span' calls directly in PostgreSQL source
> code in the right places. There are more places where spans might be
> nice, like when performing a large sort in tuplesort.c to name one
> example. Or even across every I/O; not sure how much overhead the spans
> incur and how much we'd be willing to accept. 'begin_span'/'end_span'
> could be new hooks that normally do nothing, but your extension would
> implement them.

Having information that would only be available to tracing doesn't
sound great. It should probably be added to the core instrumentation,
which could then be leveraged by tracing and other systems.
One of the features I initially implemented was to track parse start
and end. However, since only the post-parse hook is available, I could
only infer when the parse started. Changing the core to track parse
start would make this much easier, and it could be used by EXPLAIN and
pg_stat_statements.
In the end, I've tried to match what's done in pg_stat_statements
since the core idea is the same, except pg_tracing outputs spans
instead of statistics.
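
(For reference, the 'begin_span'/'end_span' hooks Heikki describes would
presumably follow the usual core hook idiom -- a hypothetical sketch, with
all names invented here:)

typedef void (*begin_span_hook_type) (const char *span_name);
typedef void (*end_span_hook_type) (void);

begin_span_hook_type begin_span_hook = NULL;
end_span_hook_type end_span_hook = NULL;

static inline void
begin_span(const char *span_name)
{
    /* No-op unless an extension such as pg_tracing installs a hook. */
    if (begin_span_hook)
        (*begin_span_hook) (span_name);
}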

> - How to pass the tracing context from the client? You went with
> SQLCommenter and others proposed a GUC. A protocol extension would make
> sense too. I can see merits to all of those. It probably would be good
> to support multiple mechanisms, but some might need core changes. It
> might be good to implement the mechanism for accepting trace context in
> core. Without the extension, we could still include the trace ID in the
> logs, for example.

I went with SQLCommenter for the initial implementation since it has
the benefit of already being implemented and working out of the box.
Jelte has an ongoing patch to add the possibility to modify GUC values
through a protocol message [1]. I've tested it and it worked well
without adding too much complexity in pg_tracing. I imagine a
dedicated traceparent GUC variable could also be used in core to
propagate the trace ID into the logs and other places.

> - Include EXPLAIN plans in the traces.
This was already implemented in the first versions of the patch (or
rather, the planstate was translated to spans). I've removed it in the
latest patch as it was contributing 50% of the code and also required a
core change, namely a change in instrument.c to track the first start
of a node. Though if you were thinking of the raw EXPLAIN ANALYZE text
dumped as span metadata, that could be doable.

> # Comments on the implementation
>
> There was discussion already on push vs pull model. Currently, you
> collect the traces in memory / files, and require a client to poll them.
> A push model is more common in applications that support tracing. If
> Postgres could connect directly to the OTEL collector, you'd need one
> fewer running component. You could keep the traces in backend-private
> memory, no need to deal with shared memory and spilling to files.

PostgreSQL metrics are currently exposed through tables like pg_stat_*
and users likely have collectors pulling those metrics (like OTEL
PostgreSQL Receiver [2]). Having this collector also pull traces
would keep the logic in a single place and would avoid a situation
where both pull and push are used.

> 2. Start a new pgsql-hackers thread on in-core changes that would
> benefit the extension. Write patches for them. I'm thinking of the
> things I listed in the Core changes section above, but maybe there are
> others.
I already have an ongoing patch (more support for extended protocol in
psql [3] needed for tests) and plan to do others (the parse changes
I've mentioned). Though I also don't want to spread myself too thin.


[1]: https://commitfest.postgresql.org/46/4736/
[2]: 
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/postgresqlreceiver/README.md
[3]: https://commitfest.postgresql.org/46/4650/

On Fri, Feb 2, 2024 at 5:42 PM Jan Katins  wrote:
>
> Hi!
>
> On Fri, 2 Feb 2024 at 13:41, Heikki Linnakangas  wrote:
>>
>> PS. Does any other DBMS implement this? Any lessons to be learned from them?
>
>
> MySQL 8.3 has an open telemetry component, so very fresh:
>
> https://dev.mysql.com/doc/refman/8.3/en/telemetry-trace-install.html
> 

Re: Synchronizing slots from primary to standby

2024-02-06 Thread Amit Kapila
On Tue, Feb 6, 2024 at 3:23 PM Dilip Kumar  wrote:
>
> On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada  wrote:
> >
> > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila  wrote:
> > >
> > > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > ---
> > > > Since two processes (e.g. the slotsync worker and
> > > > pg_sync_replication_slots()) concurrently fetch and update the slot
> > > > information, there is a race condition where slot's
> > > > confirmed_flush_lsn goes backward.
> > > >
> > >
> > > Right, this is possible, though there shouldn't be a problem because
> > > anyway, slotsync is an async process. Till we hold restart_lsn, the
> > > required WAL won't be removed. Having said that, I can think of two
> > > ways to avoid it: (a) We can have some flag in shared memory using
> > > which we can detect whether any other process is doing slot
> > > synchronization and then either error out at that time or simply wait
> > > or may take nowait kind of parameter from user to decide what to do?
> > > If this is feasible, we can simply error out for the first version and
> > > extend it later if we see any use cases for the same (b) similar to
> > > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an
> > > error, this is good for now but in future we may still have another
> > > similar issue, so I would prefer (a) among these but I am fine if you
> > > prefer (b) or have some other ideas like just note down in comments
> > > that this is a harmless case and can happen only very rarely.
> >
> > Thank you for sharing the ideas. I would prefer (a). For (b), the same
> > issue still happens for other fields.
>
> I agree that (a) looks better.  On a separate note, while looking at
> this API pg_sync_replication_slots(PG_FUNCTION_ARGS) shouldn't there
> be an optional parameter to give one slot or multiple slots or all
> slots as default, that will give better control to the user no?
>

As of now, we want to provide functionality similar to the slotsync
worker, with the difference that users can use this new function for
planned switchovers. So, it syncs all failover slots by default. I think
if there is a use case to selectively sync some of the failover slots,
then we can probably extend this function, and the slotsync worker as
well. Normally, if the primary goes down for whatever reason, users
would want to restart replication for all the defined publications via
the existing failover slots. Why would anyone want to do it partially?

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-02-06 Thread Amit Kapila
On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada  wrote:
>
> On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila  wrote:
> >
> > I think users can refer to the LOGs to see if it has changed since
> > the first time it was configured. I tried with an existing parameter
> > and saw the following in the LOG:
> > LOG:  received SIGHUP, reloading configuration files
> > 2024-02-06 11:38:59.069 IST [9240] LOG:  parameter "autovacuum" changed to 
> > "on"
> >
> > If the user can't confirm then it is better to follow the steps
> > mentioned in the patch. Do you want something else to be written in
> > docs for this? If so, what?
>
> IIUC even if a wrong slot name is specified to standby_slot_names or
> even standby_slot_names is empty, the standby server might not be
> lagging behind the subscribers depending on the timing. But when
> checking it the next time, the standby server might lag behind the
> subscribers. So what I wanted to know is how the user can confirm that a
> failover-enabled subscription is ensured not to get ahead of
> failover-candidate standbys (i.e., standbys using the slots listed in
> standby_slot_names).
>

But isn't the same explained by the two steps ((a) First, on the
subscriber node, check the last replayed WAL. (b) Next, on the standby
server, check that the last-received WAL location is ahead of the
replayed WAL location on the subscriber identified above.) in the
latest *_0004 patch?

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-02-06 Thread Dilip Kumar
On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada  wrote:
>
> On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila  wrote:
> >
> > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada  
> > wrote:
> > >
> > > ---
> > > Since two processes (e.g. the slotsync worker and
> > > pg_sync_replication_slots()) concurrently fetch and update the slot
> > > information, there is a race condition where slot's
> > > confirmed_flush_lsn goes backward.
> > >
> >
> > Right, this is possible, though there shouldn't be a problem because
> > anyway, slotsync is an async process. Till we hold restart_lsn, the
> > required WAL won't be removed. Having said that, I can think of two
> > ways to avoid it: (a) We can have some flag in shared memory using
> > which we can detect whether any other process is doing slot
> > synchronization and then either error out at that time or simply wait
> > or may take nowait kind of parameter from user to decide what to do?
> > If this is feasible, we can simply error out for the first version and
> > extend it later if we see any use cases for the same (b) similar to
> > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an
> > error, this is good for now but in future we may still have another
> > similar issue, so I would prefer (a) among these but I am fine if you
> > prefer (b) or have some other ideas like just note down in comments
> > that this is a harmless case and can happen only very rarely.
>
> Thank you for sharing the ideas. I would prefer (a). For (b), the same
> issue still happens for other fields.

I agree that (a) looks better.  On a separate note, while looking at
this API pg_sync_replication_slots(PG_FUNCTION_ARGS) shouldn't there
be an optional parameter to give one slot or multiple slots or all
slots as default, that will give better control to the user no?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Memory leak fix in rmtree.c

2024-02-06 Thread Daniel Gustafsson
> On 6 Feb 2024, at 10:34, Ильясов Ян  wrote:

> Just like some of my colleagues, I've been using Svace*
> and I think I've found a bug in src/common/rmtree.c .
> 
> On line 64 the function returns false in case it couldn't open a
> directory, but the memory that has been allocated for char **dirnames
> is not freed.

dirnames isn't allocated at this point, it's palloc'd after this return
statement on line 67.

--
Daniel Gustafsson





Memory leak fix in rmtree.c

2024-02-06 Thread Ильясов Ян
Hello hackers,

Just like some of my colleagues, I've been using Svace*
and I think I've found a bug in src/common/rmtree.c .

On line 64 the function returns false in case it couldn't open a
directory, but the memory that has been allocated for char **dirnames
is not freed.
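
(The shape of the code in question, as a simplified sketch rather than the
verbatim REL_16_STABLE source -- the allocation happens in the declaration,
so the early return leaks it:)

bool
rmtree_sketch(const char *path)
{
    char      **dirnames = (char **) palloc(sizeof(char *) * 64);
    DIR        *dir = opendir(path);

    if (dir == NULL)
    {
        pg_log_warning("could not open directory \"%s\": %m", path);
        pfree(dirnames);        /* the proposed fix */
        return false;
    }

    /* ... recurse and unlink as the real rmtree() does ... */
    closedir(dir);
    pfree(dirnames);
    return true;
}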

The patch that has a fix of this is attached and is based on the latest
master code.

* https://svace.pages.ispras.ru/svace-website/en/

Kind regards,
Ian Ilyasov.

Subject: [PATCH] Fixed memory leak in case we couldn't open a directory in rmtree.c.
---
Index: src/common/rmtree.c
===
diff --git a/src/common/rmtree.c b/src/common/rmtree.c
--- a/src/common/rmtree.c	(revision 48a6bf5c4ea8e04cc9bb33a8120a21743da515ed)
+++ b/src/common/rmtree.c	(date 1707211004528)
@@ -61,6 +61,7 @@
 	if (dir == NULL)
 	{
 		pg_log_warning("could not open directory \"%s\": %m", path);
+pfree(dirnames);
 		return false;
 	}
 


Re: pgsql: Add EXPLAIN (MEMORY) to report planner memory consumption

2024-02-06 Thread Ashutosh Bapat
On Tue, Feb 6, 2024 at 3:51 AM Justin Pryzby  wrote:
>
> Up to now, the explain "  " (two space) format is not mixed with "=".
>
> And, other places which show "Memory" do not use "=".  David will
> remember prior discussions.
> https://www.postgresql.org/message-id/20200402054120.gc14...@telsasoft.com
> https://www.postgresql.org/message-id/20200407042521.gh2...@telsasoft.com
>
>  "Memory: used=%lld bytes  
> allocated=%lld bytes",
> vs
>  "Buckets: %d (originally %d) 
>  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
>

I have used '=' to be consistent with the Buffers usage in the same
Planning section.

Are you suggesting that
"Memory: used=%lld bytes allocated=%lld bytes"
should be used instead of
"Memory: used=%lld bytes  allocated=%lld bytes"?
Please note the single vs. double space.

I am fine with this.

> There was some discussion about "bytes" - maybe it should instead show
> kB?
>

EXPLAIN (MEMORY) on a prepared statement may report less than 1kB of
memory, in which case bytes is a better unit. The planner may consume as
little as a few kB of memory, and reporting that in kB would be lossy.

> (Also, I first thought that "peek" should be "peak", but eventually I
> realized that it's as intended.)
>

I don't understand the context, but it probably doesn't matter.

-- 
Best Wishes,
Ashutosh Bapat




Re: Properly pathify the union planner

2024-02-06 Thread Richard Guo
On Fri, Nov 24, 2023 at 6:29 AM David Rowley  wrote:

> I've attached the updated patch.  This one is probably ready for
> someone to test out. There will be more work to do, however.


I just started reviewing this patch and haven't looked through all the
details yet.  Here is some feedback that came to mind.  I'm posting it
first so that I don't forget it after the holidays.

* I think we should update truncate_useless_pathkeys() to account for
the ordering requested by the query's set operation; otherwise we may
not get a subquery's path with the expected pathkeys.  For instance,

create table t (a int, b int);
create index on t (a, b);
set enable_hashagg to off;

-- on v1 patch
explain (costs off)
(select * from t order by a) UNION (select * from t order by a);
 QUERY PLAN

 Unique
   ->  Merge Append
 Sort Key: t.a, t.b
 ->  Incremental Sort
   Sort Key: t.a, t.b
   Presorted Key: t.a
   ->  Index Only Scan using t_a_b_idx on t
 ->  Incremental Sort
   Sort Key: t_1.a, t_1.b
   Presorted Key: t_1.a
   ->  Index Only Scan using t_a_b_idx on t t_1
(11 rows)

-- after accounting for setop_pathkeys in truncate_useless_pathkeys()
explain (costs off)
(select * from t order by a) UNION (select * from t order by a);
  QUERY PLAN
--
 Unique
   ->  Merge Append
 Sort Key: t.a, t.b
 ->  Index Only Scan using t_a_b_idx on t
 ->  Index Only Scan using t_a_b_idx on t t_1
(5 rows)

* I understand that we need to sort (full or incremental) the paths of
the subqueries to meet the ordering required for setop_pathkeys, so that
MergeAppend has something to work with.  Currently in the v1 patch this
sorting is performed during the planning phase of the subqueries (in
grouping_planner).

And we want to add the subquery's cheapest_total_path as-is to allow
that path to be used in hash-based UNIONs, and we also want to add a
sorted path on top of cheapest_total_path.  And then we may encounter
the issue you mentioned earlier regarding add_path() potentially freeing
the cheapest_total_path, leaving the Sort's subpath gone.

I'm thinking that maybe it'd be better to move the work of sorting the
subquery's paths to the outer query level, specifically within the
build_setop_child_paths() function, just before we stick SubqueryScanPath
on top of the subquery's paths.  I think this is better because:

1. This minimizes the impact on subquery planning and reduces the
footprint within the grouping_planner() function as much as possible.

2. This can help avoid the aforementioned add_path() issue because the
two involved paths will be structured as:

cheapest_path -> subqueryscan
and
cheapest_path -> sort -> subqueryscan

If the two paths cost fuzzily the same and add_path() decides to keep
the second one due to it having better pathkeys and pfree the first one,
it would not be a problem.
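
(A sketch of that ordering inside build_setop_child_paths(), with
illustrative variable names -- create_sort_path() is the existing
pathnode.c constructor; everything else here is approximate:)

/* Sort the subquery's cheapest path at the outer query level... */
Path       *sorted_path = (Path *) create_sort_path(subroot,
                                                    final_rel,
                                                    cheapest_path,
                                                    setop_pathkeys,
                                                    -1.0);

/* ...then wrap both cheapest_path and sorted_path in SubqueryScan paths. */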

BTW, I haven't looked through the part involving partial paths, but I
think we can do the same to partial paths.


* I noticed that in generate_union_paths() we use a new function
build_setop_pathkeys() to build the 'union_pathkeys'.  I wonder why we
don't simply utilize make_pathkeys_for_sortclauses() since we already
have the grouplist for the setop's output columns.

To assist the discussion I've attached a diff file that includes all the
changes above.

Thanks
Richard


v1-0001-review-diff.patch
Description: Binary data


Re: Why is subscription/t/031_column_list.pl failing so much?

2024-02-06 Thread Alexander Lakhin

06.02.2024 09:48, Amit Kapila wrote:

> cool, is it possible to see whether this patch changes the runtime of
> this test in any noticeable way?



Yes, unfortunately it does.
I've measured the duration of 100 test runs without the patch (with
pristine bgwriter and with NO_TEMP_INSTALL):
real    6m46,031s
real    6m52,406s
real    6m51,014s

But with the patched test, I've got:
real    9m39,872s
real    9m40,044s
real    9m38,236s
(nearly a 2-second increase per test run)

Under Valgrind, the original test run takes:
Files=1, Tests=36, 334 wallclock secs ( 0.02 usr  0.00 sys + 163.14 cusr  7.98 
csys = 171.14 CPU)

But the patched one:
Files=1, Tests=36, 368 wallclock secs ( 0.02 usr  0.00 sys + 182.16 cusr  8.90 
csys = 191.08 CPU)
(a 30-second increase)

Maybe the more CPU-efficient solution would be disabling the bgwriter,
as was proposed in another discussion of test instability:
https://www.postgresql.org/message-id/ZaTxhjnPygOdosJ4%40ip-10-97-1-34.eu-west-3.compute.internal

Though I think that devising a way to control bgwriter may take more time
than we can afford given the current 031 failure rate on the buildfarm
(17 failures for the last 3 days).

Best regards,
Alexander




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-02-06 Thread Dilip Kumar
On Thu, Jan 11, 2024 at 10:48 AM Bharath Rupireddy
 wrote:
>
> Hi,
>
> Replication slots in postgres will prevent removal of required
> resources when there is no connection using them (inactive). This
> consumes storage because neither required WAL nor required rows from
> the user tables/system catalogs can be removed by VACUUM as long as
> they are required by a replication slot. In extreme cases this could
> cause the transaction ID wraparound.
>
> Currently postgres has the ability to invalidate inactive replication
> slots based on the amount of WAL (set via max_slot_wal_keep_size GUC)
> that will be needed for the slots in case they become active. However,
> the wraparound issue isn't effectively covered by
> max_slot_wal_keep_size - one can't tell postgres to invalidate a
> replication slot if it is blocking VACUUM. Also, it is often tricky to
> choose a default value for max_slot_wal_keep_size, because the amount
> of WAL that gets generated and allocated storage for the database can
> vary.
>
> Therefore, it is often easy for developers to do the following:
> a) set an XID age (age of slot's xmin or catalog_xmin) of say 1 or 1.5
> billion, after which the slots get invalidated.
> b) set a timeout of say 1 or 2 or 3 days, after which the inactive
> slots get invalidated.
>
> To implement (a), postgres needs a new GUC called max_slot_xid_age.
> The checkpointer then invalidates all the slots whose xmin (the oldest
> transaction that this slot needs the database to retain) or
> catalog_xmin (the oldest transaction affecting the system catalogs
> that this slot needs the database to retain) has reached the age
> specified by this setting.
>
> To implement (b), first postgres needs to track the replication slot
> metrics like the time at which the slot became inactive (inactive_at
> timestamptz) and the total number of times the slot became inactive in
> its lifetime (inactive_count numeric) in ReplicationSlotPersistentData
> structure. And, then it needs a new timeout GUC called
> inactive_replication_slot_timeout. Whenever a slot becomes inactive,
> the current timestamp and inactive count are stored in
> ReplicationSlotPersistentData structure and persisted to disk. The
> checkpointer then invalidates all the slots that are lying inactive
> for about inactive_replication_slot_timeout duration starting from
> inactive_at.
>
> In addition to implementing (b), these two new metrics enable
> developers to improve their monitoring tools as the metrics are
> exposed via pg_replication_slots system view. For instance, one can
> build a monitoring tool that signals when replication slots are lying
> inactive for a day or so using the inactive_at metric, and/or when a
> replication slot is becoming inactive too frequently using the
> inactive_count metric.
>
> I’m attaching the v1 patch set as described below:
> 0001 - Tracks invalidation_reason in pg_replication_slots. This is
> needed because slots now have multiple reasons for slot invalidation.
> 0002 - Tracks inactive replication slot information inactive_at and
> inactive_timeout.
> 0003 - Adds inactive_timeout based replication slot invalidation.
> 0004 - Adds XID based replication slot invalidation.
>
> Thoughts?
>
+1 for the idea. Here are some comments on 0002; I will review the other
patches soon and respond.

1.
+  
+   inactive_at timestamptz
+  
+  
+The time at which the slot became inactive.
+NULL if the slot is currently actively being
+used.
+  
+ 

Maybe we can change the field name to 'last_inactive_at'? Or maybe the
comment can explain that this is the timestamp at which the slot was
last inactivated. Since we are already maintaining inactive_count, it is
better to explicitly say this is the last inactivation time.

2.
+ /*
+ * XXX: Can inactive_count of type uint64 ever overflow? It takes
+ * about a half-billion years for inactive_count to overflow even
+ * if slot becomes inactive for every 1 millisecond. So, using
+ * pg_add_u64_overflow might be an overkill.
+ */

Correct, we don't need to use pg_add_u64_overflow for this counter.

3.

+
+ /* Convert to numeric. */
+ snprintf(buf, sizeof buf, UINT64_FORMAT, slot_contents.data.inactive_count);
+ values[i++] = DirectFunctionCall3(numeric_in,
+   CStringGetDatum(buf),
+   ObjectIdGetDatum(0),
+   Int32GetDatum(-1));

What is the purpose of doing this? inactive_count is an 8-byte integer,
and you can define the function's out-parameter as 'int8', which is also
an 8-byte integer.  Then you don't need to convert the integer to a
string and then to numeric; see the sketch below.
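
(The suggested simplification, as a sketch -- it assumes the function's
pg_proc out-parameter type is changed to int8:)

/* inactive_count is a uint64 in ReplicationSlotPersistentData. */
values[i++] = Int64GetDatum((int64) slot_contents.data.inactive_count);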


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: speed up a logical replica setup

2024-02-06 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

Thanks for testing our codes!

> While reviewing the v15 patches I discovered that the subscription
> connection string has gained a lot of options which are not required
> now:
> v15-0001
> postgres=# select subname, subconninfo from pg_subscription;
> subname|   subconninfo
> ---+--
> pg_createsubscriber_5_1867633 | host=localhost port=5432 dbname=postgres
> (1 row)
> 
> 
> 
> v15-0001+0002+0003
> postgres=# select subname, subconninfo from pg_subscription;
> subname | subconninfo
> ---+---
> pg_createsubscriber_5_1895366 | user=shubham passfile='/home/shubham/.pgpass'
>   channel_binding=prefer host=127.0.0.1 port=5432 sslmode=prefer
>   sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2
>   gssencmode=disable krbsrvname=postgres gssdelegation=0
>   target_session_attrs=any load_balance_hosts=disable dbname=postgres
> (1 row)
> 
> 
> Here, we can see that channel_binding, sslmode, sslcertmode, sslsni,
> gssencmode, krbsrvname, etc are getting included. This does not look
> intentional, we should keep the subscription connection same as in
> v15-0001.

You should attach the script as a reproducer. I suspect you used the
pg_basebackup -R command for setting up the standby. In this case, it is
intentional. These settings are not caused by pg_createsubscriber; they
are done by pg_basebackup. If you set primary_conninfo manually, as in
the attached script, these settings would not appear.

In the first place, the listed options (e.g., passfile, channel_binding,
sslmode, sslcompression, sslcertmode, etc.) are set whenever you connect
to the database via libpq functions. PQconninfoOptions in fe-connect.c
lists the parameters and their default values.

v15-0003 reuses primary_conninfo for the subconninfo attribute.
primary_conninfo is set by pg_basebackup when the '-R' option is
specified. The content is built in GenerateRecoveryConfig(), which
passes through the parameters from PQconninfo(). This function returns
all the libpq connection parameters even if they are set to their
defaults. So primary_conninfo looks longer.
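
(Roughly why it gets long, as a sketch -- GenerateRecoveryConfig() walks
the array returned by PQconninfo() and emits every parameter that has a
value, libpq defaults such as sslmode and gssencmode included:)

PQconninfoOption *opts = PQconninfo(conn);

for (PQconninfoOption *opt = opts; opt->keyword != NULL; opt++)
{
    /* Every parameter with a value is kept, defaults included. */
    if (opt->val != NULL && opt->val[0] != '\0')
        printf("%s=%s ", opt->keyword, opt->val);   /* illustration only */
}
PQconninfoFree(opts);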

I don't think this works incorrectly. Users can still set an arbitrary
connection string as primary_conninfo. You just end up with a longer
string unintentionally.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



shorter_subconninfo.sh
Description: shorter_subconninfo.sh


Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-06 Thread jian he
On Tue, Feb 6, 2024 at 3:46 PM Yugo NAGATA  wrote:
>
> On Tue, 06 Feb 2024 09:39:09 +0900 (JST)
> Kyotaro Horiguchi  wrote:
>
> > At Mon, 5 Feb 2024 17:22:56 +0900, Yugo NAGATA  wrote 
> > in
> > > On Mon, 05 Feb 2024 11:28:59 +0900
> > > torikoshia  wrote:
> > >
> > > > > Based on this, I've made a patch.
> > > > > based on COPY Synopsis: ON_ERROR 'error_action'
> > > > > on_error 'null', the  keyword NULL should be single quoted.
> > > >
> > > > As you mentioned, single quotation seems a little odd..
> > > >
> > > > I'm not sure what is the best name and syntax for this feature, but
> > > > since current error_action are verbs('stop' and 'ignore'), I feel 'null'
> > > > might not be appropriate.
> > >
> > > I am not in favour of using 'null' either, so I suggested to use
> > > "set_to_null" or more generic syntax like  "set_to (col, val)" in my
> > > previous post[1], although I'm not convinced what is the best either.
> > >
> > > [1] 
> > > https://www.postgresql.org/message-id/20240129172858.ccb6c77c3be95a295e2b2b44%40sraoss.co.jp
> >
> > Tom suggested using a separate option, and I agree with the
> > suggestion. Taking this into consideration, I imagined something like
> > the following, for example.  Although I'm not sure we are actually
> > going to do whole-tuple replacement, the action name in this example
> > has the suffix '-column'.
> >
> > COPY (on_error 'replace-column', replacement 'null') ..
>
> Thank you for your information. I've found a post[1] you mentioned,
> where adding a separate option for error log destination was suggested.
>
> Considering consistency with other options, adding a separate option
> would be better if we want to specify a value to replace the invalid
> value, without introducing a complex syntax that allows options with
> more than one parameter. Maybe, if we allow values other than NULL for
> the replacement, we also have to add an option to specify a column (or
> a type) for each replacement value. Or, we may add an option to specify
> a list of replacement values, as many as the number of columns, each
> defaulting to NULL.
>
> Anyway, I prefer 'replace' (or 'set_to') to just 'null' as the option
> value.
>

Let's say table t has columns (a, b, c).
If we support set_to_null(a, b), what should we do if column c has an
error? Should we ignore this row or error out immediately?
Also, I am not sure it's doable to just extract a column list from the
function defGetCopyOnErrorChoice().

To make `COPY x FROM stdin (on_error set_to_null(a,b));` work,
we may need to refactor gram.y, in a similar way to what we do for
FORCE_NULL.

I am OK with
COPY x FROM stdin (on_error set_to_null);




RE: speed up a logical replica setup

2024-02-06 Thread Hayato Kuroda (Fujitsu)
Dear Euler,

> 03.
> ```
> /*
>  * Is the standby server ready for logical replication?
>  */
> static bool
> check_subscriber(LogicalRepInfo *dbinfo)
> ```
> 
> You said "target server must be a standby" in [1], but I cannot find checks 
> for it.
> IIUC, there are two approaches:
>  a) check the existence "standby.signal" in the data directory
>  b) call an SQL function "pg_is_in_recovery"

I found that the current code (HEAD+v14-0001) leads to a hang or timeout
because the recovery never finishes. The attached script emulates the
case where the standby.signal file is missing. This script always fails
with the output below:

```
...
pg_createsubscriber: waiting the postmaster to reach the consistent state
pg_createsubscriber: postmaster was stopped
pg_createsubscriber: error: recovery timed out
...
```

I also attached the server log taken during pg_createsubscriber, and we
can see that the walreceiver exits before receiving all the required
changes. I cannot identify the code path yet, but the inconsistency
leads to a situation in which the walreceiver exits but the startup
process remains. This also suggests that the recovery status must be
checked in check_subscriber(); a sketch of such a check follows.
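
(A sketch of such a check, following approach (b) from the earlier mail
and assuming check_subscriber() already has a libpq connection 'conn':)

PGresult   *res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");

if (PQresultStatus(res) != PGRES_TUPLES_OK ||
    strcmp(PQgetvalue(res, 0, 0), "t") != 0)
{
    pg_log_error("target server must be a standby");
    PQclear(res);
    return false;
}
PQclear(res);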

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



server_start_20240206T080003.670.log
Description: server_start_20240206T080003.670.log


test_0206.sh
Description: test_0206.sh

