What do you do after pg_resetwal?

2023-12-21 Thread Johnathan Tiamoh
Hello,


Please, I wish to find out what to do after running pg_resetwal.

I made a mistake and deleted my data directory with all its contents. I then
restored from a one-day-old backup; after that, the server could not start
unless I ran pg_resetwal.

1. What should I do after I have started PostgreSQL?

2. How do I apply the WAL files after the reset?

3. How can I determine the amount of data lost?


Re: Changing a schema's name with function1 calling function2

2023-12-21 Thread Adrian Klaver

On 12/21/23 05:47, Wilma Wantren wrote:

Now as text mail, sorry.
I accidentally posted the following to the bugs mailing list first. Sorry for 
this, now as a question here.
The company I work for used to only use Oracle databases for their program and 
now also supports PostgreSQL.
With Postgres, we create a database, a user and a schema, which all have the 
same name. The name is chosen by the customers.
Sometimes, the customers would like to export the data and definitions and 
import them under a new name.
This works well by using the -O (no owner) option for pg_dump/pg_restore and 
only exporting and importing the schema. After the import the schema is 
renamed. So far so simple and easy!
But there is one problem: we have a database function function1, which calls 
another function function2. To ensure that the function call is safe, we set a 
search_path for function1.
Both functions are created in the main schema (the one that is named by the 
customer), the search_path is therefore set to this schema:
ALTER FUNCTION function1 SET SEARCH_PATH TO 
Since the function's search_path is not updated when the schema is renamed, I 
need to know that such a search_path exists, which I then manually change to 
the new schema name.
Would it be possible to have a variable that designates the schema in which a 
function is located? Like this, for example:
ALTER FUNCTION function1 SET SEARCH_PATH TO $FUNCTION_SCHEMA;
Since $FUNCTION_SCHEMA would change to denote the new schema when the schema is 
renamed, I wouldn't have to change the search_path manually.


From

https://www.postgresql.org/docs/current/sql-alterfunction.html

"
configuration_parameter
value

Add or change the assignment to be made to a configuration 
parameter when the function is called. If value is DEFAULT or, 
equivalently, RESET is used, the function-local setting is removed, so 
that the function executes with the value present in its environment. 
Use RESET ALL to clear all function-local settings. SET FROM CURRENT 
saves the value of the parameter that is current when ALTER FUNCTION is 
executed as the value to be applied when the function is entered.


See SET and Chapter 20 for more information about allowed parameter 
names and values.

"

Not sure if that would meet your requirements.

I could see doing in a session:

SET search_path = 'main_schema';

ALTER FUNCTION function SET search_path FROM CURRENT;
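
For illustration, a minimal sketch of how that could slot into the rename 
workflow (old_name/new_name are placeholders, and this assumes function1 is 
not overloaded, so it can be named without an argument list):

BEGIN;
ALTER SCHEMA old_name RENAME TO new_name;
-- Point the session's search_path at the renamed schema ...
SET search_path = new_name;
-- ... and save that value as function1's function-local setting.
ALTER FUNCTION new_name.function1 SET search_path FROM CURRENT;
COMMIT;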




--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Help understanding server-side logging (and more...)

2023-12-21 Thread Adrian Klaver

On 12/21/23 03:05, Dominique Devienne wrote:

If by any chance, anyone has seen this kind of behavior before, in a 
similar context or not, and has suggestions
on how we could further troubleshoot this (on the PostgreSQL 
server-side, but also the Node.JS client-side, why not),

that would be greatly appreciated.


1) It would be helpful to add the Postgres version to any future posts.

2) Also, more information on what "They run PostgreSQL in a Linux 
container" means would help.




Thanks, --DD



--
Adrian Klaver
adrian.kla...@aklaver.com





Unlogged partitioned tables and hot standbys

2023-12-21 Thread Phil Florent
Hi,



Today I corrected a bug in my perf tool: I had forgotten to set the unlogged 
attribute on the table partitions when the end user wants to minimize the 
overhead of data collection and purges. Then I played a bit with partitioning, 
the unlogged attribute and hot standby, just for fun, and I am a bit puzzled.



Here is a very simple case:



traqueur=# select version();
                                                      version
--------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.5 (Debian 15.5-0+deb12u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 ligne)



Suppose I created my partitioned table with the "unlogged" attribute, even 
though it does not mean much (perhaps the command should ignore/refuse my 
unlogged attribute here, but it is accepted). The partition was also created 
with the "unlogged" attribute.



  Schéma  |                       Nom                        |        Type        | Propriétaire |  Persistence   | Méthode d'accès | Taille  | Description
----------+--------------------------------------------------+--------------------+--------------+----------------+-----------------+---------+-------------
 traqueur | traqueur_bloqueurs_process                       | table partitionnée | postgres     | non journalisé |                 | 0 bytes |
 traqueur | traqueur_bloqueurs_process_2023_12_21_2023_12_22 | table              | postgres     | non journalisé | heap            | 16 kB   |


On the hot standby I can query neither the partitioned table nor its 
partition, which is quite logical:



traqueur=*# table traqueur_bloqueurs_process;
ERREUR:  ne peut pas accéder à des tables temporaires et non tracées lors de la restauration

traqueur=*# table traqueur_bloqueurs_process_2023_12_21_2023_12_22;
ERREUR:  ne peut pas accéder à des tables temporaires et non tracées lors de la restauration



(The error message seems not 100% consistent, at least in French, with the 
psql information: "non tracées" vs "non journalisé"; in English the message 
reads "cannot access temporary or unlogged tables during recovery". But OK.)



Now I want to turn the logging mode of those tables to "permanent" so I can 
query the perf information from the standby. I do that on the primary cluster:



traqueur=# alter table traqueur_bloqueurs_process set logged;

ALTER TABLE

traqueur=# \d+
                                                            Liste des relations
  Schéma  |                       Nom                        |        Type        | Propriétaire |  Persistence   | Méthode d'accès | Taille  | Description
----------+--------------------------------------------------+--------------------+--------------+----------------+-----------------+---------+-------------
 traqueur | traqueur_bloqueurs_process                       | table partitionnée | postgres     | non journalisé |                 | 0 bytes |



My command silently did nothing on the partitioned table, and it seems that is 
not really an issue; after all, only the partitions hold data.



traqueur=# alter table traqueur_bloqueurs_process_2023_12_21_2023_12_22 set logged;
ALTER TABLE
traqueur=# \d+
                                                            Liste des relations
  Schéma  |                       Nom                        |        Type        | Propriétaire |  Persistence   | Méthode d'accès |   Taille   | Description
----------+--------------------------------------------------+--------------------+--------------+----------------+-----------------+------------+-------------
 traqueur | traqueur_bloqueurs_process                       | table partitionnée | postgres     | non journalisé |                 | 0 bytes    |
 traqueur | traqueur_bloqueurs_process_2023_12_21_2023_12_22 | table              | postgres     | permanent      | heap            | 8192 bytes |
...



The situation is 100% OK on the primary cluster, but I still cannot query my 
partitioned table on the hot standby database, even though I can now query its 
partition:



traqueur=*# table traqueur_bloqueurs_process;
ERREUR:  ne peut pas accéder à des tables temporaires et non tracées lors de la restauration

traqueur=*# table traqueur_bloqueurs_process_2023_12_21_2023_12_22;
...
(0 ligne)



It would be convenient if "alter table ... set logged;" changed the mode even 
on partitioned tables, or if the unlogged attribute were ignored/refused at 
the creation of a partitioned table, since it makes no sense there. It seems a 
bit strange to accept it with « create table » and to ignore it with « alter 
table »; it has consequences in the case I described, and I have no way to 
properly correct my initial mistake, which was silently applied.

The idea is not to change anything on the underlying partitions; the command 
would still do "nothing" about their logging, but it would allow the 
partitioned table to be queried on the standby after the change.
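
For reference, a minimal sketch of the partition-side half of this, under the 
names above: since the command on the parent does not recurse, loop over the 
partitions listed in pg_inherits and flip each one to LOGGED explicitly, on 
the primary:

DO $$
DECLARE
    part regclass;
BEGIN
    FOR part IN
        SELECT inhrelid::regclass
        FROM pg_inherits
        WHERE inhparent = 'traqueur.traqueur_bloqueurs_process'::regclass
    LOOP
        -- Only the leaf partitions hold storage, so only they need it.
        EXECUTE format('ALTER TABLE %s SET LOGGED', part);
    END LOOP;
END
$$;

Note that this leaves the parent table's own persistence flag untouched, so it 
does not cure the standby error on the parent described above.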

Re: PostgreSQL 15.5 stops processing user queries

2023-12-21 Thread Tom Lane
Andrey Zhidenkov  writes:
> We have encountered an issue with our PostgreSQL 15.5 installation. The
> problem is that
> PostgreSQL server periodically falls into a state when it accepts new
> connections but
> doesn't execute any queries. The session which runs a query hangs and
> cannot be terminated
> via SIGINT (even if it's just "SELECT 1") - a corresponding linux process
> has state "S" and it's
> not terminated even if the PostgreSQL master process is stopped. No matter
> how we connect to
> the database - both TCP and unix socket sessions hang but it seems that
> existing sessions can
> still execute queries (once we succeeded in connecting using pgBouncer,
> which seemed to use an
> existing connection to the database).

Try using an existing session to capture pg_stat_activity and pg_locks
information about the stuck session(s).
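
For instance, something along these lines from a working session (a sketch; 
the columns used are standard in recent versions):

SELECT pid, state, wait_event_type, wait_event,
       now() - query_start AS running_for, query
FROM pg_stat_activity
WHERE pid <> pg_backend_pid();

SELECT locktype, relation::regclass, mode, granted, pid
FROM pg_locks
WHERE NOT granted;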

> Here is a stack trace from gdb for one of the frozen sessions:

This appears to be a trace of a psql process waiting for a query
result.  It won't teach you much about what the server is doing.

regards, tom lane




Changing a schema's name with function1 calling function2

2023-12-21 Thread Wilma Wantren
Now as text mail, sorry.
I accidentally posted the following to the bugs mailing list first. Sorry for 
this, now as a question here.
The company I work for used to only use Oracle databases for their program and 
now also supports PostgreSQL.
With Postgres, we create a database, a user and a schema, which all have the 
same name. The name is chosen by the customers.
Sometimes, the customers would like to export the data and definitions and 
import them under a new name.
This works well by using the -O (no owner) option for pg_dump/pg_restore and 
only exporting and importing the schema. After the import the schema is 
renamed. So far so simple and easy!
But there is one problem: we have a database function function1, which calls 
another function function2. To ensure that the function call is safe, we set a 
search_path for function1.
Both functions are created in the main schema (the one that is named by the 
customer), the search_path is therefore set to this schema:
ALTER FUNCTION function1 SET SEARCH_PATH TO 
Since the function's search_path is not updated when the schema is renamed, I 
need to know that such a search_path exists, which I then manually change to 
the new schema name.
Would it be possible to have a variable that designates the schema in which a 
function is located? Like this, for example:
ALTER FUNCTION function1 SET SEARCH_PATH TO $FUNCTION_SCHEMA;
Since $FUNCTION_SCHEMA would change to denote the new schema when the schema is 
renamed, I wouldn't have to change the search_path manually.
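
For reference, a sketch of a catalog query (using pg_proc.proconfig, which 
stores function-local settings) that a rename script could use to find every 
function whose search_path needs fixing:

SELECT p.oid::regprocedure AS function, cfg AS setting
FROM pg_proc AS p, unnest(p.proconfig) AS cfg
WHERE cfg LIKE 'search_path=%';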








Re: Read write performance check

2023-12-21 Thread veem v
Can someone please tell me whether any standard scripts are available for
doing such read/write performance tests, or point me to any available docs?

On Wed, 20 Dec, 2023, 10:39 am veem v,  wrote:

> Thank you.
>
> That would really be helpful if such test scripts or similar setups are
> already available. Can someone please guide me to some docs or blogs or
> sample scripts, on same please.
>
> On Wed, 20 Dec, 2023, 10:34 am Lok P,  wrote:
>
>> As Rob mentioned, the syntax you posted is not correct. You need to
>> process or read a certain batch of rows like 1000 or 10k etc. Not all 100M
>> at one shot.
>>
>> But again, your use case seems a common one, considering you want to
>> compare the read and write performance on multiple databases with a
>> similar table structure. So in that case, you may want to use test
>> scripts which others must have already written rather than reinventing
>> the wheel.
>>
>>
>> On Wed, 20 Dec, 2023, 10:19 am veem v,  wrote:
>>
>>> Thank you.
>>>
>>> Yes, actually we are trying to compare and see what maximum TPS we are
>>> able to reach with both of these row-by-row and batch read/write tests.
>>> And then afterwards, this figure may be compared with other databases
>>> etc. with similar setups.
>>>
>>> So I wanted to understand from the experts here, is this approach fine?
>>> Or is some other approach advisable?
>>>
>>> I agree with the point that the network will play a role in a real-world
>>> app, but here we mainly want to see the database's capability, as the
>>> network will always play a similar kind of role across all databases. Do
>>> you suggest some other approach to achieve this objective?
>>>
>>>
>>> On Wed, 20 Dec, 2023, 2:42 am Peter J. Holzer,  wrote:
>>>
 On 2023-12-20 00:44:48 +0530, veem v wrote:
 >  So at first, we need to populate the base tables with the necessary
 data (say
 > 100million rows) with required skewness using random functions to
 generate the
 > variation in the values of different data types. Then in case of row
 by row
 > write/read test , we can traverse in a cursor loop. and in case of
 batch write/
 > insert , we need to traverse in a bulk collect loop. Something like
 below and
 > then this code can be wrapped into a procedure and passed to the
 pgbench and
 > executed from there. Please correct me if I'm wrong.

 One important point to consider for benchmarks is that your benchmark
 has to be similar to the real application to be useful. If your real
 application runs on a different node and connects to the database over
 the network, a benchmark running within a stored procedure may not be
 very indicative of real performance.

 hp

 --
_  | Peter J. Holzer| Story must make more sense than reality.
 |_|_) ||
 | |   | h...@hjp.at |-- Charles Stross, "Creative writing
 __/   | http://www.hjp.at/ |   challenge!"

>>>


PostgreSQL 15.5 stops processing user queries

2023-12-21 Thread Andrey Zhidenkov
Hello all,

We have encountered an issue with our PostgreSQL 15.5 installation. The
problem is that the PostgreSQL server periodically falls into a state in
which it accepts new connections but doesn't execute any queries. A session
which runs a query hangs and cannot be terminated via SIGINT (even if it's
just "SELECT 1") - the corresponding Linux process has state "S" and is not
terminated even if the PostgreSQL master process is stopped. It does not
matter how we connect to the database - both TCP and unix socket sessions
hang - but it seems that existing sessions can still execute queries (once
we succeeded in connecting using pgBouncer, which seemed to use an existing
connection to the database).

Here is a stack trace from gdb for one of the frozen sessions:

(gdb) bt 10
#0  0x7f6d31dbd378 in poll () from /lib64/libc.so.6
#1  0x7f6d3286aee1 in pqSocketCheck.part.2 () from
/usr/pgsql-15/lib/libpq.so.5
#2  0x7f6d3286b054 in pqWaitTimed () from /usr/pgsql-15/lib/libpq.so.5
#3  0x7f6d32867848 in PQgetResult () from /usr/pgsql-15/lib/libpq.so.5
#4  0x00411320 in ExecQueryAndProcessResults
(query=query@entry=0x23a68b0
"select 1;", elapsed_msec=elapsed_msec@entry=0x7ffc2b5840a8,
svpt_gone_p=svpt_gone_p@entry=0x7ffc2b5840a7,
is_watch=is_watch@entry=false, opt=opt@entry=0x0,
printQueryFout=printQueryFout@entry=0x0) at common.c:1426
#5  0x0040feb9 in SendQuery (query=0x23a68b0 "select 1;") at
common.c:1117
#6  0x0040627b in main (argc=, argv=)
at startup.c:384

We're using glibc-2.28-236.0.1.el8.7.x86_64 on this machine and PostgreSQL
15.5:

postgres=# select version();
                                                 version
----------------------------------------------------------------------------------------------------------
 PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20), 64-bit
(1 row)

We've tried to recreate the cluster from scratch on new hardware using a
logical dump, but it didn't help. Unfortunately we could not reproduce the
issue; it looks like it occurs randomly, and when it happens only a
PostgreSQL restart helps. We also have a number of machines that run the
same version of PostgreSQL, but we have the problem only with this one
cluster, so maybe it is somehow related to queries that are specific to
this cluster.

We also run patroni 2.1.4 on this cluster, for reference (I'm not sure if
it is related). We checked the PostgreSQL logs, of course - there are no
messages that could be related to the issue.

We will really appreciate any help, thanks!

-- 
With best regards, Andrei Zhidenkov.


Re: Help understanding server-side logging (and more...)

2023-12-21 Thread Peter J. Holzer
On 2023-12-21 12:05:41 +0100, Dominique Devienne wrote:
> FWIW, the symptoms they are experiencing is that initially all queries
> are fast, then they start to get longer and longer, to the point all
> services accessing PostgreSQL grind to a halt and basically hang.
> While that happens, one can connect to the DB from a shell (that
> cluster has a single DB) w/o issues, and run queries just fine

If you do that, do you see the "hanging" queries in pg_stat_activity? If
so, what are they waiting for?
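
For example, a sketch along these lines (pg_blocking_pids() exists since 
PostgreSQL 9.6):

SELECT pid, wait_event_type, wait_event,
       pg_blocking_pids(pid) AS blocked_by,
       now() - query_start AS running_for,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE state = 'active' AND pid <> pg_backend_pid();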

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: How to generate random bigint

2023-12-21 Thread Peter J. Holzer
On 2023-12-21 00:06:39 -0600, Phillip Diffley wrote:
> Postgres's random() function generates a random double. That can be converted
> to a random int for smaller integers, but a double can't represent all of the
> values in a bigint. Is there a recommended way to generate a random bigint in
> Postgres?

Call random() twice and add the results?

Like this:

select (random() * 2147483648)::int8 * 4294967296
   + (random() * 4294967296)::int8;

(This assumes that random() actually returns at least 32 random bits.
If that's not the case you'll need more calls to random())
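
An alternative sketch, assuming the pgcrypto extension is available: take 8 
random bytes and reinterpret them as a bigint (this also covers negative 
values, since the top bit becomes the sign bit):

CREATE EXTENSION IF NOT EXISTS pgcrypto;
SELECT ('x' || encode(gen_random_bytes(8), 'hex'))::bit(64)::bigint;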

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Help understanding server-side logging (and more...)

2023-12-21 Thread Julien Rouhaud
Hi,

On Thu, Dec 21, 2023 at 12:05:41PM +0100, Dominique Devienne wrote:
> Hi. Another team (than mine) has been trying to troubleshoot hang issues in
> their Node/JS/TS-based mid-tier services,
> after a switch from MySQL to PostgreSQL. They run PostgreSQL in a Linux
> container (see PID [1] below), unlike us.
> They started looking at the PostgreSQL server-side logs, but with little to
> no experience reading them, we'd like to
> confirm the extract below looks normal, notably whether the various
> exit-related messages are OK or not.

All those messages are at DEBUG level.  Anything abnormal would be emitted at a
higher level.
>
>
> FWIW, the symptoms they are experiencing is that initially all queries are
> fast, then they start to get longer and longer,
> to the point all services accessing PostgreSQL grind to a halt and
> basically hang. While that happens, one can connect
> to the DB from a shell (that cluster has a single DB) w/o issues, and run
> queries just fine (they have little data,
> in the few dozens MBs spread in 2 schemas and probably 2 or 3 dozen tables
> max), thus I personally think the problem
> is more on the client Node.JS side, but given that the same code worked
> fine with MySQL, maybe I'm wrong.

Yeah, "everything works normally except inside the app" is usually a good
indication that it's unlikely to be a postgres problem.  Maybe they have some
locking issue, transactions that are opened for too long or something else but
it's impossible to say without much more information.  One sure thing is that
running with logs at DEBUG level is going to add noticeable overhead.
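
As a follow-up to that last point, a sketch of settings that surface slow 
statements and lock waits without DEBUG-level noise (these are all standard 
GUCs; the values are illustrative only):

ALTER SYSTEM SET log_min_messages = 'warning';
ALTER SYSTEM SET log_min_duration_statement = '250ms';
ALTER SYSTEM SET log_lock_waits = on;  -- logs waits longer than deadlock_timeout
SELECT pg_reload_conf();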




Help understanding server-side logging (and more...)

2023-12-21 Thread Dominique Devienne
Hi. Another team (than mine) has been trying to troubleshoot hang issues in
their Node/JS/TS-based mid-tier services,
after a switch from MySQL to PostgreSQL. They run PostgreSQL in a Linux
container (see PID [1] below), unlike us.
They started looking at the PostgreSQL server-side logs, but with little to
no experience reading them, we'd like to
confirm the extract below looks normal, notably whether the various
exit-related messages are OK or not.

I'm guessing it's fine/normal, especially since they use a Node PostgreSQL
module with Pooling (10 connections max by default)
that disconnects idle connections after 10s, which matches what the log
shows. Can anyone please confirm?

FWIW, the symptoms they are experiencing is that initially all queries are
fast, then they start to get longer and longer,
to the point all services accessing PostgreSQL grind to a halt and
basically hang. While that happens, one can connect
to the DB from a shell (that cluster has a single DB) w/o issues, and run
queries just fine (they have little data,
a few dozen MBs spread across 2 schemas and probably 2 or 3 dozen tables
max), thus I personally think the problem
is more on the client Node.JS side, but given that the same code worked
fine with MySQL, maybe I'm wrong.

Their services are concurrent and accessed by maybe 50-100 users tops, with
queries coming in all the time.

If by any chance, anyone has seen this kind of behavior before, in a
similar context or not, and has suggestions
on how we could further troubleshoot this (on the PostgreSQL server-side,
but also the Node.JS client-side, why not),
that would be greatly appreciated.

Thanks, --DD

2023-12-20 00:00:16.185 UTC [3605] DEBUG:  InitPostgres
2023-12-20 00:00:16.185 UTC [3605] DEBUG:  my backend ID is 4
2023-12-20 00:00:16.186 UTC [1] DEBUG:  forked new backend, pid=3606
socket=10
2023-12-20 00:00:16.187 UTC [3606] DEBUG:  InitPostgres
2023-12-20 00:00:16.187 UTC [3606] DEBUG:  my backend ID is 6
2023-12-20 00:00:16.188 UTC [3605] DEBUG:  StartTransaction(1) name:
unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2023-12-20 00:00:16.189 UTC [3605] DEBUG:  processing received SASL
response of length 51
2023-12-20 00:00:16.189 UTC [3605] DEBUG:  sending SASL challenge of length
84
2023-12-20 00:00:16.190 UTC [3606] DEBUG:  StartTransaction(1) name:
unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2023-12-20 00:00:16.191 UTC [3606] DEBUG:  processing received SASL
response of length 51
2023-12-20 00:00:16.191 UTC [3606] DEBUG:  sending SASL challenge of length
84
2023-12-20 00:00:16.193 UTC [3605] DEBUG:  processing received SASL
response of length 104
2023-12-20 00:00:16.193 UTC [3605] DEBUG:  sending SASL challenge of length
46
2023-12-20 00:00:16.195 UTC [3606] DEBUG:  processing received SASL
response of length 104
2023-12-20 00:00:16.195 UTC [3606] DEBUG:  sending SASL challenge of length
46
2023-12-20 00:00:16.200 UTC [3605] DEBUG:  CommitTransaction(1) name:
unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
2023-12-20 00:00:16.201 UTC [3605] DEBUG:  StartTransaction(1) name:
unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2023-12-20 00:00:16.201 UTC [3605] STATEMENT:  SELECT * FROM "NSP"."T1"
2023-12-20 00:00:16.202 UTC [3606] DEBUG:  CommitTransaction(1) name:
unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
2023-12-20 00:00:16.203 UTC [3606] DEBUG:  StartTransaction(1) name:
unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2023-12-20 00:00:16.203 UTC [3606] STATEMENT:  SELECT * FROM "NSP"."T2"
2023-12-20 00:00:16.205 UTC [3605] DEBUG:  CommitTransaction(1) name:
unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
2023-12-20 00:00:16.205 UTC [3605] STATEMENT:  SELECT * FROM "NSP"."T1"
2023-12-20 00:00:16.207 UTC [3606] DEBUG:  CommitTransaction(1) name:
unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
2023-12-20 00:00:16.207 UTC [3606] STATEMENT:  SELECT * FROM "NSP"."T2"
2023-12-20 00:00:26.209 UTC [3605] DEBUG:  shmem_exit(0): 4
before_shmem_exit callbacks to make
2023-12-20 00:00:26.209 UTC [3605] DEBUG:  shmem_exit(0): 6 on_shmem_exit
callbacks to make
2023-12-20 00:00:26.209 UTC [3606] DEBUG:  shmem_exit(0): 4
before_shmem_exit callbacks to make
2023-12-20 00:00:26.209 UTC [3605] DEBUG:  proc_exit(0): 2 callbacks to make
2023-12-20 00:00:26.209 UTC [3605] DEBUG:  exit(0)
2023-12-20 00:00:26.209 UTC [3605] DEBUG:  shmem_exit(-1): 0
before_shmem_exit callbacks to make
2023-12-20 00:00:26.209 UTC [3605] DEBUG:  shmem_exit(-1): 0 on_shmem_exit
callbacks to make
2023-12-20 00:00:26.209 UTC [3605] DEBUG:  proc_exit(-1): 0 callbacks to
make
2023-12-20 00:00:26.209 UTC [3606] DEBUG:  shmem_exit(0): 6 on_shmem_exit
callbacks to make
2023-12-20 00:00:26.209 UTC [3606] DEBUG:  proc_exit(0): 2 callbacks to make
2023-12-20 00:00:26.209 UTC [3606] DEBUG:  exit(0)
2023-12-20 00:00:26.209 UTC [3606] 

Re: "invalid contrecord" error on replica

2023-12-21 Thread Adrien Nayrat

I received a question about this error.
Just for the record, in case someone encounters the same issue:
it has been fixed in commit ff9f111bce24.

TL;DR: update your instance :)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ff9f111bce24

 Fix WAL replay in presence of an incomplete record

Physical replication always ships WAL segment files to replicas once
they are complete.  This is a problem if one WAL record is split across
a segment boundary and the primary server crashes before writing down
the segment with the next portion of the WAL record: WAL writing after
crash recovery would happily resume at the point where the broken record
started, overwriting that record ... but any standby or backup may have
already received a copy of that segment, and they are not rewinding.
This causes standbys to stop following the primary after the latter
crashes:
  LOG:  invalid contrecord length 7262 at A8/D9FFFBC8
because the standby is still trying to read the continuation record
(contrecord) for the original long WAL record, but it is not there and
it will never be.  A workaround is to stop the replica, delete the WAL
file, and restart it -- at which point a fresh copy is brought over from
the primary.  But that's pretty labor intensive, and I bet many users
would just give up and re-clone the standby instead.

A fix for this problem was already attempted in commit 515e3d84a0b5, but
it only addressed the case for the scenario of WAL archiving, so
streaming replication would still be a problem (as well as other things
such as taking a filesystem-level backup while the server is down after
having crashed), and it had performance scalability problems too; so it
had to be reverted.

This commit fixes the problem using an approach suggested by Andres
Freund, whereby the initial portion(s) of the split-up WAL record are
kept, and a special type of WAL record is written where the contrecord
was lost, so that WAL replay in the replica knows to skip the broken
parts.  With this approach, we can continue to stream/archive segment
files as soon as they are complete, and replay of the broken records
will proceed across the crash point without a hitch.

Because a new type of WAL record is added, users should be careful to
upgrade standbys first, primaries later. Otherwise they risk the standby
being unable to start if the primary happens to write such a record.

A new TAP test that exercises this is added, but the portability of it
is yet to be seen.

This has been wrong since the introduction of physical replication, so
backpatch all the way back.  In stable branches, keep the new
XLogReaderState members at the end of the struct, to avoid an ABI
break.