UNOFFICIAL

Hello,

We have the following setup in our DEV environment -

- Primary/Replica using asynchronous streaming replication
- Servers run Ubuntu
              Linux 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 
2023 x86_64 x86_64 GNU/Linux
- Postgres version
              postgres (PostgreSQL) 13.13 (Ubuntu 13.13-1.pgdg22.04+1)

The replica has been running for many weeks and then the walreceiver failed 
with the following -

<2024-02-22 14:12:34.731 AEDT [565228]: [60136-1] user=,db= > DETAIL:  Last 
completed transaction was at log time 2024-02-22 14:12:00.327411+11.
<2024-02-22 14:17:34.822 AEDT [565228]: [60137-1] user=,db= > LOG:  
restartpoint starting: time
<2024-02-22 14:17:34.935 AEDT [565228]: [60138-1] user=,db= > LOG:  
restartpoint complete: wrote 1 buffers (0.0%); 0 WAL file(s) added, 0 removed, 
0 recycled; write=0.106 s, sync=0.002 s, total=0.114 s; sync files=1, 
longest=0.002 s, average=0.002 s; distance=4 kB,
estimate=14 kB
<2024-02-22 14:17:34.935 AEDT [565228]: [60139-1] user=,db= > LOG:  recovery 
restart point at 6/B0ACB188
<2024-02-22 14:17:34.935 AEDT [565228]: [60140-1] user=,db= > DETAIL:  Last 
completed transaction was at log time 2024-02-22 14:15:00.457984+11.
<2024-02-22 14:20:23.383 AEDT [565227]: [7-1] user=,db= > LOG:  invalid record 
length at 6/B0ACC360: wanted 24, got 0
<2024-02-22 14:20:23.383 AEDT [565231]: [6-1] user=,db= > FATAL:  terminating 
walreceiver process due to administrator command
<2024-02-22 14:20:23.385 AEDT [565227]: [8-1] user=,db= > LOG:  invalid record 
length at 6/B0ACC360: wanted 24, got 0
<2024-02-22 14:20:23.385 AEDT [565227]: [9-1] user=,db= > LOG:  invalid record 
length at 6/B0ACC360: wanted 24, got 0
<2024-02-22 14:20:23.385 AEDT [565227]: [10-1] user=,db= > LOG:  invalid record 
length at 6/B0ACC360: wanted 24, got 0
<2024-02-22 14:20:28.390 AEDT [565227]: [11-1] user=,db= > LOG:  invalid record 
length at 6/B0ACC360: wanted 24, got 0
<2024-02-22 14:20:33.394 AEDT [565227]: [12-1] user=,db= > LOG:  invalid record 
length at 6/B0ACC360: wanted 24, got 0
<2024-02-22 14:20:38.394 AEDT [565227]: [13-1] user=,db= > LOG:  invalid record 
length at 6/B0ACC360: wanted 24, got 0
<2024-02-22 14:20:43.398 AEDT [565227]: [14-1] user=,db= > LOG:  invalid record 
length at 6/B0ACC360: wanted 24, got 0
<2024-02-22 14:20:48.402 AEDT [565227]: [15-1] user=,db= > LOG:  invalid record 
length at 6/B0ACC360: wanted 24, got 0
<2024-02-22 14:20:53.406 AEDT [565227]: [16-1] user=,db= > LOG:  invalid record 
length at 6/B0ACC360: wanted 24, got 0


The replica was restarted and the walreceiver immediately failed again with -

2024-02-23 07:50:05.591 AEDT [1957117]: [2-1] user=,db= > LOG:  entering 
standby mode
<2024-02-23 07:50:05.607 AEDT [1957117]: [3-1] user=,db= > LOG:  redo starts at 
6/B0ACB188
<2024-02-23 07:50:05.607 AEDT [1957117]: [4-1] user=,db= > LOG:  consistent 
recovery state reached at 6/B0ACC360
<2024-02-23 07:50:05.607 AEDT [1957117]: [5-1] user=,db= > LOG:  invalid record 
length at 6/B0ACC360: wanted 24, got 0
<2024-02-23 07:50:05.609 AEDT [1957114]: [5-1] user=,db= > LOG:  database 
system is ready to accept read only connections
<2024-02-23 07:50:05.637 AEDT [1957121]: [1-1] user=,db= > LOG:  started 
streaming WAL from primary at 6/B0000000 on timeline 5
<2024-02-23 07:50:05.696 AEDT [1957117]: [6-1] user=,db= > LOG:  invalid magic 
number 0000 in log segment 0000000500000006000000B0, offset 0
<2024-02-23 07:50:05.697 AEDT [1957121]: [2-1] user=,db= > FATAL:  terminating 
walreceiver process due to administrator command
<2024-02-23 07:50:05.798 AEDT [1957117]: [7-1] user=,db= > LOG:  invalid magic 
number 0000 in log segment 0000000500000006000000B0, offset 0
<2024-02-23 07:50:05.798 AEDT [1957117]: [8-1] user=,db= > LOG:  invalid magic 
number 0000 in log segment 0000000500000006000000B0, offset 0
<2024-02-23 07:50:06.215 AEDT [1957125]: [1-1] user=[unknown],db=[unknown] > 
LOG:  connection received: host=[local]


Running a pg_waldump against the WAL file on the primary server produces -

              deveng >pg_waldump 0000000500000006000000B0
              pg_waldump: fatal: WAL segment size must be a power of two 
between 1 MB and 1 GB, but the WAL file "0000000500000006000000B0" header 
specifies 0 bytes

The same error is returned when using the WAL in pg_wal or the WAL that has 
been archived locally on the primary server.

We also stream WAL to a barman backup server. Running pg_waldump against the 
WAL file on the barman server errors with -

              pg_waldump: fatal: error in WAL record at 6/B0ACC328: invalid 
record length at 6/B0ACC360: wanted 24, got 0

The primary cluster Postgres log does not show any unusual messages at the time 
of the original walreceiver failure -

<2024-02-22 14:20:00.279 AEDT [2408502]: [2-1] 
user=crmportaluser,db=reportingentity > LOG:  connection authorized: 
user=crmportaluser database=reportingentity SSL enabled (protocol=TLSv1.3, c
ipher=TLS_AES_256_GCM_SHA384, bits=256, compression=off)
<2024-02-22 14:20:23.330 AEDT [2408557]: [1-1] user=,db= > LOG:  automatic 
analyze of table "reportingentity.crmportal.setting" system usage: CPU: user: 
0.00 s, system: 0.00 s, elapsed: 0.00 s
<2024-02-22 14:20:23.396 AEDT [10152]: [3-1] user=repmgr,db=[unknown] > LOG:  
disconnection: session time: 2140:47:39.683 user=repmgr database= 
host=192.168.136.200 port=51680
<2024-02-22 14:20:29.479 AEDT [2306390]: [251-1] 
user=origamiuser,db=reportingentity > LOG:  temporary file: path 
"base/pgsql_tmp/pgsql_tmp2306390.124", size 1370040
<

Any help with this error is appreciated.





UNOFFICIAL

**********************************************************************
Please  note  that  your  email address  is known to  AUSTRAC  for the
purposes  of  communicating with you.  The information  transmitted in
this  e-mail is  for the  use of  the intended  recipient only and may
contain confidential and/or legally  privileged  material. If you have
received  this information  in error you must not disseminate, copy or
take  any  action on  it and we  request that you delete all copies of
this transmission together with attachments and notify the sender.

This footnote also confirms that this email message has been swept for
the presence of computer viruses.
**********************************************************************

Reply via email to