Hi folks,

I am facing a strange issue: pgbackrest fails for DIFF and INCR backups,
but not for FULL backups, with the error *WAL file cannot be archived
before 60000 ms timeout*.

The pgbackrest "*stanza check*" command *sometimes succeeds and sometimes
fails*.

I don't know why *PG is unable to copy WAL files from pg_wal to
/data/archive in real time*. I always observe a delay of a few minutes
before a WAL file from pg_wal appears in /data/archive.
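
One way to put a number on that delay is to count the segments PostgreSQL
has marked as still waiting for the archiver: each one has a ".ready" file
under pg_wal/archive_status. A minimal sketch in Python, assuming the data
directory is /var/lib/postgres/16/data (the pg1-path shown in the logs);
the helper name pending_wal is hypothetical:

```python
from pathlib import Path


def pending_wal(archive_status_dir):
    """Return the sorted WAL segment names that still have a .ready marker."""
    status = Path(archive_status_dir)
    suffix = ".ready"
    return sorted(p.name[: -len(suffix)] for p in status.glob("*" + suffix))


# Assumed location, derived from the pg1-path in the logs; adjust as needed.
status_dir = Path("/var/lib/postgres/16/data/pg_wal/archive_status")
if status_dir.is_dir():
    backlog = pending_wal(status_dir)
    print(f"{len(backlog)} segment(s) waiting to be archived")
    if backlog:
        print("oldest:", backlog[0], "newest:", backlog[-1])
```

A backlog that keeps growing would suggest the archive_command cannot keep
up with WAL generation, rather than a one-off delay.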

My postgresql.conf has the following settings: checkpoint_timeout = 5min,
max_wal_size = 16GB, wal_keep_size = 15GB, min_wal_size = 80MB.

The pg_wal directory is 16 GB on disk (du -h -d 1), and /data/archive is
1.2 TB. The df output for that volume:
/dev/mapper/rhel_bcga68-data  5.0T  1.8T  3.3T  35% /data


Could checkpoint_timeout = 5min be the reason for the WAL archiving delay?

INCR and DIFF backups to the remote repository server always fail, whereas
a full backup always succeeds.

What is the ideal value for "*checkpoint_timeout*"?
*Or does it have no impact on the pgbackrest failures?*


archive_command = 'pgbackrest --stanza=My_Repo archive-push %p && cp %p /data/archive/%f'


From the PostgreSQL logs I am seeing this:

ERROR: [082]: unable to push WAL file '000000010000026300000002' to the
archive asynchronously after 60 second(s)
       HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for
errors.
INFO: archive-push command end: aborted with exception [082]
2025-05-02 12:15:17 IST LOG:  archive command failed with exit code 82
2025-05-02 12:15:17 IST DETAIL:  The failed archive command was: pgbackrest
--stanza=My_Repo archive-push pg_wal/000000010000026300000002 && cp
pg_wal/000000010000026300000002 /data/archive/000000010000026300000002
INFO: archive-push command begin 2.52.1: [pg_wal/000000010000026300000002]
--archive-async --compress-type=zst --exec-id=2848559-384cf49c
--log-level-console=info --log-level-file=debug --log-level-stderr=info
--pg1-path= /var/lib/postgres/16/data   --pg-version-force=16
--process-max=6 --repo1-host=10.x.y.202 --repo1-host-user=pgbackrest
--spool-path=/var/spool/pgbackrest --stanza=My_Repo

top output on DB cluster:

top - 12:37:00 up 66 days, 17:24,  2 users,  load average: 4.04, 4.72, 4.56
Tasks: 902 total,   4 running, 897 sleeping,   0 stopped,   1 zombie
%Cpu(s):  7.4 us,  1.7 sy,  0.0 ni, 89.9 id,  0.4 wa,  0.2 hi,  0.4 si,  0.0 st
MiB Mem :  31837.6 total,    706.1 free,  15243.0 used,  24741.0 buff/cache
MiB Swap:   8060.0 total,   6634.0 free,   1426.0 used.  16608.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2839363 postgre+  20   0 8965608   7.2g   7.1g S  70.2  23.0   2:02.61 postgres
2864108 postgre+  20   0 8967848   7.1g   7.1g S  64.9  22.8   0:30.04 postgres
2865547 postgre+  20   0 8965432   7.1g   7.1g S  39.1  22.8   0:32.30 postgres
2865752 postgre+  20   0 8964352   6.9g   6.9g S  16.6  22.3   0:32.94 postgres



    Model name:          Intel(R) Xeon(R) Gold 6430
    BIOS Model name:     Intel(R) Xeon(R) Gold 6430
    CPU family:          6
    Model:               143
    Thread(s) per core:  1
    Core(s) per socket:  16

These are vCPUs (16 in total). The OS is RHEL 9, with PostgreSQL 16 and
pgbackrest 2.52.1.

*Any hints that help find the root cause of the pgbackrest failures for
DIFF/INCR backups are most welcome.*

Thank you
Krishane





*More information:*
The full backup below succeeded, but a diff backup fails.

        full backup: 20250505-070204F
            timestamp start/stop: 2025-05-05 07:02:04+05:30 / 2025-05-05 22:11:23+05:30
            wal start/stop: 000000010000026F00000066 / 000000010000027300000045
            database size: 503.1GB, database backup size: 503.1GB
            repo1: backup size: 79GB
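
As a rough cross-check of the WAL volume, the wal start/stop names above can
be converted into a segment count. A minimal sketch in Python, assuming the
default 16 MB wal_segment_size (not confirmed for this cluster); the
function names are illustrative only:

```python
from datetime import datetime

SEGMENT_MB = 16  # assumption: default wal_segment_size
SEGMENTS_PER_LOG = 0x100000000 // (SEGMENT_MB * 1024 * 1024)  # 256 for 16 MB


def wal_segment_number(name):
    """Map a 24-hex-digit WAL file name to a linear segment number.

    Layout: 8 digits timeline, 8 digits log number, 8 digits segment number.
    """
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return log * SEGMENTS_PER_LOG + seg


def segments_between(start, stop):
    """Number of segments from start to stop, both inclusive."""
    return wal_segment_number(stop) - wal_segment_number(start) + 1


# Values copied from the full backup info above.
count = segments_between("000000010000026F00000066", "000000010000027300000045")
t0 = datetime.fromisoformat("2025-05-05T07:02:04+05:30")
t1 = datetime.fromisoformat("2025-05-05T22:11:23+05:30")
hours = (t1 - t0).total_seconds() / 3600
gb = count * SEGMENT_MB / 1024

print(f"{count} segments, {gb:.1f} GB, {count / hours:.1f} segments/hour")
```

Under that assumption the full backup spans 992 segments, about 15.5 GB of
WAL over roughly 15 hours, which is a little over one segment per minute on
average.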


When I try a diff backup, it always fails.

[root@Repo ~]# tail -f /var/log/pgbackrest/My_Repo-backup.log

                                    stack trace:
                                    command/archive/find.c:walSegmentFind:191:(this: {WalSegmentFind}, walSegment: {"000000010000027B0000006C"})
                                    command/backup/backup.c:backupArchiveCheckCopy:(backupData: {BackupData}, manifest: {Manifest})
                                    command/backup/backup.c:cmdBackup:(void)
                                    main.c:main:(debug log level required for parameters)

--------------------------------------------------------------------
2025-05-07 15:47:49.760 P00   INFO: backup command end: aborted with exception [082]
2025-05-07 15:47:49.760 P00  DEBUG:     command/exit::exitSafe: => 82
2025-05-07 15:47:49.860 P00  DEBUG:     main::main: => 82
^C
[root@Repo ~]# date
Wednesday 07 May 2025 04:06:37 PM IST

*The postgres log says:*

=316781-61d82f85 --log-level-console=info --log-level-file=debug
--log-level-stderr=info --pg1-path=
/var/lib/postgres/16/data   --pg-version-force=16 --process-max=3
--repo1-host=10.x.y.202 --repo1-host-user=pgbackrest
--spool-path=/var/spool/pgbackrest --stanza=My_Repo
INFO: pushed WAL file '000000010000027B00000082' to the archive
asynchronously
INFO: archive-push command end: completed successfully (37506ms)
INFO: archive-push command begin 2.52.1: [pg_wal/000000010000027B00000083]
--archive-async --compress-type=zst --exec-id=317334-27a30b57
--log-level-console=info --log-level-file=debug --log-level-stderr=info
--pg1-path= /var/lib/postgres/16/data   --pg-version-force=16
--process-max=3 --repo1-host=10.x.y.202 --repo1-host-user=pgbackrest
--spool-path=/var/spool/pgbackrest --stanza=My_Repo
ERROR: [082]: unable to push WAL file '000000010000027B00000083' to the
archive asynchronously after 60 second(s)
       HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for
errors.
INFO: archive-push command end: aborted with exception [082]
2025-05-07 16:22:56 IST LOG:  archive command failed with exit code 82

