On Sat, Jun 1, 2019 at 8:25 PM Tom K <tomk...@gmail.com> wrote:

> So the best bet will be trying to get through this error then:
>
> [ PSQL02 ]
> PANIC:  replication checkpoint has wrong magic 0 instead of 307747550
>
> On Sat, Jun 1, 2019 at 8:21 PM Tom K <tomk...@gmail.com> wrote:
>
>>
>>
>> On Sat, Jun 1, 2019 at 7:34 PM Adrian Klaver <adrian.kla...@aklaver.com>
>> wrote:
>>
>>> On 6/1/19 4:22 PM, Tom K wrote:
>>> >
>>> > Looks like this crash was far more catastrophic than I thought.  By
>>> > the looks of things, focusing on psql02 would be my best bet.
>>> >
>>>
>>> The more I look at it, the more I think the replication was not doing
>>> what you thought it was doing: psql02 was the primary, and psql01 and
>>> psql03 were out of sync and/or defunct standbys.
>>>
>>
>> Now that I look at the files myself, that's the conclusion I'm coming
>> to as well.  Sample config:
>>
>> [root@psql02 base]# cat /etc/patroni.yml
>> scope: postgres
>> namespace: /db/
>> name: postgresql1
>>
>> restapi:
>>     listen: 192.168.0.124:8008
>>     connect_address: 192.168.0.124:8008
>>
>> etcd:
>>     host: 192.168.0.124:2379
>>
>> bootstrap:
>>     dcs:
>>         ttl: 30
>>         loop_wait: 10
>>         retry_timeout: 10
>>         maximum_lag_on_failover: 1048576
>>         postgresql:
>>             use_pg_rewind: true
>>
>>     initdb:
>>     - encoding: UTF8
>>     - data-checksums
>>
>>     pg_hba:
>>     - host replication replicator 127.0.0.1/32 md5
>>     - host replication replicator 192.168.0.108/0 md5
>>     - host replication replicator 192.168.0.124/0 md5
>>     - host replication replicator 192.168.0.118/0 md5
>>     - host all all 0.0.0.0/0 md5
>>
>>     users:
>>         admin:
>>             password: admin
>>             options:
>>                 - createrole
>>                 - createdb
>>
>> postgresql:
>>     listen: 192.168.0.124:5432
>>     bin_dir: /usr/pgsql-10/bin
>>     connect_address: 192.168.0.124:5432
>>     data_dir: /data/patroni
>>     pgpass: /tmp/pgpass
>>     unix_socket_directories: /data/patroni
>>     authentication:
>>         replication:
>>             username: replicator
>>             password: rep-pass
>>         superuser:
>>             username: postgres
>>             password: <SECRET>
>>     parameters:
>>         unix_socket_directories: '.'
>>
>> tags:
>>     nofailover: false
>>     noloadbalance: false
>>     clonefrom: false
>>     nosync: false
>> [root@psql02 base]#
>>
>> Or perhaps when the system crashed, the filesystem check simply moved the
>> folders out due to corruption.
>>
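To double-check which node really was the primary before the crash, I can
compare the control files across the three nodes.  A minimal sketch,
assuming the bin_dir and data_dir from the patroni.yml above:

    # Run on each of psql01 / psql02 / psql03 and compare the output.
    # "Database cluster state" and the checkpoint/timeline fields should
    # show which copy was primary and which standbys had fallen behind.
    /usr/pgsql-10/bin/pg_controldata /data/patroni \
        | egrep 'cluster state|checkpoint location|TimeLineID'
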
Trying what we did above but on the second node:

2019-06-01 20:28:45.305 EDT [21745] LOG:  database system was interrupted; last known up at 2019-04-28 06:06:24 EDT
2019-06-01 20:28:46.142 EDT [21745] DEBUG:  primary_slot_name = 'postgresql1'
2019-06-01 20:28:46.142 EDT [21745] DEBUG:  standby_mode = 'on'
2019-06-01 20:28:46.142 EDT [21745] DEBUG:  recovery_target_timeline = latest
2019-06-01 20:28:46.142 EDT [21745] WARNING:  recovery command file "recovery.conf" specified neither primary_conninfo nor restore_command
2019-06-01 20:28:46.142 EDT [21745] HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2019-06-01 20:28:46.142 EDT [21745] LOG:  entering standby mode
2019-06-01 20:28:46.142 EDT [21745] LOG:  invalid record length at 0/4C35CDF8: wanted 24, got 0
2019-06-01 20:28:46.142 EDT [21745] LOG:  invalid primary checkpoint record
2019-06-01 20:28:46.142 EDT [21745] LOG:  using previous checkpoint record at 0/4C34EDA8
2019-06-01 20:28:46.142 EDT [21745] DEBUG:  redo record is at 0/4C34ED70; shutdown FALSE
2019-06-01 20:28:46.143 EDT [21745] DEBUG:  next transaction ID: 0:1409831; next OID: 237578
2019-06-01 20:28:46.143 EDT [21745] DEBUG:  next MultiXactId: 48; next MultiXactOffset: 174
2019-06-01 20:28:46.143 EDT [21745] DEBUG:  oldest unfrozen transaction ID: 549, in database 1
2019-06-01 20:28:46.143 EDT [21745] DEBUG:  oldest MultiXactId: 1, in database 1
2019-06-01 20:28:46.143 EDT [21745] DEBUG:  commit timestamp Xid oldest/newest: 0/0
2019-06-01 20:28:46.143 EDT [21745] DEBUG:  transaction ID wrap limit is 2147484196, limited by database with OID 1
2019-06-01 20:28:46.143 EDT [21745] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2019-06-01 20:28:46.143 EDT [21745] DEBUG:  starting up replication slots
2019-06-01 20:28:46.143 EDT [21745] DEBUG:  starting up replication origin progress state
2019-06-01 20:28:46.143 EDT [21745] PANIC:  replication checkpoint has wrong magic 0 instead of 307747550
2019-06-01 20:28:46.143 EDT [21743] DEBUG:  reaping dead processes
2019-06-01 20:28:46.143 EDT [21743] LOG:  startup process (PID 21745) was terminated by signal 6: Aborted
2019-06-01 20:28:46.143 EDT [21743] LOG:  aborting startup due to startup process failure
2019-06-01 20:28:46.144 EDT [21743] DEBUG:  shmem_exit(1): 0 before_shmem_exit callbacks to make
2019-06-01 20:28:46.144 EDT [21743] DEBUG:  shmem_exit(1): 5 on_shmem_exit callbacks to make
2019-06-01 20:28:46.144 EDT [21743] DEBUG:  cleaning up dynamic shared memory control segment with ID 222930115
2019-06-01 20:28:46.145 EDT [21743] DEBUG:  proc_exit(1): 2 callbacks to make
2019-06-01 20:28:46.146 EDT [21743] LOG:  database system is shut down
2019-06-01 20:28:46.146 EDT [21743] DEBUG:  exit(1)
2019-06-01 20:28:46.146 EDT [21743] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2019-06-01 20:28:46.146 EDT [21743] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2019-06-01 20:28:46.146 EDT [21743] DEBUG:  proc_exit(-1): 0 callbacks to make
2019-06-01 20:28:46.147 EDT [21744] DEBUG:  logger shutting down
2019-06-01 20:28:46.147 EDT [21744] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
2019-06-01 20:28:46.147 EDT [21744] DEBUG:  shmem_exit(0): 0 on_shmem_exit callbacks to make
2019-06-01 20:28:46.147 EDT [21744] DEBUG:  proc_exit(0): 0 callbacks to make
2019-06-01 20:28:46.147 EDT [21744] DEBUG:  exit(0)
2019-06-01 20:28:46.147 EDT [21744] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2019-06-01 20:28:46.147 EDT [21744] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2019-06-01 20:28:46.147 EDT [21744] DEBUG:  proc_exit(-1): 0 callbacks to make
^C
-bash-4.2$
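
For what it's worth, 307747550 is 0x1257DADE, which is
REPLICATION_STATE_MAGIC in PostgreSQL's replication-origin code
(src/backend/replication/logical/origin.c).  So the PANIC means the file
pg_logical/replorigin_checkpoint under the data directory was zeroed out in
the crash, not the WAL itself.  One way to confirm -- just a sketch, using
the /data/patroni data_dir from the config above:

    # A healthy file begins with the magic; on little-endian x86 the
    # first four bytes read "de da 57 12".  Here we expect all zeros.
    hexdump -C /data/patroni/pg_logical/replorigin_checkpoint | head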

Removing the recovery.conf and restarting yields the following messages:

2019-06-01 20:31:12.231 EDT [21910] LOG:  database system was interrupted; last known up at 2019-04-28 06:06:24 EDT
2019-06-01 20:31:12.505 EDT [21910] LOG:  invalid record length at 0/4C35CDF8: wanted 24, got 0
2019-06-01 20:31:12.505 EDT [21910] LOG:  invalid primary checkpoint record
2019-06-01 20:31:12.505 EDT [21910] LOG:  using previous checkpoint record at 0/4C34EDA8
2019-06-01 20:31:12.505 EDT [21910] DEBUG:  redo record is at 0/4C34ED70; shutdown FALSE
2019-06-01 20:31:12.505 EDT [21910] DEBUG:  next transaction ID: 0:1409831; next OID: 237578
2019-06-01 20:31:12.505 EDT [21910] DEBUG:  next MultiXactId: 48; next MultiXactOffset: 174
2019-06-01 20:31:12.505 EDT [21910] DEBUG:  oldest unfrozen transaction ID: 549, in database 1
2019-06-01 20:31:12.505 EDT [21910] DEBUG:  oldest MultiXactId: 1, in database 1
2019-06-01 20:31:12.505 EDT [21910] DEBUG:  commit timestamp Xid oldest/newest: 0/0
2019-06-01 20:31:12.505 EDT [21910] DEBUG:  transaction ID wrap limit is 2147484196, limited by database with OID 1
2019-06-01 20:31:12.505 EDT [21910] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2019-06-01 20:31:12.505 EDT [21910] DEBUG:  starting up replication slots
2019-06-01 20:31:12.505 EDT [21910] DEBUG:  starting up replication origin progress state
2019-06-01 20:31:12.505 EDT [21910] PANIC:  replication checkpoint has wrong magic 0 instead of 307747550
2019-06-01 20:31:12.506 EDT [21908] DEBUG:  reaping dead processes
2019-06-01 20:31:12.506 EDT [21908] LOG:  startup process (PID 21910) was terminated by signal 6: Aborted
2019-06-01 20:31:12.506 EDT [21908] LOG:  aborting startup due to startup process failure
2019-06-01 20:31:12.506 EDT [21908] DEBUG:  shmem_exit(1): 0 before_shmem_exit callbacks to make
2019-06-01 20:31:12.506 EDT [21908] DEBUG:  shmem_exit(1): 5 on_shmem_exit callbacks to make
2019-06-01 20:31:12.506 EDT [21908] DEBUG:  cleaning up dynamic shared memory control segment with ID 976986759
2019-06-01 20:31:12.508 EDT [21908] DEBUG:  proc_exit(1): 2 callbacks to make
2019-06-01 20:31:12.508 EDT [21908] LOG:  database system is shut down
2019-06-01 20:31:12.508 EDT [21908] DEBUG:  exit(1)
2019-06-01 20:31:12.508 EDT [21908] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2019-06-01 20:31:12.508 EDT [21908] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2019-06-01 20:31:12.508 EDT [21908] DEBUG:  proc_exit(-1): 0 callbacks to make
2019-06-01 20:31:12.510 EDT [21909] DEBUG:  logger shutting down
2019-06-01 20:31:12.510 EDT [21909] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
2019-06-01 20:31:12.510 EDT [21909] DEBUG:  shmem_exit(0): 0 on_shmem_exit callbacks to make
2019-06-01 20:31:12.510 EDT [21909] DEBUG:  proc_exit(0): 0 callbacks to make
2019-06-01 20:31:12.510 EDT [21909] DEBUG:  exit(0)
2019-06-01 20:31:12.510 EDT [21909] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2019-06-01 20:31:12.510 EDT [21909] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2019-06-01 20:31:12.510 EDT [21909] DEBUG:  proc_exit(-1): 0 callbacks to make
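
So removing recovery.conf makes no difference: startup dies at exactly the
same point, before replay even begins.  Since the PANIC comes from the
replication-origin checkpoint file rather than the heap or WAL, one
possible next step is to move that file aside and let the server rewrite
it at the next checkpoint.  Strictly a sketch, only after a full
file-level backup, with paths from the patroni.yml above:

    # Stop anything that might restart postgres, then back up the
    # whole data directory before touching anything.
    systemctl stop patroni
    cp -a /data/patroni /data/patroni.bak

    # Move the zeroed checkpoint file out of the way; PostgreSQL treats
    # a missing replorigin_checkpoint as "no origin data", not a PANIC.
    mv /data/patroni/pg_logical/replorigin_checkpoint \
       /data/patroni/pg_logical/replorigin_checkpoint.corrupt

If startup then fails on the invalid primary/previous checkpoint records
instead, the last resort would be pg_resetwal -f /data/patroni (as the
postgres user, with the server stopped), followed immediately by a full
dump, since resetting the WAL can leave the data inconsistent.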



>>> --
>>> Adrian Klaver
>>> adrian.kla...@aklaver.com
