> On Jul 12, 2018, at 10:29, Andres Freund <and...@anarazel.de> wrote:
> 
> This needs a lot more information before somebody can reasonably act on
> it.

Happy to provide, of course!

The two relevant hosts are "Ash" and "Chi".  As mentioned, they've been flipped 
back and forth repeatedly using pg_rewind: One will be promoted, the other 
pg_rewind'd, and then brought up as a WAL shipping secondary to the first.  
This procedure has worked repeatedly, but this last instance failed.  The logs 
from the failure are attached below, along with the output of from 
pg_controldata for both hosts.  The systems are still in this configuration, so 
we can gather more as required.

What surprises me about the error is that while the recovery point seems 
reasonable, it shouldn't be on timeline 103, but on timeline 105.


Failover to Ash:
2018-07-10 19:14:22 UTC [2072]: [5153-1] user=,db=,app=,client= LOG:  received 
promote request
2018-07-10 19:14:22 UTC [2072]: [5154-1] user=,db=,app=,client= LOG:  redo done 
at A58/4F0822D0
2018-07-10 19:14:22 UTC [2072]: [5155-1] user=,db=,app=,client= LOG:  last 
completed transaction was at log time 2018-07-10 19:13:54.515121+00
2018-07-10 19:14:23 UTC [2072]: [5156-1] user=,db=,app=,client= LOG:  restored 
log file "0000006700000A580000004F" from archive
2018-07-10 19:14:23 UTC [2072]: [5157-1] user=,db=,app=,client= LOG:  selected 
new timeline ID: 104
2018-07-10 19:14:24 UTC [2072]: [5158-1] user=,db=,app=,client= LOG:  restored 
log file "00000067.history" from archive

[Chi rewound using pg_rewind against Ash, brought up]

Chi:
2018-07-10 19:26:05 UTC [3260]: [4-1] user=,db=,app=,client= LOG:  restored log 
file "00000068.history" from archive
2018-07-10 19:26:06 UTC [3260]: [5-1] user=,db=,app=,client= LOG:  redo starts 
at A58/4E061EF8
2018-07-10 19:26:07 UTC [3260]: [6-1] user=,db=,app=,client= LOG:  restored log 
file "0000006800000A5800000050" from archive
2018-07-10 19:26:08 UTC [3260]: [7-1] user=,db=,app=,client= LOG:  restored log 
file "0000006800000A5800000051" from archive
2018-07-10 19:26:09 UTC [3260]: [8-1] user=,db=,app=,client= LOG:  restored log 
file "0000006800000A5800000052" from archive
2018-07-10 19:26:10 UTC [3260]: [9-1] user=,db=,app=,client= LOG:  restored log 
file "0000006800000A5800000053" from archive
2018-07-10 19:26:11 UTC [3260]: [10-1] user=,db=,app=,client= LOG:  restored 
log file "0000006800000A5800000054" from archive
2018-07-10 19:26:12 UTC [3260]: [11-1] user=,db=,app=,client= LOG:  restored 
log file "0000006800000A5800000055" from archive
2018-07-10 19:26:13 UTC [3260]: [12-1] user=,db=,app=,client= LOG:  restored 
log file "0000006800000A5800000056" from archive
2018-07-10 19:26:13 UTC [3260]: [13-1] user=,db=,app=,client= LOG:  restored 
log file "0000006800000A5800000057" from archive
2018-07-10 19:26:14 UTC [3260]: [14-1] user=,db=,app=,client= LOG:  restored 
log file "0000006800000A5800000058" from archive
2018-07-10 19:26:15 UTC [3260]: [15-1] user=,db=,app=,client= LOG:  restored 
log file "0000006800000A5800000059" from archive
2018-07-10 19:26:15 UTC [3260]: [16-1] user=,db=,app=,client= LOG:  restored 
log file "0000006800000A580000005A" from archive
2018-07-10 19:26:16 UTC [3260]: [17-1] user=,db=,app=,client= LOG:  restored 
log file "0000006800000A580000005B" from archive
2018-07-10 19:26:17 UTC [3260]: [18-1] user=,db=,app=,client= LOG:  restored 
log file "0000006800000A580000005C" from archive
2018-07-10 19:27:28 UTC [3260]: [19-1] user=,db=,app=,client= LOG:  restored 
log file "0000006800000A580000005D" from archive
2018-07-10 19:27:28 UTC [3260]: [20-1] user=,db=,app=,client= LOG:  consistent 
recovery state reached at A58/5D01BCA8
2018-07-10 19:27:28 UTC [2564]: [3-1] user=,db=,app=,client= LOG:  database 
system is ready to accept read only connections
2018-07-10 19:28:28 UTC [3260]: [21-1] user=,db=,app=,client= LOG:  restored 
log file "0000006800000A580000005E" from archive

Chi promoted:
2018-07-10 19:28:37 UTC [3260]: [22-1] user=,db=,app=,client= LOG:  received 
promote request
2018-07-10 19:28:37 UTC [3260]: [23-1] user=,db=,app=,client= LOG:  redo done 
at A58/5E0817D0
2018-07-10 19:28:37 UTC [3260]: [24-1] user=,db=,app=,client= LOG:  last 
completed transaction was at log time 2018-07-10 19:28:12.954559+00
2018-07-10 19:28:37 UTC [3260]: [25-1] user=,db=,app=,client= LOG:  restored 
log file "0000006800000A580000005E" from archive
2018-07-10 19:28:37 UTC [3260]: [26-1] user=,db=,app=,client= LOG:  selected 
new timeline ID: 105
2018-07-10 19:28:38 UTC [3260]: [27-1] user=,db=,app=,client= LOG:  restored 
log file "00000068.history" from archive
2018-07-10 19:28:38 UTC [3260]: [28-1] user=,db=,app=,client= LOG:  archive 
recovery complete
2018-07-10 19:28:38 UTC [3260]: [29-1] user=,db=,app=,client= LOG:  MultiXact 
member wraparound protections are now enabled
2018-07-10 19:28:38 UTC [5068]: [1-1] user=,db=,app=,client= LOG:  checkpoint 
starting: force
2018-07-10 19:28:38 UTC [2564]: [4-1] user=,db=,app=,client= LOG:  database 
system is ready to accept connections

[Ash rewound using pg_rewind against Chi, brought up]

Ash:
2018-07-10 19:40:09 UTC [520]: [3-1] user=,db=,app=,client= LOG:  restored log 
file "00000069.history" from archive
2018-07-10 19:40:10 UTC [520]: [4-1] user=,db=,app=,client= LOG:  entering 
standby mode
2018-07-10 19:40:10 UTC [520]: [5-1] user=,db=,app=,client= LOG:  restored log 
file "00000069.history" from archive
2018-07-10 19:40:12 UTC [520]: [6-1] user=,db=,app=,client= LOG:  restored log 
file "0000006700000A580000004E" from archive
2018-07-10 19:40:12 UTC [520]: [7-1] user=,db=,app=,client= FATAL:  requested 
timeline 105 does not contain minimum recovery point A58/6B109F28 on timeline 
103

--
From Ash:

C:\Windows\system32>pg_controldata -D e:\pgdata
pg_control version number:            960
Catalog version number:               201608131
Database system identifier:           6377726782605679660
Database cluster state:               in archive recovery
pg_control last modified:             7/10/2018 7:28:38 PM
Latest checkpoint location:           A58/4E0689F0
Prior checkpoint location:            A58/51E0EF50
Latest checkpoint's REDO location:    A58/4E061EF8
Latest checkpoint's REDO WAL file:    0000006700000A580000004E
Latest checkpoint's TimeLineID:       103
Latest checkpoint's PrevTimeLineID:   103
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:2428875781
Latest checkpoint's NextOID:          2368444
Latest checkpoint's NextMultiXactId:  2
Latest checkpoint's NextMultiOffset:  3
Latest checkpoint's oldestXID:        2243636521
Latest checkpoint's oldestXID's DB:   16397
Latest checkpoint's oldestActiveXID:  2428782169
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 16396
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            7/10/2018 7:12:11 PM
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     A58/6B109F28
Min recovery ending loc's timeline:   103
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                on
max_connections setting:              1000
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

From Chi:

C:\Windows\system32>pg_controldata -D e:\pgdata
pg_control version number:            960
Catalog version number:               201608131
Database system identifier:           6377726782605679660
Database cluster state:               in production
pg_control last modified:             7/11/2018 3:39:21 PM
Latest checkpoint location:           A5A/C205EE68
Prior checkpoint location:            A5A/BF15ED30
Latest checkpoint's REDO location:    A5A/C1102590
Latest checkpoint's REDO WAL file:    0000006900000A5A000000C1
Latest checkpoint's TimeLineID:       105
Latest checkpoint's PrevTimeLineID:   105
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:2431541507
Latest checkpoint's NextOID:          2384828
Latest checkpoint's NextMultiXactId:  2
Latest checkpoint's NextMultiOffset:  3
Latest checkpoint's oldestXID:        2243636521
Latest checkpoint's oldestXID's DB:   16397
Latest checkpoint's oldestActiveXID:  2431541507
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 16396
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            7/11/2018 3:38:41 PM
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                on
max_connections setting:              1000
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

--
-- Christophe Pettus
   x...@thebuild.com


Reply via email to