On Thu, Aug 25, 2016 at 10:33 PM, Tsunakawa, Takayuki
<tsunakawa.ta...@jp.fujitsu.com> wrote:
> The processing went as follows.
> 1. node1's timeline is 140.  It wrote a WAL record at the end of WAL segment 
> 117.  The WAL record didn't fit the last page, so it was split across 
> segments 117 and 118.
> 2. WAL segment 117 was archived.
> 3. node1 went down, and node2 was promoted.
> 4. As part of the recovery process, node2 retrieves WAL segment 117 from 
> archive.  It found a WAL record fragment at the end of the segment but could 
> not find the remaining fragment in segment 118, so node2 stops recovery there.
> LOG:  restored log file "0000008C0000028C00000075" from archive
> LOG:  received promote request
> LOG:  redo done at 28C/75FFF738
> 5. node2 becomes the primary, and its timeline becomes 141.  node3 is 
> disconnected by node2 (but later reconnects to node2).
> LOG:  terminating all walsender processes to force cascaded standby(s) to 
> update timeline and reconnect
> 6. node3 retrieves and applies WAL segment 117 from archive.
> LOG:  restored log file "0000008C0000028C00000075" from archive
> 7. node3 found the .history file for timeline 141 and updates its timeline to 141.
> 8. Because node3 found a WAL record fragment at the end of segment 117, it 
> expects to find the remaining fragment at the beginning of WAL segment 118 
> streamed from node2.  But there was a fragment of a different WAL record, 
> because node2 overwrote a different WAL record at the end of segment 117 
> across to 118.
> LOG:  invalid contrecord length 5892 in log file 652, segment 118, offset 0
> 9. node3 then retrieves segment 117 from archive again to get the WAL record 
> at the end of segment 117.  However, as node3's timeline is already 141, it 
> complains about the older timeline when it sees the timeline 140 at the 
> beginning of segment 117.
> LOG:  out-of-sequence timeline ID 140 (after 141) in log file 652, segment 
> 117, offset 0
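
The numbers in the quoted log lines can be cross-checked against the WAL file name. The following is only a minimal sketch, not code from the patch: it assumes the usual 24-hex-digit segment file name (8 digits each for timeline, log file, and segment) and the default 16 MB segment size, and the helper names parse_wal_filename and lsn_to_segment are made up for illustration.

```python
# Hypothetical helpers for reading the log lines quoted above.
# Assumption: 16 MB WAL segments (the PostgreSQL default).
DEFAULT_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_wal_filename(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log, segment


def lsn_to_segment(lsn, segment_size=DEFAULT_SEGMENT_SIZE):
    """Map an LSN such as '28C/75FFF738' to (log, segment, offset in segment)."""
    high, low = lsn.split("/")
    low_value = int(low, 16)
    return int(high, 16), low_value // segment_size, low_value % segment_size


if __name__ == "__main__":
    # File restored from archive in steps 4 and 6.
    print(parse_wal_filename("0000008C0000028C00000075"))
    # Position reported by "redo done at" in step 4.
    print(lsn_to_segment("28C/75FFF738"))
```

Under those assumptions the archived file decodes to timeline 140, log file 652, segment 117, and the "redo done" position falls near the end of that same segment, which matches the description of a record split across segments 117 and 118.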

OK.  I agree that's a problem.  However, your patch adds zero new
comment text while removing some existing comments, so I can't easily
tell how it solves that problem or whether it does so correctly.  Even
if I were smart enough to figure it out, I wouldn't want to rely on
the next person also being that smart.  This is obviously a subtle
problem in tricky code, so a clear explanation of the fix seems like a
very good idea.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
