Re: [HACKERS] Assertion failure at standby promotion
On 06.05.2013 13:08, Heikki Linnakangas wrote: On 03.05.2013 18:17, Fujii Masao wrote: Hi, I got the following assertion failure when I promoted the standby. 2013-05-04 00:12:31 JST sby1 LOG: received promote request 2013-05-04 00:12:31 JST sby1 FATAL: terminating walreceiver process due to administrator command 2013-05-04 00:12:31 JST sby1 LOG: redo done at 0/6FFE038 2013-05-04 00:12:31 JST sby1 LOG: last completed transaction was at log time 2013-05-04 00:12:25.878909+09 2013-05-04 00:12:31 JST sby1 LOG: selected new timeline ID: 2 2013-05-04 00:12:31 JST sby1 LOG: archive recovery complete 2013-05-04 00:12:31 JST sby1 LOG: checkpoint starting: TRAP: FailedAssertion("!(sentPtr<= sendTimeLineValidUpto)", File: "walsender.c", Line: 1465) 2013-05-04 00:12:31 JST sby1 LOG: autovacuum launcher started The way to reproduce this is: 1. Create one master A, one standby B, and one cascade standby C. 2. Run pgbench -i -s 10 3. Promote the standby B before pgbench -i finishes I was able to reproduce this. The assertion checks that if the system is promoted at WAL location X, we must not have already sent WAL at > X to the client. As the code stands, that assumption is wrong; the walsender will merrily stream WAL that hasn't been replayed yet, and the system can be promoted before replaying all the WAL that has been streamed to a cascading standby. The comment in GetStandbyFlushRecPtr(), which is the function that determined how far the WAL may be streamed to a cascading standby, says this: /* * We can safely send what's already been replayed. Also, if walreceiver * is streaming WAL from the same timeline, we can send anything that * it has streamed, but hasn't been replayed yet. */ There seems to be two bugs here: 1. This used to work in 9.2, because the startup process would always replay all the WAL present in pg_xlog before promoting (the WAL present in pg_xlog was streamed from master). But the refactorings in xlog.c in 9.3devel broke that, so that the startup process can promote earlier. 2. Even after fixing the logic in xlog.c, there is still a corner-case where the startup process can promote before all the WAL that has been received from walreceiver has been received. That happens if the WAL streaming is terminated at a page boundary, rather than at a record boundary. For example, we might have received WAL up to the page boundary at 0/5BFA000, but the last *complete* record that we have received ends at 0/5BF9BD8. To fix the second issue, I think two things need to happen. First, we need to suppress the check in walsender. Second, we need to teach the WAL replay to back off when that happens. At the moment, the replay in the cascading standby gets stuck, trying to fetch the next page containing rest of the partial WAL record. Instead, it should throw away the partial record it has, and resync at the end of the last replayed record. I came up with the attached patch for this (fix-standby-promotion-assert-fail-2.patch). You will also need to apply fast-promotion-quick-fix.patch to work around the bug in fast promotion I reported here: http://www.postgresql.org/message-id/5188cffa.3020...@vmware.com. pg_receivexlog has a variant of the same bug. If the server has sent WAL up to segment boundary, and the segment boundary splits a WAL record in half, and the server is then promoted so that the promotion checkpoint (or end-of-recovery) record goes to the previous segment, pg_receivexlog will not fetch the segment containing the checkpoint record correctly. When following a timeline switch, it should restart streaming from the new timeline at the point where the timeline switch occurred, rather than at the point on the old timeline where it stopped. Usually it's the same thing, but not if the streaming was paused at a page or segment boundary. To fix this for pg_receivexlog, I added the start position of the next timeline to the result set that the server sends at end of WAL streaming. Previously, it only sent the TLI of the next timeline, and it was assumed that the starting point is the same as the point where streaming was stopped. It's a bit late to change it, but I have to say I don't like this whole business of relaying WAL to a cascading standby that hasn't been replayed. It would be a lot simpler if you could assume that whatever gets streamed downstream has already been replayed upstream. I understand that it's nice from a performance point of view, and because a cascading standby isn't then lagged behind if the replay gets stalled in the upstream server because of a hot standby conflict. But still.. I'm going to sleep over this and continue testing tomorrow. Please have a look. I think 9.2 has the same bug, BTW. Without support for timeline switches over streaming replication, it was just more difficult to hit. Looking closer, I believe 9.2 is OK. Recovery loops back to retry the whole record correctly, it was just the timeline switch over streamin
Re: [HACKERS] Assertion failure at standby promotion
On 05.05.2013 12:13, Amit Langote wrote: Hello, I tried reproducing the scenario. Note that I did not archive xlogs (that is, archive_command = '/bin/true' and corresponding restore_command = '/bin/false'). I performed the steps you mentioned and could find following: [snip] [Standby-2]FATAL: could not receive data from WAL stream: ERROR: server switched off timeline 1 at 0/3510B14, but walsender already streamed up to 0/3512000 Is this related to the assertion failure that you have reported? Yep, same issue. If you compile with assertions disabled, you get that error instead of the assertion. They both check for the same thing, it was an oversight to have both. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertion failure at standby promotion
On 03.05.2013 18:17, Fujii Masao wrote: Hi, I got the following assertion failure when I promoted the standby. 2013-05-04 00:12:31 JST sby1 LOG: received promote request 2013-05-04 00:12:31 JST sby1 FATAL: terminating walreceiver process due to administrator command 2013-05-04 00:12:31 JST sby1 LOG: redo done at 0/6FFE038 2013-05-04 00:12:31 JST sby1 LOG: last completed transaction was at log time 2013-05-04 00:12:25.878909+09 2013-05-04 00:12:31 JST sby1 LOG: selected new timeline ID: 2 2013-05-04 00:12:31 JST sby1 LOG: archive recovery complete 2013-05-04 00:12:31 JST sby1 LOG: checkpoint starting: TRAP: FailedAssertion("!(sentPtr<= sendTimeLineValidUpto)", File: "walsender.c", Line: 1465) 2013-05-04 00:12:31 JST sby1 LOG: autovacuum launcher started The way to reproduce this is: 1. Create one master A, one standby B, and one cascade standby C. 2. Run pgbench -i -s 10 3. Promote the standby B before pgbench -i finishes I was able to reproduce this. The assertion checks that if the system is promoted at WAL location X, we must not have already sent WAL at > X to the client. As the code stands, that assumption is wrong; the walsender will merrily stream WAL that hasn't been replayed yet, and the system can be promoted before replaying all the WAL that has been streamed to a cascading standby. The comment in GetStandbyFlushRecPtr(), which is the function that determined how far the WAL may be streamed to a cascading standby, says this: /* * We can safely send what's already been replayed. Also, if walreceiver * is streaming WAL from the same timeline, we can send anything that * it has streamed, but hasn't been replayed yet. */ There seems to be two bugs here: 1. This used to work in 9.2, because the startup process would always replay all the WAL present in pg_xlog before promoting (the WAL present in pg_xlog was streamed from master). But the refactorings in xlog.c in 9.3devel broke that, so that the startup process can promote earlier. 2. Even after fixing the logic in xlog.c, there is still a corner-case where the startup process can promote before all the WAL that has been received from walreceiver has been received. That happens if the WAL streaming is terminated at a page boundary, rather than at a record boundary. For example, we might have received WAL up to the page boundary at 0/5BFA000, but the last *complete* record that we have received ends at 0/5BF9BD8. To fix the second issue, I think two things need to happen. First, we need to suppress the check in walsender. Second, we need to teach the WAL replay to back off when that happens. At the moment, the replay in the cascading standby gets stuck, trying to fetch the next page containing rest of the partial WAL record. Instead, it should throw away the partial record it has, and resync at the end of the last replayed record. I think 9.2 has the same bug, BTW. Without support for timeline switches over streaming replication, it was just more difficult to hit. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertion failure at standby promotion
Hello, I tried reproducing the scenario. Note that I did not archive xlogs (that is, archive_command = '/bin/true' and corresponding restore_command = '/bin/false'). I performed the steps you mentioned and could find following: * Log on standby-1: [Standby-1]LOG: database system was interrupted; last known up at 2013-05-05 14:05:08 IST [Standby-1]LOG: creating missing WAL directory "pg_xlog/archive_status" [Standby-1]LOG: entering standby mode [Standby-1]LOG: started streaming WAL from primary at 0/200 on timeline 1 [Standby-1]LOG: redo starts at 0/224 [Standby-1]LOG: consistent recovery state reached at 0/2DC [Standby-1]LOG: database system is ready to accept read only connections [Standby-1]LOG: received promote request [Standby-1]FATAL: terminating walreceiver process due to administrator command [Standby-1]LOG: invalid magic number in log segment 00010003, offset 5316608 [Standby-1]LOG: redo done at 0/350F0B8 [Standby-1]LOG: last completed transaction was at log time 2013-05-05 14:05:14.571492+05:30 [Standby-1]LOG: selected new timeline ID: 2 [Standby-1]LOG: archive recovery complete >> [Standby-1]ERROR: server switched off timeline 1 at 0/3510B14, but >> walsender already streamed up to 0/3512000 [Standby-1]LOG: database system is ready to accept connections [Standby-1]LOG: autovacuum launcher started ** Log on Standby-2: [Standby-2]LOG: database system was interrupted while in recovery at log time 2013-05-05 14:05:07 IST [Standby-2]HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. [Standby-2]LOG: creating missing WAL directory "pg_xlog/archive_status" [Standby-2]LOG: entering standby mode [Standby-2]LOG: started streaming WAL from primary at 0/200 on timeline 1 [Standby-2]LOG: redo starts at 0/224 [Standby-2]LOG: consistent recovery state reached at 0/300 [Standby-2]LOG: database system is ready to accept read only connections >> [Standby-2]FATAL: could not receive data from WAL stream: ERROR: server >> switched off timeline 1 at 0/3510B14, but walsender already streamed up to >> 0/3512000 [Standby-2]LOG: invalid magic number in log segment 00010003, offset 5316608 [Standby-2]LOG: fetching timeline history file for timeline 2 from primary server [Standby-2]LOG: started streaming WAL from primary at 0/300 on timeline 1 [Standby-2]LOG: replication terminated by primary server [Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14 [Standby-2]LOG: restarted WAL streaming at 0/300 on timeline 1 [Standby-2]LOG: replication terminated by primary server [Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14 [Standby-2]LOG: restarted WAL streaming at 0/300 on timeline 1 [Standby-2]LOG: replication terminated by primary server [Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14 [Standby-2]LOG: restarted WAL streaming at 0/300 on timeline 1 [Standby-2]LOG: replication terminated by primary server [Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14 [Standby-2]LOG: restarted WAL streaming at 0/300 on timeline 1 [Standby-2]LOG: replication terminated by primary server [Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14 [Standby-2]LOG: restarted WAL streaming at 0/300 on timeline 1 [Standby-2]LOG: replication terminated by primary server [Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14 [Standby-2]LOG: restarted WAL streaming at 0/300 on timeline 1 [Standby-2]LOG: replication terminated by primary server [Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14 [Standby-2]LOG: restarted WAL streaming at 0/300 on timeline 1 [Standby-2]LOG: replication terminated by primary server [Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14 ... ... ... ** Also, in the ps output, following is the state of wal sender (standby-1) and wal receiver (standby-2) amit 8084 5675 0 14:13 ?00:00:00 postgres: wal receiver process restarting at 0/300 amit 8085 5648 0 14:13 ?00:00:00 postgres: wal sender process amit [local] idle Is this related to the assertion failure that you have reported? -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Assertion failure at standby promotion
Hi, I got the following assertion failure when I promoted the standby. 2013-05-04 00:12:31 JST sby1 LOG: received promote request 2013-05-04 00:12:31 JST sby1 FATAL: terminating walreceiver process due to administrator command 2013-05-04 00:12:31 JST sby1 LOG: redo done at 0/6FFE038 2013-05-04 00:12:31 JST sby1 LOG: last completed transaction was at log time 2013-05-04 00:12:25.878909+09 2013-05-04 00:12:31 JST sby1 LOG: selected new timeline ID: 2 2013-05-04 00:12:31 JST sby1 LOG: archive recovery complete 2013-05-04 00:12:31 JST sby1 LOG: checkpoint starting: TRAP: FailedAssertion("!(sentPtr <= sendTimeLineValidUpto)", File: "walsender.c", Line: 1465) 2013-05-04 00:12:31 JST sby1 LOG: autovacuum launcher started The way to reproduce this is: 1. Create one master A, one standby B, and one cascade standby C. 2. Run pgbench -i -s 10 3. Promote the standby B before pgbench -i finishes Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers