subject:"\[HACKERS\] Assertion failure at standby promotion"

Re: [HACKERS] Assertion failure at standby promotion

2013-05-07 Thread Heikki Linnakangas


On 06.05.2013 13:08, Heikki Linnakangas wrote:

On 03.05.2013 18:17, Fujii Masao wrote:

Hi,

I got the following assertion failure when I promoted the standby.

2013-05-04 00:12:31 JST sby1 LOG: received promote request
2013-05-04 00:12:31 JST sby1 FATAL: terminating walreceiver process
due to administrator command
2013-05-04 00:12:31 JST sby1 LOG: redo done at 0/6FFE038
2013-05-04 00:12:31 JST sby1 LOG: last completed transaction was at
log time 2013-05-04 00:12:25.878909+09
2013-05-04 00:12:31 JST sby1 LOG: selected new timeline ID: 2
2013-05-04 00:12:31 JST sby1 LOG: archive recovery complete
2013-05-04 00:12:31 JST sby1 LOG: checkpoint starting:
TRAP: FailedAssertion("!(sentPtr<= sendTimeLineValidUpto)", File:
"walsender.c", Line: 1465)
2013-05-04 00:12:31 JST sby1 LOG: autovacuum launcher started

The way to reproduce this is:

1. Create one master A, one standby B, and one cascade standby C.
2. Run pgbench -i -s 10
3. Promote the standby B before pgbench -i finishes


I was able to reproduce this. The assertion checks that if the system is
promoted at WAL location X, we must not have already sent WAL at > X to
the client. As the code stands, that assumption is wrong; the walsender
will merrily stream WAL that hasn't been replayed yet, and the system
can be promoted before replaying all the WAL that has been streamed to a
cascading standby. The comment in GetStandbyFlushRecPtr(), which is the
function that determined how far the WAL may be streamed to a cascading
standby, says this:


/*
* We can safely send what's already been replayed. Also, if walreceiver
* is streaming WAL from the same timeline, we can send anything that
* it has streamed, but hasn't been replayed yet.
*/


There seems to be two bugs here:

1. This used to work in 9.2, because the startup process would always
replay all the WAL present in pg_xlog before promoting (the WAL present
in pg_xlog was streamed from master). But the refactorings in xlog.c in
9.3devel broke that, so that the startup process can promote earlier.

2. Even after fixing the logic in xlog.c, there is still a corner-case
where the startup process can promote before all the WAL that has been
received from walreceiver has been received. That happens if the WAL
streaming is terminated at a page boundary, rather than at a record
boundary. For example, we might have received WAL up to the page
boundary at 0/5BFA000, but the last *complete* record that we have
received ends at 0/5BF9BD8.

To fix the second issue, I think two things need to happen. First, we
need to suppress the check in walsender. Second, we need to teach the
WAL replay to back off when that happens. At the moment, the replay in
the cascading standby gets stuck, trying to fetch the next page
containing rest of the partial WAL record. Instead, it should throw away
the partial record it has, and resync at the end of the last replayed
record.


I came up with the attached patch for this 
(fix-standby-promotion-assert-fail-2.patch). You will also need to apply 
fast-promotion-quick-fix.patch to work around the bug in fast promotion 
I reported here: 
http://www.postgresql.org/message-id/5188cffa.3020...@vmware.com.


pg_receivexlog has a variant of the same bug. If the server has sent WAL 
up to segment boundary, and the segment boundary splits a WAL record in 
half, and the server is then promoted so that the promotion checkpoint 
(or end-of-recovery) record goes to the previous segment, pg_receivexlog 
will not fetch the segment containing the checkpoint record correctly. 
When following a timeline switch, it should restart streaming from the 
new timeline at the point where the timeline switch occurred, rather 
than at the point on the old timeline where it stopped. Usually it's the 
same thing, but not if the streaming was paused at a page or segment 
boundary.


To fix this for pg_receivexlog, I added the start position of the next 
timeline to the result set that the server sends at end of WAL 
streaming. Previously, it only sent the TLI of the next timeline, and it 
was assumed that the starting point is the same as the point where 
streaming was stopped.


It's a bit late to change it, but I have to say I don't like this whole 
business of relaying WAL to a cascading standby that hasn't been 
replayed. It would be a lot simpler if you could assume that whatever 
gets streamed downstream has already been replayed upstream. I 
understand that it's nice from a performance point of view, and because 
a cascading standby isn't then lagged behind if the replay gets stalled 
in the upstream server because of a hot standby conflict. But still..


I'm going to sleep over this and continue testing tomorrow. Please have 
a look.



I think 9.2 has the same bug, BTW. Without support for timeline
switches over streaming replication, it was just more difficult to hit.


Looking closer, I believe 9.2 is OK. Recovery loops back to retry the 
whole record correctly, it was just the timeline switch over streamin

Re: [HACKERS] Assertion failure at standby promotion

2013-05-06 Thread Heikki Linnakangas


On 05.05.2013 12:13, Amit Langote wrote:

Hello,

I  tried reproducing the scenario. Note that I did not archive xlogs
(that is, archive_command = '/bin/true' and corresponding
restore_command = '/bin/false'). I performed the steps you mentioned
and could find following:
[snip]

[Standby-2]FATAL:  could not receive data from WAL stream: ERROR:  server 
switched off timeline 1 at 0/3510B14, but walsender already streamed up to 
0/3512000



Is this related to the assertion failure that you have reported?


Yep, same issue. If you compile with assertions disabled, you get that 
error instead of the assertion. They both check for the same thing, it 
was an oversight to have both.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertion failure at standby promotion

2013-05-06 Thread Heikki Linnakangas


On 03.05.2013 18:17, Fujii Masao wrote:

Hi,

I got the following assertion failure when I promoted the standby.

2013-05-04 00:12:31 JST sby1 LOG:  received promote request
2013-05-04 00:12:31 JST sby1 FATAL:  terminating walreceiver process
due to administrator command
2013-05-04 00:12:31 JST sby1 LOG:  redo done at 0/6FFE038
2013-05-04 00:12:31 JST sby1 LOG:  last completed transaction was at
log time 2013-05-04 00:12:25.878909+09
2013-05-04 00:12:31 JST sby1 LOG:  selected new timeline ID: 2
2013-05-04 00:12:31 JST sby1 LOG:  archive recovery complete
2013-05-04 00:12:31 JST sby1 LOG:  checkpoint starting:
TRAP: FailedAssertion("!(sentPtr<= sendTimeLineValidUpto)", File:
"walsender.c", Line: 1465)
2013-05-04 00:12:31 JST sby1 LOG:  autovacuum launcher started

The way to reproduce this is:

1. Create one master A, one standby B, and one cascade standby C.
2. Run pgbench -i -s 10
3. Promote the standby B before pgbench -i finishes


I was able to reproduce this. The assertion checks that if the system is 
promoted at WAL location X, we must not have already sent WAL at > X to 
the client. As the code stands, that assumption is wrong; the walsender 
will merrily stream WAL that hasn't been replayed yet, and the system 
can be promoted before replaying all the WAL that has been streamed to a 
cascading standby. The comment in GetStandbyFlushRecPtr(), which is the 
function that determined how far the WAL may be streamed to a cascading 
standby, says this:



/*
 * We can safely send what's already been replayed. Also, if walreceiver
 * is streaming WAL from the same timeline, we can send anything that
 * it has streamed, but hasn't been replayed yet.
 */


There seems to be two bugs here:

1. This used to work in 9.2, because the startup process would always 
replay all the WAL present in pg_xlog before promoting (the WAL present 
in pg_xlog was streamed from master). But the refactorings in xlog.c in 
9.3devel broke that, so that the startup process can promote earlier.


2. Even after fixing the logic in xlog.c, there is still a corner-case 
where the startup process can promote before all the WAL that has been 
received from walreceiver has been received. That happens if the WAL 
streaming is terminated at a page boundary, rather than at a record 
boundary. For example, we might have received WAL up to the page 
boundary at 0/5BFA000, but the last *complete* record that we have 
received ends at 0/5BF9BD8.


To fix the second issue, I think two things need to happen. First, we 
need to suppress the check in walsender. Second, we need to teach the 
WAL replay to back off when that happens. At the moment, the replay in 
the cascading standby gets stuck, trying to fetch the next page 
containing rest of the partial WAL record. Instead, it should throw away 
the partial record it has, and resync at the end of the last replayed 
record. I think 9.2 has the same bug, BTW. Without support for timeline 
switches over streaming replication, it was just more difficult to hit.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertion failure at standby promotion

2013-05-05 Thread Amit Langote

Hello,

I  tried reproducing the scenario. Note that I did not archive xlogs
(that is, archive_command = '/bin/true' and corresponding
restore_command = '/bin/false'). I performed the steps you mentioned
and could find following:

* Log on standby-1:
[Standby-1]LOG:  database system was interrupted; last known up at
2013-05-05 14:05:08 IST
[Standby-1]LOG:  creating missing WAL directory "pg_xlog/archive_status"
[Standby-1]LOG:  entering standby mode
[Standby-1]LOG:  started streaming WAL from primary at 0/200 on timeline 1
[Standby-1]LOG:  redo starts at 0/224
[Standby-1]LOG:  consistent recovery state reached at 0/2DC
[Standby-1]LOG:  database system is ready to accept read only connections
[Standby-1]LOG:  received promote request
[Standby-1]FATAL:  terminating walreceiver process due to administrator command
[Standby-1]LOG:  invalid magic number  in log segment
00010003, offset 5316608
[Standby-1]LOG:  redo done at 0/350F0B8
[Standby-1]LOG:  last completed transaction was at log time 2013-05-05
14:05:14.571492+05:30
[Standby-1]LOG:  selected new timeline ID: 2
[Standby-1]LOG:  archive recovery complete
>> [Standby-1]ERROR:  server switched off timeline 1 at 0/3510B14, but 
>> walsender already streamed up to 0/3512000
[Standby-1]LOG:  database system is ready to accept connections
[Standby-1]LOG:  autovacuum launcher started


** Log on Standby-2:
[Standby-2]LOG:  database system was interrupted while in recovery at
log time 2013-05-05 14:05:07 IST
[Standby-2]HINT:  If this has occurred more than once some data might
be corrupted and you might need to choose an earlier recovery target.
[Standby-2]LOG:  creating missing WAL directory "pg_xlog/archive_status"
[Standby-2]LOG:  entering standby mode
[Standby-2]LOG:  started streaming WAL from primary at 0/200 on timeline 1
[Standby-2]LOG:  redo starts at 0/224
[Standby-2]LOG:  consistent recovery state reached at 0/300
[Standby-2]LOG:  database system is ready to accept read only connections
>> [Standby-2]FATAL:  could not receive data from WAL stream: ERROR:  server 
>> switched off timeline 1 at 0/3510B14, but walsender already streamed up to 
>> 0/3512000

[Standby-2]LOG:  invalid magic number  in log segment
00010003, offset 5316608
[Standby-2]LOG:  fetching timeline history file for timeline 2 from
primary server
[Standby-2]LOG:  started streaming WAL from primary at 0/300 on timeline 1
[Standby-2]LOG:  replication terminated by primary server
[Standby-2]DETAIL:  End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG:  restarted WAL streaming at 0/300 on timeline 1
[Standby-2]LOG:  replication terminated by primary server
[Standby-2]DETAIL:  End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG:  restarted WAL streaming at 0/300 on timeline 1
[Standby-2]LOG:  replication terminated by primary server
[Standby-2]DETAIL:  End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG:  restarted WAL streaming at 0/300 on timeline 1
[Standby-2]LOG:  replication terminated by primary server
[Standby-2]DETAIL:  End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG:  restarted WAL streaming at 0/300 on timeline 1
[Standby-2]LOG:  replication terminated by primary server
[Standby-2]DETAIL:  End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG:  restarted WAL streaming at 0/300 on timeline 1
[Standby-2]LOG:  replication terminated by primary server
[Standby-2]DETAIL:  End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG:  restarted WAL streaming at 0/300 on timeline 1
[Standby-2]LOG:  replication terminated by primary server
[Standby-2]DETAIL:  End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG:  restarted WAL streaming at 0/300 on timeline 1
[Standby-2]LOG:  replication terminated by primary server
[Standby-2]DETAIL:  End of WAL reached on timeline 1 at 0/3510B14
...
...
...

** Also, in the ps output, following is the state of wal sender
(standby-1) and wal receiver (standby-2)

amit  8084  5675  0 14:13 ?00:00:00 postgres: wal receiver
process   restarting at 0/300
amit  8085  5648  0 14:13 ?00:00:00 postgres: wal sender
process amit [local] idle


Is this related to the assertion failure that you have reported?

--

Amit Langote


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Assertion failure at standby promotion

2013-05-03 Thread Fujii Masao

Hi,

I got the following assertion failure when I promoted the standby.

2013-05-04 00:12:31 JST sby1 LOG:  received promote request
2013-05-04 00:12:31 JST sby1 FATAL:  terminating walreceiver process
due to administrator command
2013-05-04 00:12:31 JST sby1 LOG:  redo done at 0/6FFE038
2013-05-04 00:12:31 JST sby1 LOG:  last completed transaction was at
log time 2013-05-04 00:12:25.878909+09
2013-05-04 00:12:31 JST sby1 LOG:  selected new timeline ID: 2
2013-05-04 00:12:31 JST sby1 LOG:  archive recovery complete
2013-05-04 00:12:31 JST sby1 LOG:  checkpoint starting:
TRAP: FailedAssertion("!(sentPtr <= sendTimeLineValidUpto)", File:
"walsender.c", Line: 1465)
2013-05-04 00:12:31 JST sby1 LOG:  autovacuum launcher started

The way to reproduce this is:

1. Create one master A, one standby B, and one cascade standby C.
2. Run pgbench -i -s 10
3. Promote the standby B before pgbench -i finishes

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertion failure at standby promotion

Re: [HACKERS] Assertion failure at standby promotion

Re: [HACKERS] Assertion failure at standby promotion

Re: [HACKERS] Assertion failure at standby promotion

[HACKERS] Assertion failure at standby promotion

5 matches

Site Navigation

Mail list logo

Footer information