Re: [HACKERS] Assertion failure when promoting node by deleting recovery.conf and restart node

2013-05-19 Thread Simon Riggs
On 25 March 2013 19:14, Heikki Linnakangas  wrote:
> On 15.03.2013 04:25, Michael Paquier wrote:
>>
>> Hi,
>>
>> When trying to *promote* a slave as master by removing recovery.conf and
>> restarting node, I found an assertion failure on master branch:
>> LOG:  database system was shut down in recovery at 2013-03-15 10:22:27 JST
>> TRAP: FailedAssertion("!(ControlFile->minRecoveryPointTLI != 1)", File:
>> "xlog.c", Line: 4954)
>> (gdb) bt
>> #0  0x7f95af03b2c5 in raise () from /usr/lib/libc.so.6
>> #1  0x7f95af03c748 in abort () from /usr/lib/libc.so.6
>> #2  0x0086ce71 in ExceptionalCondition (conditionName=0x8f2af0
>> "!(ControlFile->minRecoveryPointTLI != 1)", errorType=0x8f0813
>> "FailedAssertion", fileName=0x8f076b "xlog.c",
>>  lineNumber=4954) at assert.c:54
>> #3  0x004fe499 in StartupXLOG () at xlog.c:4954
>> #4  0x006f9d34 in StartupProcessMain () at startup.c:224
>> #5  0x0050ef92 in AuxiliaryProcessMain (argc=2,
>> argv=0x7fffa6fc3d20) at bootstrap.c:423
>> #6  0x006f8816 in StartChildProcess (type=StartupProcess) at
>> postmaster.c:4956
>> #7  0x006f39e9 in PostmasterMain (argc=6, argv=0x1c950a0) at
>> postmaster.c:1237
>> #8  0x0065d59b in main (argc=6, argv=0x1c950a0) at main.c:197
>> Ok, this is not the cleanest way to promote a node, as it doesn't do any
>> safety checks related to promotion, but 9.2 and previous versions allowed
>> doing that properly.
>>
>> The assertion was introduced by commit 3f0ab05 in order to properly record
>> minRecoveryPointTLI in the control file at the end of recovery in the case
>> of a crash.
>> However, when a slave node that was properly shut down in recovery is then
>> restarted as a master, the code path of this assertion is taken.
>> What do you think of the attached patch? It avoids updating
>> recoveryTargetTLI and recoveryTargetIsLatest if the node has been shut down
>> while in recovery.
>> Another possibility would be to add to the assertion some conditions based
>> on the state of ControlFile, but I think it is more consistent simply not to
>> update those fields.
>
>
> Simon, can you comment on this? ISTM we could just remove the assertion and
> update the comment to mention that this can happen. If there is a min
> recovery point, surely we always need to recover to the timeline containing
> that point, so setting recoveryTargetTLI to minRecoveryPointTLI seems
> sensible.

Fixed using the latest TLI available and removing the assertion.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assertion failure when promoting node by deleting recovery.conf and restart node

2013-03-26 Thread Simon Riggs
On 25 March 2013 19:14, Heikki Linnakangas  wrote:

> Simon, can you comment on this?

Yes, will do.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Assertion failure when promoting node by deleting recovery.conf and restart node

2013-03-25 Thread Heikki Linnakangas

On 15.03.2013 04:25, Michael Paquier wrote:

> Hi,
>
> When trying to *promote* a slave as master by removing recovery.conf and
> restarting node, I found an assertion failure on master branch:
> LOG:  database system was shut down in recovery at 2013-03-15 10:22:27 JST
> TRAP: FailedAssertion("!(ControlFile->minRecoveryPointTLI != 1)", File:
> "xlog.c", Line: 4954)
> (gdb) bt
> #0  0x7f95af03b2c5 in raise () from /usr/lib/libc.so.6
> #1  0x7f95af03c748 in abort () from /usr/lib/libc.so.6
> #2  0x0086ce71 in ExceptionalCondition (conditionName=0x8f2af0
> "!(ControlFile->minRecoveryPointTLI != 1)", errorType=0x8f0813
> "FailedAssertion", fileName=0x8f076b "xlog.c",
>  lineNumber=4954) at assert.c:54
> #3  0x004fe499 in StartupXLOG () at xlog.c:4954
> #4  0x006f9d34 in StartupProcessMain () at startup.c:224
> #5  0x0050ef92 in AuxiliaryProcessMain (argc=2,
> argv=0x7fffa6fc3d20) at bootstrap.c:423
> #6  0x006f8816 in StartChildProcess (type=StartupProcess) at
> postmaster.c:4956
> #7  0x006f39e9 in PostmasterMain (argc=6, argv=0x1c950a0) at
> postmaster.c:1237
> #8  0x0065d59b in main (argc=6, argv=0x1c950a0) at main.c:197
> Ok, this is not the cleanest way to promote a node, as it doesn't do any
> safety checks related to promotion, but 9.2 and previous versions allowed
> doing that properly.
>
> The assertion was introduced by commit 3f0ab05 in order to properly record
> minRecoveryPointTLI in the control file at the end of recovery in the case
> of a crash.
> However, when a slave node that was properly shut down in recovery is then
> restarted as a master, the code path of this assertion is taken.
> What do you think of the attached patch? It avoids updating
> recoveryTargetTLI and recoveryTargetIsLatest if the node has been shut down
> while in recovery.
> Another possibility would be to add to the assertion some conditions based
> on the state of ControlFile, but I think it is more consistent simply not to
> update those fields.


Simon, can you comment on this? ISTM we could just remove the assertion 
and update the comment to mention that this can happen. If there is a 
min recovery point, surely we always need to recover to the timeline 
containing that point, so setting recoveryTargetTLI to 
minRecoveryPointTLI seems sensible.


- Heikki
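
For readers following the thread, the suggestion above (drop the assertion and recover along the timeline that contains the min recovery point) can be sketched as a small stand-alone model. This is only an illustration and not the code in xlog.c: the ToyControlFile and ToyRecoveryTarget types and the toy_pick_recovery_target() function are hypothetical names invented here, and treating 0 as "no min recovery point recorded" and clearing the "latest" flag are assumptions of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Hypothetical, pared-down stand-in for the control file contents. */
typedef struct
{
    TimeLineID checkpoint_tli;         /* timeline of the last checkpoint */
    TimeLineID min_recovery_point_tli; /* assumed: 0 means "not recorded" */
} ToyControlFile;

/* Hypothetical pair mirroring recoveryTargetTLI / recoveryTargetIsLatest. */
typedef struct
{
    TimeLineID target_tli;
    bool       target_is_latest;
} ToyRecoveryTarget;

/*
 * Pick the timeline to recover along when no recovery.conf is present.
 * If a min recovery point was recorded, follow its timeline whatever the
 * value is (including 1), instead of asserting that it cannot be 1.
 * Otherwise stay on the timeline of the last checkpoint.
 */
static ToyRecoveryTarget
toy_pick_recovery_target(const ToyControlFile *cf)
{
    ToyRecoveryTarget target;

    target.target_tli = cf->checkpoint_tli;
    target.target_is_latest = true;

    if (cf->min_recovery_point_tli > 0)
    {
        target.target_tli = cf->min_recovery_point_tli;
        target.target_is_latest = false;
    }
    return target;
}
```

In this model, a standby that was shut down cleanly while still on timeline 1 simply yields a target timeline of 1, which is the case that tripped the assertion in the report.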




[HACKERS] Assertion failure when promoting node by deleting recovery.conf and restart node

2013-03-14 Thread Michael Paquier
Hi,

When trying to *promote* a slave as master by removing recovery.conf and
restarting node, I found an assertion failure on master branch:
LOG:  database system was shut down in recovery at 2013-03-15 10:22:27 JST
TRAP: FailedAssertion("!(ControlFile->minRecoveryPointTLI != 1)", File:
"xlog.c", Line: 4954)
(gdb) bt
#0  0x7f95af03b2c5 in raise () from /usr/lib/libc.so.6
#1  0x7f95af03c748 in abort () from /usr/lib/libc.so.6
#2  0x0086ce71 in ExceptionalCondition (conditionName=0x8f2af0
"!(ControlFile->minRecoveryPointTLI != 1)", errorType=0x8f0813
"FailedAssertion", fileName=0x8f076b "xlog.c",
lineNumber=4954) at assert.c:54
#3  0x004fe499 in StartupXLOG () at xlog.c:4954
#4  0x006f9d34 in StartupProcessMain () at startup.c:224
#5  0x0050ef92 in AuxiliaryProcessMain (argc=2,
argv=0x7fffa6fc3d20) at bootstrap.c:423
#6  0x006f8816 in StartChildProcess (type=StartupProcess) at
postmaster.c:4956
#7  0x006f39e9 in PostmasterMain (argc=6, argv=0x1c950a0) at
postmaster.c:1237
#8  0x0065d59b in main (argc=6, argv=0x1c950a0) at main.c:197
Ok, this is not the cleanest way to promote a node, as it doesn't do any
safety checks related to promotion, but 9.2 and previous versions allowed
doing that properly.

The assertion was introduced by commit 3f0ab05 in order to properly record
minRecoveryPointTLI in the control file at the end of recovery in the case
of a crash.
However, when a slave node that was properly shut down in recovery is then
restarted as a master, the code path of this assertion is taken.
What do you think of the attached patch? It avoids updating
recoveryTargetTLI and recoveryTargetIsLatest if the node has been shut down
while in recovery.
Another possibility would be to add to the assertion some conditions based
on the state of ControlFile, but I think it is more consistent simply not to
update those fields.

Regards,
-- 
Michael


20130315_crash_tli.patch
Description: Binary data
