Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-10-02 Thread Michael Paquier
On Mon, Oct 2, 2017 at 8:13 AM, Daniel Gustafsson  wrote:
> I’ve moved this to the next CF, but since this no longer applies cleanly I’ve
> reset it to Waiting for author.

Thanks Daniel for the reminder. Attached are rebased patches. This
thing rots easily...
-- 
Michael


0001-Change-detection-of-corrupted-2PC-files-as-FATAL.patch
Description: Binary data


0002-Minimize-window-between-history-file-and-end-of-reco.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-10-01 Thread Daniel Gustafsson
> On 02 Sep 2017, at 14:17, Michael Paquier  wrote:
> 
> On Fri, Sep 1, 2017 at 7:17 PM, Thomas Munro
>  wrote:
>> On Sat, Aug 26, 2017 at 7:32 AM, Robert Haas  wrote:
>>> On Sat, Apr 8, 2017 at 10:05 AM, David Steele  wrote:
 On 3/28/17 1:21 AM, Tsunakawa, Takayuki wrote:
> Thanks, marked as ready for committer, as the code is good and Alexander 
> reported the test success.
 
 This bug has been moved to CF 2017-07.
>>> 
>>> This bug fix has been pending in "Ready for Committer" state for about
>>> 4.5 months.  Three committers (Magnus, Heikki, Tom) have contributed
>>> to the thread to date.  Maybe one of them would like to commit this?
>> 
>> In the meantime its bits have begun to rot.  Michael, could you please 
>> rebase?
> 
> Thanks for the reminder, Thomas. The 2PC code triggered during
> recovery has been largely reworked in PG10, explaining the conflicts.
> As this has been some time since I touched this patch, I had again a
> look at its logic and could not find any problems around it. So
> attached are a rebased versions for both 0001 and 0002, I have updated
> the messages as well to be more in-line with what is in HEAD for
> corrupted entries.

I’ve moved this to the next CF, but since this no longer applies cleanly I’ve
reset it to Waiting for author.

cheers ./daniel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-09-02 Thread Michael Paquier
On Fri, Sep 1, 2017 at 7:17 PM, Thomas Munro
 wrote:
> On Sat, Aug 26, 2017 at 7:32 AM, Robert Haas  wrote:
>> On Sat, Apr 8, 2017 at 10:05 AM, David Steele  wrote:
>>> On 3/28/17 1:21 AM, Tsunakawa, Takayuki wrote:
 Thanks, marked as ready for committer, as the code is good and Alexander 
 reported the test success.
>>>
>>> This bug has been moved to CF 2017-07.
>>
>> This bug fix has been pending in "Ready for Committer" state for about
>> 4.5 months.  Three committers (Magnus, Heikki, Tom) have contributed
>> to the thread to date.  Maybe one of them would like to commit this?
>
> In the meantime its bits have begun to rot.  Michael, could you please rebase?

Thanks for the reminder, Thomas. The 2PC code triggered during
recovery has been largely reworked in PG10, explaining the conflicts.
As this has been some time since I touched this patch, I had again a
look at its logic and could not find any problems around it. So
attached are a rebased versions for both 0001 and 0002, I have updated
the messages as well to be more in-line with what is in HEAD for
corrupted entries.
-- 
Michael
From 2f3a16c0c0cb12f8bfef2d58656352c46a681393 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 2 Sep 2017 21:15:04 +0900
Subject: [PATCH 2/2] Minimize window between history file and end-of-recovery
 record

Once a standby node is promoted, this makes the assignment of the new
timeline number booked earlier as the history file gets archived immediately.
This way the other nodes are aware that this new timeline number is taken
and should not be assigned to other nodes.

The window between which the history file is archived and the end-of-recovery
record is written cannot be zeroed, but this way it is minimized as much as
possible. The new order of actions prevents as well a corrupted data directory
on failure.
---
 src/backend/access/transam/xlog.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 80fe2c60e6..abc0cec3d2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7449,6 +7449,24 @@ StartupXLOG(void)
 		else
 			snprintf(reason, sizeof(reason), "no recovery target specified");
 
+		/*
+		 * We are now done reading the old WAL.  Turn off archive fetching
+		 * if it was active, and make a writable copy of the last WAL segment.
+		 * (Note that we also have a copy of the last block of the old WAL
+		 * in readBuf; we will use that below.)
+		 */
+		exitArchiveRecovery(EndOfLogTLI, EndOfLog);
+
+		/*
+		 * Write the timeline history file, and have it archived. After this
+		 * point (or rather, as soon as the file is archived), the timeline
+		 * will appear as "taken" in the WAL archive and to any standby servers.
+		 * If we crash before actually switching to the new timeline, standby
+		 * servers will nevertheless think that we switched to the new timeline,
+		 * and will try to connect to the new timeline. To minimize the window
+		 * for that, try to do as little as possible between here and writing the
+		 * end-of-recovery record.
+		 */
 		writeTimeLineHistory(ThisTimeLineID, recoveryTargetTLI,
 			 EndRecPtr, reason);
 	}
@@ -7457,15 +7475,6 @@ StartupXLOG(void)
 	XLogCtl->ThisTimeLineID = ThisTimeLineID;
 	XLogCtl->PrevTimeLineID = PrevTimeLineID;
 
-	/*
-	 * We are now done reading the old WAL.  Turn off archive fetching if it
-	 * was active, and make a writable copy of the last WAL segment. (Note
-	 * that we also have a copy of the last block of the old WAL in readBuf;
-	 * we will use that below.)
-	 */
-	if (ArchiveRecoveryRequested)
-		exitArchiveRecovery(EndOfLogTLI, EndOfLog);
-
 	/*
 	 * Prepare to write WAL starting at EndOfLog location, and init xlog
 	 * buffer cache using the block containing the last record from the
-- 
2.14.1

From 12ab1765f7ea032db217f2081f9561c00d1d96df Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 2 Sep 2017 21:07:21 +0900
Subject: [PATCH 1/2] Change detection of corrupted 2PC files as FATAL

When scanning 2PC files, be it for initializing a hot standby node or
finding the XID range at the end of recovery, bumping on a corrupted
2PC file is reported as a WARNING and are blindly corrupted. If it
happened that the corrupted file was actually legit, there is actually
a risk of corruption, so switch that to a hard failure.
---
 src/backend/access/transam/twophase.c | 27 +++
 src/backend/access/transam/xlog.c | 10 +++---
 2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index ba03d9687e..4b4f2449f8 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1778,6 +1778,10 @@ restoreTwoPhaseData(void)
  * 

Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-09-01 Thread Thomas Munro
On Sat, Aug 26, 2017 at 7:32 AM, Robert Haas  wrote:
> On Sat, Apr 8, 2017 at 10:05 AM, David Steele  wrote:
>> On 3/28/17 1:21 AM, Tsunakawa, Takayuki wrote:
>>> Thanks, marked as ready for committer, as the code is good and Alexander 
>>> reported the test success.
>>
>> This bug has been moved to CF 2017-07.
>
> This bug fix has been pending in "Ready for Committer" state for about
> 4.5 months.  Three committers (Magnus, Heikki, Tom) have contributed
> to the thread to date.  Maybe one of them would like to commit this?

In the meantime its bits have begun to rot.  Michael, could you please rebase?

Thanks!

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-08-25 Thread Robert Haas
On Sat, Apr 8, 2017 at 10:05 AM, David Steele  wrote:
> On 3/28/17 1:21 AM, Tsunakawa, Takayuki wrote:
>> From: Michael Paquier [mailto:michael.paqu...@gmail.com]
>>> Okay. I got the message, and I agree with what you say here. You are right
>>> by the way, the error messages just use "two-phase file" and not "two-phase
>>> STATE file", missed that previously.
>>> --
>> Thanks, marked as ready for committer, as the code is good and Alexander 
>> reported the test success.
>
> This bug has been moved to CF 2017-07.

This bug fix has been pending in "Ready for Committer" state for about
4.5 months.  Three committers (Magnus, Heikki, Tom) have contributed
to the thread to date.  Maybe one of them would like to commit this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-04-08 Thread David Steele
On 3/28/17 1:21 AM, Tsunakawa, Takayuki wrote:
> From: Michael Paquier [mailto:michael.paqu...@gmail.com]
>> Okay. I got the message, and I agree with what you say here. You are right
>> by the way, the error messages just use "two-phase file" and not "two-phase
>> STATE file", missed that previously.
>> --
> Thanks, marked as ready for committer, as the code is good and Alexander 
> reported the test success.

This bug has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
> Okay. I got the message, and I agree with what you say here. You are right
> by the way, the error messages just use "two-phase file" and not "two-phase
> STATE file", missed that previously.
> --
Thanks, marked as ready for committer, as the code is good and Alexander 
reported the test success.

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Michael Paquier
On Tue, Mar 28, 2017 at 1:33 PM, Tsunakawa, Takayuki
 wrote:
> I get the impression that DATA_CORRUPTED means the table data is corrupted, 
> because there's an error code named INDEX_CORRUPTED.

I have interpreted that as the other way around, aka DATA_CORRUPTED
could be used as well to 2PC files :)
But grepping around it seems that you are grabbing the meaning better
than I do, ERRCODE_DATA_CORRUPTED is only used now for relation pages
or large pages.

>  Anyway, I don't think this patch needs to attach an error code because:
> * Currently, other interlal files (not tables or indexes) seem to use 
> INTERNAL_ERROR (XX000).  For example, see ReadControlFile() in xlog.c and 
> pgstat_read_statsfiles() in pgstat.c.
> * It doesn't seem that the user needs distinction.  I don't object to 
> providing a specific error code for this case, but if the patch needs a 
> specific error code to be committed, I'd like to know how that's useful (e.g. 
> how it affects the recovery action the user takes.)
> So, I'd like to mark the patch as ready for committer when ereport() and 
> errmsg() are on separate lines and the message changes to "two-phase state 
> file."

Okay. I got the message, and I agree with what you say here. You are
right by the way, the error messages just use "two-phase file" and not
"two-phase STATE file", missed that previously.
-- 
Michael


0001-Change-detection-of-corrupted-2PC-files-as-FATAL.patch
Description: Binary data


0002-Minimize-window-between-history-file-and-end-of-reco.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> While I agree with that in general, we are taking about 2PC files that are
> on disk in patch 0001, so I would think that in this case
> ERRCODE_DATA_CORRUPTED is the most adapted (or
> ERRCODE_TWOPHASE_CORRUPTED?).
> 
> The other WARNING messages refer to stale files of already committed
> transactions, which are not actually corrupted. What about in this case
> having a ERRCODE_TWOPHASE_INVALID?
> 
> Updated patches are attached, I did not change the WARNING portion though
> as I am not sure what's the consensus on the matter.
> 

I get the impression that DATA_CORRUPTED means the table data is corrupted, 
because there's an error code named INDEX_CORRUPTED.  Anyway, I don't think 
this patch needs to attach an error code because:

* Currently, other interlal files (not tables or indexes) seem to use 
INTERNAL_ERROR (XX000).  For example, see ReadControlFile() in xlog.c and 
pgstat_read_statsfiles() in pgstat.c.

* It doesn't seem that the user needs distinction.  I don't object to providing 
a specific error code for this case, but if the patch needs a specific error 
code to be committed, I'd like to know how that's useful (e.g. how it affects 
the recovery action the user takes.)


So, I'd like to mark the patch as ready for committer when ereport() and 
errmsg() are on separate lines and the message changes to "two-phase state 
file."

Regards
Takayuki Tsunakawa





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Michael Paquier
On Mon, Mar 27, 2017 at 11:20 PM, Tom Lane  wrote:
> Alexander Korotkov  writes:
>> On Mon, Mar 27, 2017 at 4:48 PM, Tom Lane  wrote:
>>> Actually, the *real* problem with that coding is it lacks a SQLSTATE
>>> (errcode call).  The only places where it's acceptable to leave that
>>> out are for internal "can't happen" cases, which this surely isn't.
>
>> Surrounding code also has ereports lacking SQLSTATE.  And that isn't "can't
>> happen" case as well.
>> Should we consider fixing them?
>
> Yup.  Just remember that the default is
> XX000EERRCODE_INTERNAL_ERROR  internal_error
>
> If that's not how you want the error case reported, you need an errcode()
> call.
>
> We might need more ERRCODEs than are there now, if none of the existing
> ones seem to fit these cases.  There's already ERRCODE_DATA_CORRUPTED
> and ERRCODE_INDEX_CORRUPTED; maybe we need ERRCODE_WAL_CORRUPTED, for
> example?

While I agree with that in general, we are taking about 2PC files that
are on disk in patch 0001, so I would think that in this case
ERRCODE_DATA_CORRUPTED is the most adapted (or
ERRCODE_TWOPHASE_CORRUPTED?).

The other WARNING messages refer to stale files of already committed
transactions, which are not actually corrupted. What about in this
case having a ERRCODE_TWOPHASE_INVALID?

Updated patches are attached, I did not change the WARNING portion
though as I am not sure what's the consensus on the matter.
-- 
Michael


0001-Change-detection-of-corrupted-2PC-files-as-FATAL.patch
Description: Binary data


0002-Minimize-window-between-history-file-and-end-of-reco.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
> "Tsunakawa, Takayuki"  writes:
> > All other places in twophase.c and most places in other files put ereport()
> and errmsg() on separate lines.  I think it would be better to align with
> surrounding code.
> 
> > +   ereport(FATAL, (errmsg("corrupted
> two-phase file \"%s\"",
> 
> Actually, the *real* problem with that coding is it lacks a SQLSTATE (errcode
> call).  The only places where it's acceptable to leave that out are for
> internal "can't happen" cases, which this surely isn't.

Oh, I overlooked it.  But...


> Yup.  Just remember that the default is
> XX000EERRCODE_INTERNAL_ERROR  internal_error
> 
> If that's not how you want the error case reported, you need an errcode()
> call.
> 
> We might need more ERRCODEs than are there now, if none of the existing
> ones seem to fit these cases.  There's already ERRCODE_DATA_CORRUPTED and
> ERRCODE_INDEX_CORRUPTED; maybe we need ERRCODE_WAL_CORRUPTED, for
> example?

I'd be always happy if the error code is more specific, but maybe that would be 
a separate patch.  WAL corruption message so far doesn't accompany a specific 
error code like this in xlog.c:

/*
 * We only end up here without a message when 
XLogPageRead()
 * failed - in that case we already logged something. In
 * StandbyMode that only happens if we have been 
triggered, so we
 * shouldn't loop anymore in that case.
 */
if (errormsg)
ereport(emode_for_corrupt_record(emode,

 RecPtr ? RecPtr : EndRecPtr),
(errmsg_internal("%s", errormsg) /* already 
translated */ ));

Regards
Takayuki Tsunakawa




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Alexander Korotkov
On Mon, Mar 27, 2017 at 11:26 AM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> > Moved to CF 2017-03. Both patches still apply.
>
> Sorry to be late for reviewing this, but done now.  The patch applied,
> make check passed, and the code looks almost good.  I could successfully
> perform a simple archive recovery.  Finally, I broke the 2pc state file
> while the server is down, and could confirm that the server failed to start
> as expected, emitting a FATAL message.  Worked nicely.
>

I've tested moving recovery.conf away case which was originally reported by
Magnus.

When I'm trying to promote standby when recovery.conf is moved away, I get
FATAL error.

waiting for server to promote2017-03-27 17:12:38.225 MSK [30307] LOG:
 received promote request
2017-03-27 17:12:38.225 MSK [30311] FATAL:  terminating walreceiver process
due to administrator command
2017-03-27 17:12:38.225 MSK [30307] LOG:  redo done at 0/328
2017-03-27 17:12:38.226 MSK [30307] LOG:  selected new timeline ID: 2
2017-03-27 17:12:38.253 MSK [30307] FATAL:  could not open file
"recovery.conf": No such file or directory
2017-03-27 17:12:38.253 MSK [30306] LOG:  startup process (PID 30307)
exited with exit code 1
2017-03-27 17:12:38.253 MSK [30306] LOG:  terminating any other active
server processes
2017-03-27 17:12:38.256 MSK [30306] LOG:  database system is shut down
... stopped waiting
server is still promoting

If I try to start it after crash, it becomes a master on timeline 1.  Just
like in case we deleted recovery.conf while server was shut down.

waiting for server to start2017-03-27 17:16:54.186 MSK [30413] LOG:
 listening on IPv6 address "::1", port 5430
2017-03-27 17:16:54.186 MSK [30413] LOG:  listening on IPv6 address
"fe80::1%lo0", port 5430
2017-03-27 17:16:54.186 MSK [30413] LOG:  listening on IPv4 address
"127.0.0.1", port 5430
2017-03-27 17:16:54.187 MSK [30413] LOG:  listening on Unix socket
"/tmp/.s.PGSQL.5430"
2017-03-27 17:16:54.195 MSK [30414] LOG:  database system was interrupted
while in recovery at log time 2017-03-27 17:10:23 MSK
2017-03-27 17:16:54.195 MSK [30414] HINT:  If this has occurred more than
once some data might be corrupted and you might need to choose an earlier
recovery target.
2017-03-27 17:16:54.218 MSK [30414] LOG:  database system was not properly
shut down; automatic recovery in progress
2017-03-27 17:16:54.219 MSK [30414] LOG:  redo starts at 0/260
2017-03-27 17:16:54.219 MSK [30414] LOG:  consistent recovery state reached
at 0/360
2017-03-27 17:16:54.219 MSK [30414] LOG:  invalid record length at
0/360: wanted 24, got 0
2017-03-27 17:16:54.219 MSK [30414] LOG:  redo done at 0/328
2017-03-27 17:16:54.224 MSK [30413] LOG:  database system is ready to
accept connections
 done
server started

# select pg_is_in_recovery();
 pg_is_in_recovery
───
 f
(1 row)

# select pg_walfile_name(pg_current_wal_location());
 pg_walfile_name
──
 00010003
(1 row)

If instead I put recovery back and start server, then streaming replication
continues normally.

waiting for server to start2017-03-27 17:32:16.887 MSK [30783] LOG:
 listening on IPv6 address "::1", port 5430
2017-03-27 17:32:16.887 MSK [30783] LOG:  listening on IPv6 address
"fe80::1%lo0", port 5430
2017-03-27 17:32:16.887 MSK [30783] LOG:  listening on IPv4 address
"127.0.0.1", port 5430
2017-03-27 17:32:16.888 MSK [30783] LOG:  listening on Unix socket
"/tmp/.s.PGSQL.5430"
2017-03-27 17:32:16.897 MSK [30784] LOG:  database system was interrupted
while in recovery at log time 2017-03-27 17:28:05 MSK
2017-03-27 17:32:16.897 MSK [30784] HINT:  If this has occurred more than
once some data might be corrupted and you might need to choose an earlier
recovery target.
2017-03-27 17:32:16.914 MSK [30784] LOG:  entering standby mode
2017-03-27 17:32:16.916 MSK [30784] LOG:  redo starts at 0/828
2017-03-27 17:32:16.916 MSK [30784] LOG:  consistent recovery state reached
at 0/960
2017-03-27 17:32:16.916 MSK [30784] LOG:  invalid record length at
0/960: wanted 24, got 0
2017-03-27 17:32:16.916 MSK [30783] LOG:  database system is ready to
accept read only connections
2017-03-27 17:32:16.920 MSK [30788] LOG:  started streaming WAL from
primary at 0/900 on timeline 1
 done
server started

# select pg_is_in_recovery();
 pg_is_in_recovery
───
 t
(1 row)

Thus, I think patch is working as expected in this case.

Also, I'd like to notice that it passes check-world without warnings on my
laptop.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Tom Lane
Alexander Korotkov  writes:
> On Mon, Mar 27, 2017 at 4:48 PM, Tom Lane  wrote:
>> Actually, the *real* problem with that coding is it lacks a SQLSTATE
>> (errcode call).  The only places where it's acceptable to leave that
>> out are for internal "can't happen" cases, which this surely isn't.

> Surrounding code also has ereports lacking SQLSTATE.  And that isn't "can't
> happen" case as well.
> Should we consider fixing them?

Yup.  Just remember that the default is
XX000EERRCODE_INTERNAL_ERROR  internal_error

If that's not how you want the error case reported, you need an errcode()
call.

We might need more ERRCODEs than are there now, if none of the existing
ones seem to fit these cases.  There's already ERRCODE_DATA_CORRUPTED
and ERRCODE_INDEX_CORRUPTED; maybe we need ERRCODE_WAL_CORRUPTED, for
example?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Alexander Korotkov
On Mon, Mar 27, 2017 at 4:48 PM, Tom Lane  wrote:

> "Tsunakawa, Takayuki"  writes:
> > All other places in twophase.c and most places in other files put
> ereport() and errmsg() on separate lines.  I think it would be better to
> align with surrounding code.
>
> > + ereport(FATAL, (errmsg("corrupted
> two-phase file \"%s\"",
>
> Actually, the *real* problem with that coding is it lacks a SQLSTATE
> (errcode call).  The only places where it's acceptable to leave that
> out are for internal "can't happen" cases, which this surely isn't.
>

Surrounding code also has ereports lacking SQLSTATE.  And that isn't "can't
happen" case as well.

if (ControlFile->backupEndRequired)
> ereport(FATAL,
> (errmsg("WAL ends before end of online backup"),
>  errhint("All WAL generated while online backup was taken must be
> available at recovery.")));
> else if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
> ereport(FATAL,
> (errmsg("WAL ends before end of online backup"),
>  errhint("Online backup started with pg_start_backup() must be ended with
> pg_stop_backup(), and all WAL up to that point must be available at
> recovery.")));
> else
> ereport(FATAL,
>   (errmsg("WAL ends before consistent recovery point")));


Should we consider fixing them?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Tom Lane
"Tsunakawa, Takayuki"  writes:
> All other places in twophase.c and most places in other files put ereport() 
> and errmsg() on separate lines.  I think it would be better to align with 
> surrounding code.

> + ereport(FATAL, (errmsg("corrupted two-phase 
> file \"%s\"",

Actually, the *real* problem with that coding is it lacks a SQLSTATE
(errcode call).  The only places where it's acceptable to leave that
out are for internal "can't happen" cases, which this surely isn't.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> Moved to CF 2017-03. Both patches still apply.

Sorry to be late for reviewing this, but done now.  The patch applied, make 
check passed, and the code looks almost good.  I could successfully perform a 
simple archive recovery.  Finally, I broke the 2pc state file while the server 
is down, and could confirm that the server failed to start as expected, 
emitting a FATAL message.  Worked nicely.

Just two cosmetic points:

(1)
Other places use "two-phase state file", not "two-phase file".


(2)
All other places in twophase.c and most places in other files put ereport() and 
errmsg() on separate lines.  I think it would be better to align with 
surrounding code.

+   ereport(FATAL, (errmsg("corrupted two-phase 
file \"%s\"",


Regards
Takayuki Tsunakawa




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-16 Thread Tsunakawa, Takayuki
From: David Steele [mailto:da...@pgmasters.net]
> Any idea when you'll have a chance to review?

I'll do it by early next week.

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-16 Thread David Steele
On 12/20/16 2:54 AM, Michael Paquier wrote:
> On Sat, Dec 17, 2016 at 9:23 PM, Magnus Hagander  wrote:
>> On Fri, Dec 16, 2016 at 7:08 AM, Michael Paquier 
>> wrote:
>>> Looking at PrescanPreparedTransactions(), I am thinking as well that it
>>> would
>>> be better to get a hard failure when bumping on a corrupted 2PC file.
>>> Future
>>> files are one thing, but corrupted files should be treated more carefully.
>>
>>
>> Again without looking at it, I agree (so much easier that way :P). Ignoring
>> corruption is generally a bad idea. Failing hard makes the user notice the
>> error, and makes it possible to initiate recovery from a standby or from
>> backups or something, or to *intentionally* remove/clear/ignore it.
> 
> And I am finishing with the two patches attached:
> - 0001 changes the 2PC checks so as corrupted entries are FATAL.
> PreScanPreparedTransaction is used when a hot standby is initialized.
> In this case a failure protects the range of XIDs generated,
> potentially saving from corruption of data. At the end of recovery,
> this is done before any on-disk actions are taken.
> - 0002 is the thing that Heikki has sent previously to minimize the
> window between end-of-recovery record write and timeline history file
> archiving.
> 
> I am attaching that to next CF.

This patch still applies cleanly and compiles at cccbdde.

Any idea when you'll have a chance to review?

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-01-30 Thread Michael Paquier
On Tue, Dec 20, 2016 at 4:54 PM, Michael Paquier
 wrote:
> I am attaching that to next CF.

Moved to CF 2017-03. Both patches still apply.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2016-12-19 Thread Michael Paquier
On Sat, Dec 17, 2016 at 9:23 PM, Magnus Hagander  wrote:
> On Fri, Dec 16, 2016 at 7:08 AM, Michael Paquier 
> wrote:
>> Looking at PrescanPreparedTransactions(), I am thinking as well that it
>> would
>> be better to get a hard failure when bumping on a corrupted 2PC file.
>> Future
>> files are one thing, but corrupted files should be treated more carefully.
>
>
> Again without looking at it, I agree (so much easier that way :P). Ignoring
> corruption is generally a bad idea. Failing hard makes the user notice the
> error, and makes it possible to initiate recovery from a standby or from
> backups or something, or to *intentionally* remove/clear/ignore it.

And I am finishing with the two patches attached:
- 0001 changes the 2PC checks so as corrupted entries are FATAL.
PreScanPreparedTransaction is used when a hot standby is initialized.
In this case a failure protects the range of XIDs generated,
potentially saving from corruption of data. At the end of recovery,
this is done before any on-disk actions are taken.
- 0002 is the thing that Heikki has sent previously to minimize the
window between end-of-recovery record write and timeline history file
archiving.

I am attaching that to next CF.
-- 
Michael
From 0e816454c57b851f4aa2743ea776b1e604a27d77 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 20 Dec 2016 16:47:35 +0900
Subject: [PATCH 2/2] Minimize window between history file and end-of-recovery
 record

Once a standby node is promoted, this makes the assignment of the new
timeline number booked earlier as the history file gets archived immediately.
This way the other nodes are aware that this new timeline number is taken
and should not be assigned to other nodes.

The window between which the history file is archived and the end-of-recovery
record is written cannot be zeroed, but this way it is minimized as much as
possible. The new order of actions prevents as well a corrupted data directory
on failure.
---
 src/backend/access/transam/xlog.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ea2c523e6e..f23feb47e8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7210,6 +7210,24 @@ StartupXLOG(void)
 		else
 			snprintf(reason, sizeof(reason), "no recovery target specified");
 
+		/*
+		 * We are now done reading the old WAL.  Turn off archive fetching if it
+		 * was active, and make a writable copy of the last WAL segment. (Note
+		 * that we also have a copy of the last block of the old WAL in readBuf;
+		 * we will use that below.)
+		 */
+		exitArchiveRecovery(EndOfLogTLI, EndOfLog);
+
+		/*
+		 * Write the timeline history file, and have it archived. After this
+		 * point (or rather, as soon as the file is archived), the timeline
+		 * will appear as "taken" in the WAL archive and to any standby servers.
+		 * If we crash before actually switching to the new timeline, standby
+		 * servers will nevertheless think that we switched to the new timeline,
+		 * and will try to connect to the new timeline. To minimize the window
+		 * for that, try to do as little as possible between here and writing the
+		 * end-of-recovery record.
+		 */
 		writeTimeLineHistory(ThisTimeLineID, recoveryTargetTLI,
 			 EndRecPtr, reason);
 	}
@@ -7219,15 +7237,6 @@ StartupXLOG(void)
 	XLogCtl->PrevTimeLineID = PrevTimeLineID;
 
 	/*
-	 * We are now done reading the old WAL.  Turn off archive fetching if it
-	 * was active, and make a writable copy of the last WAL segment. (Note
-	 * that we also have a copy of the last block of the old WAL in readBuf;
-	 * we will use that below.)
-	 */
-	if (ArchiveRecoveryRequested)
-		exitArchiveRecovery(EndOfLogTLI, EndOfLog);
-
-	/*
 	 * Prepare to write WAL starting at EndOfLog position, and init xlog
 	 * buffer cache using the block containing the last record from the
 	 * previous incarnation.
-- 
2.11.0

From 4037db778a4353fc79b15eb683a5b9e3f648db24 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 20 Dec 2016 16:41:44 +0900
Subject: [PATCH 1/2] Change detection of corrupted 2PC files as FATAL

When scanning 2PC files, be it for initializing a hot standby node or
finding the XID range at the end of recovery, bumping on a corrupted
2PC file is reported as a WARNING and are blindly corrupted. If it
happened that the corrupted file was actually legit, there is actually
a risk of corruption, so switch that to a hard failure.
---
 src/backend/access/transam/twophase.c | 23 ---
 src/backend/access/transam/xlog.c | 10 +++---
 2 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 5415604993..9cbcb83062 100644
--- a/src/backend/access/transam/twophase.c
+++ 

Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2016-12-17 Thread Magnus Hagander
On Fri, Dec 16, 2016 at 7:08 AM, Michael Paquier 
wrote:

> On Thu, Dec 15, 2016 at 11:25:10AM +0200, Heikki Linnakangas wrote:
> > On 12/15/2016 10:44 AM, Magnus Hagander wrote:
> > > I wonder if there might be more corner cases like this, but in this
> > > particular one it seems easy enough to just say that failing to rename
> > > recovery.conf because it didn't exist is safe.
> >
> > Yeah. It's unexpected though, so I think erroring out is quite
> reasonable.
> > If the recovery.conf file went missing, who knows what else is wrong.
>
> i'd rather not mess up with the exiting behavior and just complain to the
> pilot to not do that. This enters in the category of not removing the
> backup_label file after taking a backup...
>

I'm happy to say that people shouldn't do that. However, we have a system
that is unnecessarily fragile, by making something like that so easy to
break. This could equally happen in other ways. For example, someone could
accidentally provision a recovery.conf with the wrong permissions.

Reducing the fragility of such an important part of the system is a big
improvement, IMO. Of course, we have to be extra careful when touching
those parts.

Bottom line, I'm perfectly fine with failing in such a scenario. I'm not
fine with leaving the user with a corrupt cluster if we can avoid it.


As for your comparison, we fixed the backup_label part by adding support
for taking backups without using the fragile system. So it's not like we
didn't recognize the problem.



>
> > > But in the case of failing to rename recovery.conf for example because
> of
> > > permissions errors, we don't want to ignore it. But we also really
> don't
> > > want to end up with this kind of inconsistent data directory IMO. I
> don't
> > > know that code well enough to suggest how to fix it though -- hoping
> for
> > > input for someone who knows it closer?
> >
> > Hmm. Perhaps we should write the timeline history file only after
> renaming
> > recovery.conf. In general, we should keep the window between writing the
> > timeline history file and writing the end-of-recovery record as small as
> > possible. I don't think we can eliminate it completely, but it makes
> sense
> > to minimize it. Something like the attached (completely untested).
>

I haven't looked at the code either, but the reason is definitely solid -
let's keep the time as short as possible. In particular, keeping it
recoverable for things that are caused by humans, because they will keep
making mistakes. And editing recovery.conf is a normal thing for a DBA to
do (sure, not removing it -- but it's one of the few files in the data
directory that the DBA is actually supposed to handle manually, so that
increases the risk).

And also by doing things in an order that will at least make it less likely
to end up corrupt if people manage to do it anyway.


I did some tests. And after a lookup it looks like a good thing to book
> the timeline number at an earlier step and let other nodes know about
> it. So +1 on it.
>


>
> Looking at PrescanPreparedTransactions(), I am thinking as well that it
> would
> be better to get a hard failure when bumping on a corrupted 2PC file.
> Future
> files are one thing, but corrupted files should be treated more carefully.
>

Again without looking at it, I agree (so much easier that way :P). Ignoring
corruption is generally a bad idea. Failing hard makes the user notice the
error, and makes it possible to initiate recovery from a standby or from
backups or something, or to *intentionally* remove/clear/ignore it.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2016-12-15 Thread Michael Paquier
On Thu, Dec 15, 2016 at 11:25:10AM +0200, Heikki Linnakangas wrote:
> On 12/15/2016 10:44 AM, Magnus Hagander wrote:
> > I wonder if there might be more corner cases like this, but in this
> > particular one it seems easy enough to just say that failing to rename
> > recovery.conf because it didn't exist is safe.
> 
> Yeah. It's unexpected though, so I think erroring out is quite reasonable.
> If the recovery.conf file went missing, who knows what else is wrong.

i'd rather not mess up with the exiting behavior and just complain to the
pilot to not do that. This enters in the category of not removing the
backup_label file after taking a backup...

> > But in the case of failing to rename recovery.conf for example because of
> > permissions errors, we don't want to ignore it. But we also really don't
> > want to end up with this kind of inconsistent data directory IMO. I don't
> > know that code well enough to suggest how to fix it though -- hoping for
> > input for someone who knows it closer?
> 
> Hmm. Perhaps we should write the timeline history file only after renaming
> recovery.conf. In general, we should keep the window between writing the
> timeline history file and writing the end-of-recovery record as small as
> possible. I don't think we can eliminate it completely, but it makes sense
> to minimize it. Something like the attached (completely untested).

I did some tests. And after a lookup it looks like a good thing to book
the timeline number at an earlier step and let other nodes know about
it. So +1 on it.

Looking at PrescanPreparedTransactions(), I am thinking as well that it would
be better to get a hard failure when bumping on a corrupted 2PC file. Future
files are one thing, but corrupted files should be treated more carefully.
-- 
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2016-12-15 Thread Heikki Linnakangas

On 12/15/2016 10:44 AM, Magnus Hagander wrote:

I wonder if there might be more corner cases like this, but in this
particular one it seems easy enough to just say that failing to rename
recovery.conf because it didn't exist is safe.


Yeah. It's unexpected though, so I think erroring out is quite 
reasonable. If the recovery.conf file went missing, who knows what else 
is wrong.



But in the case of failing to rename recovery.conf for example because of
permissions errors, we don't want to ignore it. But we also really don't
want to end up with this kind of inconsistent data directory IMO. I don't
know that code well enough to suggest how to fix it though -- hoping for
input for someone who knows it closer?


Hmm. Perhaps we should write the timeline history file only after 
renaming recovery.conf. In general, we should keep the window between 
writing the timeline history file and writing the end-of-recovery record 
as small as possible. I don't think we can eliminate it completely, but 
it makes sense to minimize it. Something like the attached (completely 
untested).


- Heikki



reorder-end-of-archive-recovery-actions-1.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers