Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-11-14 Thread Andres Freund
On 2014-10-27 16:09:30 +0530, Abhijit Menon-Sen wrote:
 At 2014-09-25 22:41:18 +0200, and...@2ndquadrant.com wrote:
 
  On 2014-09-24 17:06:05 +0530, Abhijit Menon-Sen wrote:
   
   1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to
  earlier in StartupXLOG.
   
   2. Inside that function, issue fsync()s for the main forks we create by
  copying the _init fork.
  
  These two imo should definitely be backpatched.
 
 I've attached updated versions of these two patches. The only changes
 are to revise the comments as you suggested.

Committed and backpatched (to 9.1) these. That required also
backpatching the move of fsync_fname() to fd.c/h to 9.1-9.3.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-10-27 Thread Abhijit Menon-Sen
At 2014-09-25 22:41:18 +0200, and...@2ndquadrant.com wrote:

 On 2014-09-24 17:06:05 +0530, Abhijit Menon-Sen wrote:
  
  1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to
 earlier in StartupXLOG.
  
  2. Inside that function, issue fsync()s for the main forks we create by
 copying the _init fork.
 
 These two imo should definitely be backpatched.

I've attached updated versions of these two patches. The only changes
are to revise the comments as you suggested.

  3. A small fixup to add a const to typedef char *FileName, because the
 earlier patch gave me warnings about discarding const-ness. This is
 consistent with many other functions in fd.c that take const char *.
 
 I'm happy with consider that one for master (although I seem to recall
 having had a patch for it rejected?), but I don't think we want to do
 that in the backbranches.

(I gave up on this for now as an unrelated diversion.)

  4. Issue an fsync() on the data directory at startup if we need to
  perform crash recovery.
 
 I personally think this one should be backpatched too. Does anyone
 believe differently?

I revised this patch as well, I'll rebase and send it separately along
with the initdb -S patch shortly.

-- Abhijit
From 8487185a5f2073ed475f971e6203d49e1e21a987 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen a...@2ndquadrant.com
Date: Wed, 8 Oct 2014 11:48:25 +0530
Subject: Call ResetUnloggedRelations(UNLOGGED_RELATION_INIT) earlier

We need to call this after recovery, but not after the END_OF_RECOVERY
checkpoint is written. If we crash after that checkpoint, for example
because of ENOSPC in PreallocXlogFiles, the checkpoint has already set
the ControlFile-state to DB_SHUTDOWNED, so we don't enter recovery
again at startup.

Because we did call ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP)
in the first cleanup, this leaves the database with a bunch of _init
forks for unlogged relations, but no main forks, leading to scary
errors.

See thread from 20140912112246.ga4...@alap3.anarazel.de for details.
---
 src/backend/access/transam/xlog.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3c9aeae..2ab9da5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6871,6 +6871,16 @@ StartupXLOG(void)
 	ShutdownWalRcv();
 
 	/*
+	 * Reset unlogged relations to the contents of their INIT fork. This
+	 * is done AFTER recovery is complete so as to include any unlogged
+	 * relations created during recovery, but BEFORE recovery is marked
+	 * as having completed successfully (because this step can fail,
+	 * e.g. due to ENOSPC).
+	 */
+	if (InRecovery)
+		ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
+
+	/*
 	 * We don't need the latch anymore. It's not strictly necessary to disown
 	 * it, but let's do it for the sake of tidiness.
 	 */
@@ -7138,14 +7148,6 @@ StartupXLOG(void)
 	PreallocXlogFiles(EndOfLog);
 
 	/*
-	 * Reset initial contents of unlogged relations.  This has to be done
-	 * AFTER recovery is complete so that any unlogged relations created
-	 * during recovery also get picked up.
-	 */
-	if (InRecovery)
-		ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
-
-	/*
 	 * Okay, we're officially UP.
 	 */
 	InRecovery = false;
-- 
1.9.1

From 944e7c4e27fca5589b8a103f7f470df23a5161c2 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen a...@2ndquadrant.com
Date: Wed, 24 Sep 2014 16:01:37 +0530
Subject: =?UTF-8?q?Make=20ResetUnloggedRelations(=E2=80=A6=5FINIT)=20fsync?=
 =?UTF-8?q?=20newly-created=20main=20forks?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During UNLOGGED_RELATION_INIT, after we have copied ${x}_init to $x, we
issue fsync()s for the newly-created files. We depend on their existence
and a checkpoint isn't going to fsync them for us during recovery.

See thread from 20140912112246.ga4...@alap3.anarazel.de for details.
---
 src/backend/storage/file/reinit.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index 3229f41..4ad5987 100644
--- a/src/backend/storage/file/reinit.c
+++ b/src/backend/storage/file/reinit.c
@@ -339,6 +339,44 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
 		}
 
 		FreeDir(dbspace_dir);
+
+		/*
+		 * copy_file() above has already called pg_flush_data() on the
+		 * files it created. Now we need to fsync those files, because
+		 * a checkpoint won't do it for us while we're in recovery. We
+		 * do this in a separate pass to allow the kernel to perform
+		 * all the flushes at once.
+		 */
+
+		dbspace_dir = AllocateDir(dbspacedirname);
+		while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+		{
+			ForkNumber forkNum;
+			int			oidchars;
+			char		oidbuf[OIDCHARS + 1];
+			char		mainpath[MAXPGPATH];
+
+			/* Skip 

Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-09-25 Thread Andres Freund
Hi,

On 2014-09-24 17:06:05 +0530, Abhijit Menon-Sen wrote:
 Hi Andres, Robert.
 
 I've attached four patches here.

Cool. Will review.

 1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to
earlier in StartupXLOG.
 
 2. Inside that function, issue fsync()s for the main forks we create by
copying the _init fork.

These two imo should definitely be backpatched.

 3. A small fixup to add a const to typedef char *FileName, because the
earlier patch gave me warnings about discarding const-ness. This is
consistent with many other functions in fd.c that take const char *.

I'm happy with consider that one for master (although I seem to recall having 
had
a patch for it rejected?), but I don't think we want to do that in the
backbranches.

 4. Issue an fsync() on the data directory at startup if we need to
perform crash recovery.

I personally think this one should be backpatched too. Does anyone
believe differently?

 From d8726c06cdf11674661eac1d091cf7edd05c2a0c Mon Sep 17 00:00:00 2001
 From: Abhijit Menon-Sen a...@2ndquadrant.com
 Date: Wed, 24 Sep 2014 14:43:18 +0530
 Subject: Call ResetUnloggedRelations(UNLOGGED_RELATION_INIT) earlier
 
 We need to call this after recovery, but not after the END_OF_RECOVERY
 checkpoint is written. If we crash after that checkpoint, for example
 because of ENOSPC in PreallocXlogFiles, the checkpoint has already set
 the ControlFile-state to DB_SHUTDOWNED, so we don't enter recovery
 again at startup.
 
 Because we did call ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP)
 in the first cleanup, this leaves the database with a bunch of _init
 forks for unlogged relations, but no main forks, leading to scary
 errors.
 
 See thread from 20140912112246.ga4...@alap3.anarazel.de for details.

With a explanation. Shiny!

 ---
  src/backend/access/transam/xlog.c | 16 
  1 file changed, 8 insertions(+), 8 deletions(-)
 
 diff --git a/src/backend/access/transam/xlog.c 
 b/src/backend/access/transam/xlog.c
 index 46eef5f..218f7fb 100644
 --- a/src/backend/access/transam/xlog.c
 +++ b/src/backend/access/transam/xlog.c
 @@ -6863,6 +6863,14 @@ StartupXLOG(void)
   ShutdownWalRcv();
  
   /*
 +  * Reset initial contents of unlogged relations.  This has to be done
 +  * AFTER recovery is complete so that any unlogged relations created
 +  * during recovery also get picked up.
 +  */
 + if (InRecovery)
 + ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
 +

+

 * Also recovery shouldn't be regarded successful if the reset fails -
 * e.g. because of ENOSPC.

 +
 + /*
 +  * copy_file() above has already called pg_flush_data() on the
 +  * files it created. Now we need to fsync those files, because
 +  * a checkpoint won't do it for us while we're in recovery.
 +  */

+

* Doing this in a separate pass is advantageous for performance reasons
* because it allows the kernel to perform all the flushes at once.

 + /*
 +  * If we need to perform crash recovery, we issue an fsync on the
 +  * data directory to try to ensure that any data written before the
 +  * crash are flushed to disk. Otherwise a power failure in the near
 +  * future might mean that earlier unflushed writes are lost, but the
 +  * more recent data written to disk from here on are persisted.
 +  */
 +
 + if (ControlFile-state != DB_SHUTDOWNED 
 + ControlFile-state != DB_SHUTDOWNED_IN_RECOVERY)
 + fsync_fname(data_directory, true);
 +
   if (ControlFile-state == DB_SHUTDOWNED)
   {
   /* This is the expected case, so don't be chatty in standalone 
 mode */
 -- 

Unless I miss something this isn't sufficient. We need to fsync the
files in the data directory, not just the toplevel directory?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-09-25 Thread Abhijit Menon-Sen
At 2014-09-25 22:41:18 +0200, and...@2ndquadrant.com wrote:

  * Also recovery shouldn't be regarded successful if the reset fails -
  * e.g. because of ENOSPC.

OK.

 * Doing this in a separate pass is advantageous for performance reasons
 * because it allows the kernel to perform all the flushes at once.

OK.

 Unless I miss something this isn't sufficient. We need to fsync the
 files in the data directory, not just the toplevel directory?

No, of course you're right. So a separate function that does the moral
equivalent of find $PGDATA -exec fsync_fname …?

Will resubmit with the additional comments.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-09-25 Thread Andres Freund
On 2014-09-26 02:34:06 +0530, Abhijit Menon-Sen wrote:
 At 2014-09-25 22:41:18 +0200, and...@2ndquadrant.com wrote:
  Unless I miss something this isn't sufficient. We need to fsync the
  files in the data directory, not just the toplevel directory?
 
 No, of course you're right. So a separate function that does the moral
 equivalent of find $PGDATA -exec fsync_fname …?

Probably will require some care to deal correctly with tablespaces.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-09-24 Thread Abhijit Menon-Sen
Hi Andres, Robert.

I've attached four patches here.

1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to
   earlier in StartupXLOG.

2. Inside that function, issue fsync()s for the main forks we create by
   copying the _init fork.

3. A small fixup to add a const to typedef char *FileName, because the
   earlier patch gave me warnings about discarding const-ness. This is
   consistent with many other functions in fd.c that take const char *.

4. Issue an fsync() on the data directory at startup if we need to
   perform crash recovery.

I created some unlogged relations, performed an immediate shutdown, and
then straced postgres as it performed crash recovery. The changes in (2)
do indeed fsync the files we create by copying *_init, and don't fsync
anything else (at least not after I fixed a bug ;-).

I did not do anything about the END_OF_RECOVERY checkpoint setting the
ControlFile-state to DB_SHUTDOWNED, because it wasn't clear to me if
there was any agreement on what to do. I would be happy to submit a
followup patch if we reach some decision about it.

Is this what you had in mind?

-- Abhijit
From d8726c06cdf11674661eac1d091cf7edd05c2a0c Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen a...@2ndquadrant.com
Date: Wed, 24 Sep 2014 14:43:18 +0530
Subject: Call ResetUnloggedRelations(UNLOGGED_RELATION_INIT) earlier

We need to call this after recovery, but not after the END_OF_RECOVERY
checkpoint is written. If we crash after that checkpoint, for example
because of ENOSPC in PreallocXlogFiles, the checkpoint has already set
the ControlFile-state to DB_SHUTDOWNED, so we don't enter recovery
again at startup.

Because we did call ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP)
in the first cleanup, this leaves the database with a bunch of _init
forks for unlogged relations, but no main forks, leading to scary
errors.

See thread from 20140912112246.ga4...@alap3.anarazel.de for details.
---
 src/backend/access/transam/xlog.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 46eef5f..218f7fb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6863,6 +6863,14 @@ StartupXLOG(void)
 	ShutdownWalRcv();
 
 	/*
+	 * Reset initial contents of unlogged relations.  This has to be done
+	 * AFTER recovery is complete so that any unlogged relations created
+	 * during recovery also get picked up.
+	 */
+	if (InRecovery)
+		ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
+
+	/*
 	 * We don't need the latch anymore. It's not strictly necessary to disown
 	 * it, but let's do it for the sake of tidiness.
 	 */
@@ -7130,14 +7138,6 @@ StartupXLOG(void)
 	PreallocXlogFiles(EndOfLog);
 
 	/*
-	 * Reset initial contents of unlogged relations.  This has to be done
-	 * AFTER recovery is complete so that any unlogged relations created
-	 * during recovery also get picked up.
-	 */
-	if (InRecovery)
-		ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
-
-	/*
 	 * Okay, we're officially UP.
 	 */
 	InRecovery = false;
-- 
1.9.1

From 7eba57b5ed9e1eda1fa1b14b32a82828617d823e Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen a...@2ndquadrant.com
Date: Wed, 24 Sep 2014 16:01:37 +0530
Subject: =?UTF-8?q?Make=20ResetUnloggedRelations(=E2=80=A6=5FINIT)=20fsync?=
 =?UTF-8?q?=20newly-created=20main=20forks?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During UNLOGGED_RELATION_INIT, after we have copied ${x}_init to $x, we
issue fsync()s for the newly-created files. We depend on their existence
and a checkpoint isn't going to fsync them for us during recovery.

See thread from 20140912112246.ga4...@alap3.anarazel.de for details.
---
 src/backend/storage/file/reinit.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index 3229f41..4febf6f 100644
--- a/src/backend/storage/file/reinit.c
+++ b/src/backend/storage/file/reinit.c
@@ -339,6 +339,42 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
 		}
 
 		FreeDir(dbspace_dir);
+
+		/*
+		 * copy_file() above has already called pg_flush_data() on the
+		 * files it created. Now we need to fsync those files, because
+		 * a checkpoint won't do it for us while we're in recovery.
+		 */
+
+		dbspace_dir = AllocateDir(dbspacedirname);
+		while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+		{
+			ForkNumber forkNum;
+			int			oidchars;
+			char		oidbuf[OIDCHARS + 1];
+			char		mainpath[MAXPGPATH];
+
+			/* Skip anything that doesn't look like a relation data file. */
+			if (!parse_filename_for_nontemp_relation(de-d_name, oidchars,
+	 forkNum))
+continue;
+
+			/* Also skip it unless this is the init fork. */
+			if (forkNum != INIT_FORKNUM)
+continue;
+
+			/* Construct main fork pathname. */
+			memcpy(oidbuf, de-d_name, 

Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-09-22 Thread Robert Haas
On Thu, Sep 18, 2014 at 4:31 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-09-12 14:44:48 -0400, Robert Haas wrote:
 On Fri, Sep 12, 2014 at 7:22 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  What I like even less is that pg_control is actually marked as
  DB_SHUTDOWNED due to END_OF_RECOVERY. That's just plain wrong. Obviously
  the database was *NOT* shutdown peacefully. I don't see active bugs due
  it besides this, but I think it's likely to either have or create futher
  ones.

 I agree.  The database clearly isn't shut down at end of recovery;
 it's still active and we're still doing things to it.  If we crash at
 that point, we need to go back into recovery on restart.  This seems
 open and shut, but maybe I'm missing something.  Why shouldn't we fix
 *that*?

 Well, I think we might want to do both. There doesn't seem to be a
 reason to *not* do the ResetUnloggedRelation(UNLOGGED_RELATION_INIT)
 around the ShutdownWalRcv(). That seems much closer where it, for me,
 logically belongs. And it'd fix the concrete problem.

That might be all right.  I've booted enough things in this area of
the code to not have a lot of confidence in my ability to not boot
things in this area of the code ... but I don't see a problem with it.
It does seem a little odd to do that before checking whether we
reached the minimum recovery point, or deciding whether to assign a
new TLI, but I don't see a concrete problem with putting it there.

 With regard to your second email, I agree that
 ResetUnloggedRelation(UNLOGGED_RELATION_INIT) needs to issue fsyncs.

 Good.

 The topic reminds me of the fact that I actually think at the very least
 pg_xlog/ and pg_control needs the same treatment. Consider the following
 sequence:
 1) postgres writes loads of stuff. Lots of it to *not* fsynced WAL
segments
 2) postgres crashes and crash recovery happens. Replays *not* fsynced
WAL
 3) the OS crashes
 4) Bad. We now might hava pg_control with a minRecovery that's *later*
than some potentially unsynced WAL segments

 I think the easiest would be to just fsync() the entire data directory
 at the start when ControlFile-state !=
 DB_SHUTDOWNED/DB_SHUTDOWNED_IN_RECOVERY

 Note that that's independent of the fsync for unlogged relations.

Seems reasonable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-09-18 Thread Andres Freund
On 2014-09-12 14:44:48 -0400, Robert Haas wrote:
 On Fri, Sep 12, 2014 at 7:22 AM, Andres Freund and...@2ndquadrant.com wrote:
  What I like even less is that pg_control is actually marked as
  DB_SHUTDOWNED due to END_OF_RECOVERY. That's just plain wrong. Obviously
  the database was *NOT* shutdown peacefully. I don't see active bugs due
  it besides this, but I think it's likely to either have or create futher
  ones.
 
 I agree.  The database clearly isn't shut down at end of recovery;
 it's still active and we're still doing things to it.  If we crash at
 that point, we need to go back into recovery on restart.  This seems
 open and shut, but maybe I'm missing something.  Why shouldn't we fix
 *that*?

Well, I think we might want to do both. There doesn't seem to be a
reason to *not* do the ResetUnloggedRelation(UNLOGGED_RELATION_INIT)
around the ShutdownWalRcv(). That seems much closer where it, for me,
logically belongs. And it'd fix the concrete problem.

For DB_SHUTDOWNED (is that actually a word? Looks like it could be from
me...) the case isn't that clear:

If you start a node after a crash and stop it as soon as it finished, it
doesn't need recovery again. Similar if a node is promoted and doesn't
use fast promotion or a older release. Now, I think this is a pretty
dubious benefit. But I'm not sure it's wise to change it in the back
branches.

 With regard to your second email, I agree that
 ResetUnloggedRelation(UNLOGGED_RELATION_INIT) needs to issue fsyncs.

Good.

The topic reminds me of the fact that I actually think at the very least
pg_xlog/ and pg_control needs the same treatment. Consider the following
sequence:
1) postgres writes loads of stuff. Lots of it to *not* fsynced WAL
   segments
2) postgres crashes and crash recovery happens. Replays *not* fsynced
   WAL
3) the OS crashes
4) Bad. We now might hava pg_control with a minRecovery that's *later*
   than some potentially unsynced WAL segments

I think the easiest would be to just fsync() the entire data directory
at the start when ControlFile-state !=
DB_SHUTDOWNED/DB_SHUTDOWNED_IN_RECOVERY

Note that that's independent of the fsync for unlogged relations.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-09-12 Thread Andres Freund
Hi,

Abhijit and I investigated a customer problem which has showed that crash 
recovery +
unlogged relations don't always work well together:

A condensed version of how crash recovery works is:

StartupXLOG()
{
...
if (ControlFile-state != DB_SHUTDOWNED)
   InRecovery = true;

if (InRecovery)
{
ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);

/* perform crash recovery till the end of WAL */
...
CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE);
...
}

PreallocXlogFiles(EndOfLog);

if (InRecovery)
ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
...
/* finish startup */
}

the second important part is:

CreateCheckPoint(flags)
{
...
if (flags  (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY))
shutdown = true;
...
if (shutdown)
ControlFile-state = DB_SHUTDOWNED;
UpdateControlFile();
...
}

If you consider a crash due to ENOSPC the problem is that first crash
recovery is performed. Then a checkpoint is performed which is marked as
END_OF_RECOVERY - which marks the database as being properly shut down!
So far, while not pretty, so good. The problem is that if we crash after
the CreateCheckPoint(), e.g. because of xlog preallocation or the new
files created in ResetUnloggedRelations(), the server will restart *but*
will *not* perform crash recovery anymore as pg_control marked as
DB_SHUTDOWNED!

That leaves you with a database which has all the _init forks, but not
the main forks... Leading to scary an unexpected errors.

Should somebody google this: The problem can be fixed by forcing the
server into crash recovery again using an immediate shutdown.


Personally I think it's quite the mistake that END_OF_RECOVERY
checkpoints are treated as shutdown checkpoints. The claimed reason
that:
 *
 * Note that we write a shutdown checkpoint rather than an on-line
 * one. This is not particularly critical, but since we may be
 * assigning a new TLI, using a shutdown checkpoint allows us to have
 * the rule that TLI only changes in shutdown checkpoints, which
 * allows some extra error checking in xlog_redo.
 *
and
/*
 * An end-of-recovery checkpoint is really a shutdown checkpoint, just
 * issued at a different time.
 */

isn't very convincing as those checks could just as well be saved in a
flags argument in the checkpoint. The likelihood of this confusion
causing further bugs (IIRC it already has caused a couple) seems high.

What I like even less is that pg_control is actually marked as
DB_SHUTDOWNED due to END_OF_RECOVERY. That's just plain wrong. Obviously
the database was *NOT* shutdown peacefully. I don't see active bugs due
it besides this, but I think it's likely to either have or create futher
ones.


Because at least the former is something that obviously we can't (and
don't want) to change in the back branches I think the solution for this
particular problem is to simply move the ResetUnloggedRelations() call a
couple lines up to before the CreateCheckPoint(). That should fix this.

Comments, other opinions?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-09-12 Thread Andres Freund
Hi,

On 2014-09-12 13:22:46 +0200, Andres Freund wrote:
 Because at least the former is something that obviously we can't (and
 don't want) to change in the back branches I think the solution for this
 particular problem is to simply move the ResetUnloggedRelations() call a
 couple lines up to before the CreateCheckPoint(). That should fix this.
 
 Comments, other opinions?

A second thing I realized, just after hitting send, is that I think
ResetUnloggedRelations(UNLOGGED_RELATION_INIT) actually has to fsync the
created files. As we rely on the !init relations being there they really
need to be fsynced. During normal running that's ensured via the smgr
during checkpoints, but that's not the case here.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-09-12 Thread Robert Haas
On Fri, Sep 12, 2014 at 7:22 AM, Andres Freund and...@2ndquadrant.com wrote:
 What I like even less is that pg_control is actually marked as
 DB_SHUTDOWNED due to END_OF_RECOVERY. That's just plain wrong. Obviously
 the database was *NOT* shutdown peacefully. I don't see active bugs due
 it besides this, but I think it's likely to either have or create futher
 ones.

I agree.  The database clearly isn't shut down at end of recovery;
it's still active and we're still doing things to it.  If we crash at
that point, we need to go back into recovery on restart.  This seems
open and shut, but maybe I'm missing something.  Why shouldn't we fix
*that*?

With regard to your second email, I agree that
ResetUnloggedRelation(UNLOGGED_RELATION_INIT) needs to issue fsyncs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers