Oops! The comment in the previous patch is wrong.
At Mon, 22 Apr 2019 16:15:13 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
<[email protected]> wrote in
<[email protected]>
> At Mon, 22 Apr 2019 12:36:43 +0800, Paul Guo <[email protected]> wrote in
> <CAEET0ZGpUrMGUzfyzVF9FuSq+zb=qovya2cvyrndotvz5vx...@mail.gmail.com>
> > Please see my replies inline. Thanks.
> >
> > On Fri, Apr 19, 2019 at 12:38 PM Asim R P <[email protected]> wrote:
> >
> > > On Wed, Apr 17, 2019 at 1:27 PM Paul Guo <[email protected]> wrote:
> > > >
> > > > create db with tablespace
> > > > drop database
> > > > drop tablespace.
> > >
> > > Essentially, that sequence of operations causes crash recovery to fail
> > > if the "drop tablespace" transaction was committed before crashing.
> > > This is a bug in crash recovery in general and should be reproducible
> > > without configuring a standby. Is that right?
> > >
> >
> > No. In general, checkpoint is done for drop_db/create_db/drop_tablespace on
> > master.
> > That makes the file/directory update-to-date if I understand the related
> > code correctly.
> > For standby, checkpoint redo does not ensure that.
>
> That's right partly. As you must have seen, fast shutdown forces
> restartpoint for the last checkpoint and it prevents the problem
> from happening. Anyway it seems to be a problem.
>
> > > Your patch creates missing directories in the destination. Don't we
> > > need to create the tablespace symlink under pg_tblspc/? I would
> > >
> >
> > 'create db with tablespace' redo log does not include the tablespace real
> > directory information.
> > Yes, we could add in it into the xlog, but that seems to be an overdesign.
>
> But I don't think creating directory that is to be removed just
> after is a wanted solution. The directory most likely to be be
> removed just after.
>
> > > prefer extending the invalid page mechanism to deal with this, as
> > > suggested by Ashwin off-list. It will allow us to avoid creating
> >
> > directories and files only to remove them shortly afterwards when the
> > > drop database and drop tablespace records are replayed.
> > >
> > >
> > 'invalid page' mechanism seems to be more proper for missing pages of a
> > file. For
> > missing directories, we could, of course, hack to use that (e.g. reading
> > any page of
> > a relfile in that database) to make sure the tablespace create code
> > (without symlink)
> > safer (It assumes those directories will be deleted soon).
> >
> > More feedback about all of the previous discussed solutions is welcome.
>
> It doesn't seem to me that the invalid page mechanism is
> applicable in straightforward way, because it doesn't consider
> simple file copy.
>
> Drop failure is ignored any time. I suppose we can ignore the
> error to continue recovering as far as recovery have not reached
> consistency. The attached would work *at least* your case, but I
> haven't checked this covers all places where need the same
> treatment.
The comment for the new function XLogMakePGDirectory is wrong:
+ * There is a possibility that WAL replay causes a creation of the same
+ * directory left by the previous crash. Issuing ERROR prevents the caller
+ * from continuing recovery.
The correct one is:
+ * There is a possibility that WAL replay causes an error by creation of a
+ * directory under a directory removed before the previous crash. Issuing
+ * ERROR prevents the caller from continuing recovery.
It is fixed in the attached.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 10a663bae6..01331f0da9 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -522,6 +522,44 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
return buffer;
}
+/*
+ * XLogMakePGDirectory
+ *
+ * There is a possibility that WAL replay causes an error by creation of a
+ * directory under a directory removed before the previous crash. Issuing
+ * ERROR prevents the caller from continuing recovery.
+ *
+ * To prevent that case, this function issues WARNING instead of ERROR on
+ * error if consistency is not reached yet.
+ */
+int
+XLogMakePGDirectory(const char *directoryName)
+{
+ int ret;
+
+ ret = MakePGDirectory(directoryName);
+
+ if (ret != 0)
+ {
+ int elevel = ERROR;
+
+ /*
+ * We might get error trying to create existing directory that is to
+ * be removed just after. Don't issue ERROR in the case. Recovery
+ * will stop if we again failed after reaching consistency.
+ */
+ if (InRecovery && !reachedConsistency)
+ elevel = WARNING;
+
+ ereport(elevel,
+ (errcode_for_file_access(),
+ errmsg("could not create directory \"%s\": %m", directoryName)));
+ return ret;
+ }
+
+ return 0;
+}
+
/*
* Struct actually returned by XLogFakeRelcacheEntry, though the declared
* return type is Relation.
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 30f6200a86..0216270dd3 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -22,11 +22,11 @@
#include <unistd.h>
#include <sys/stat.h>
+#include "access/xlogutils.h"
#include "storage/copydir.h"
#include "storage/fd.h"
#include "miscadmin.h"
#include "pgstat.h"
-
/*
* copydir: copy a directory
*
@@ -41,10 +41,12 @@ copydir(char *fromdir, char *todir, bool recurse)
char fromfile[MAXPGPATH * 2];
char tofile[MAXPGPATH * 2];
- if (MakePGDirectory(todir) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not create directory \"%s\": %m", todir)));
+ /*
+ * We might have to skip copydir to continue recovery. See the function
+ * for details.
+ */
+ if (XLogMakePGDirectory(todir) != 0)
+ return;
xldir = AllocateDir(fromdir);
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 0ab5ba62f5..46a7596315 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -43,6 +43,7 @@ extern XLogRedoAction XLogReadBufferForRedoExtended(XLogReaderState *record,
extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
BlockNumber blkno, ReadBufferMode mode);
+extern int XLogMakePGDirectory(const char *directoryName);
extern Relation CreateFakeRelcacheEntry(RelFileNode rnode);
extern void FreeFakeRelcacheEntry(Relation fakerel);