Hi.
At Tue, 30 Apr 2019 14:33:47 +0800, Paul Guo <[email protected]> wrote in
<caeet0zghmdkrq7jju2rllqcjbr8pa4oyrksirz5ft8-deg1...@mail.gmail.com>
> I updated the original patch to
It's reasonable not to touch copydir.
> 1) skip copydir() if either src path or dst parent path is missing in
> dbase_redo(). Both missing cases seem to be possible. For the src path
> missing case, mkdir_p() is meaningless. It seems that moving the directory
> existence check step to dbase_redo() has less impact on other code.
Nice catch.
+ if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
+ {
This patch is allowing missing source and destination directory
even in consistent state. I don't think it is safe.
+ ereport(WARNING,
+ (errmsg("directory \"%s\" for copydir() does not exists."
+ "It is possibly expected. Skip copydir().",
+ parent_path)));
This message seems unfriendly to users, or it seems like an elog
message. How about something like this. The same can be said for
the source directory.
| WARNING: skipped creating database directory: "%s"
| DETAIL: The tabelspace %u may have been removed just before crash.
# I'm not confident in this at all:(
> 2) Fixed dbase_desc(). Now the xlog output looks correct.
>
> rmgr: Database len (rec/tot): 42/ 42, tx: 486, lsn:
> 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
> pg_tblspc/16384/PG_12_201904281/16386
>
> rmgr: Database len (rec/tot): 34/ 34, tx: 487, lsn:
> 0/01638EB8, prev 0/01638E40, desc: DROP dir
> pg_tblspc/16384/PG_12_201904281/16386
WAL records don't convey such information. The previous
description seems right to me.
> I'm not familiar with the TAP test details previously. I learned a lot
> about how to test such case from Kyotaro's patch series.👍
Yeah, good to hear.
> On Sun, Apr 28, 2019 at 3:33 PM Paul Guo <[email protected]> wrote:
> > If we want to fix the issue by ignoring the dst path create failure, I do
> > not think we should do
> > that in copydir() since copydir() seems to be a common function. I'm not
> > sure whether it is
> > used by some extensions or not. If no maybe we should move the dst patch
> > create logic
> > out of copydir().
Agreed to this.
> > Also I'd suggest we should use pg_mkdir_p() in TablespaceCreateDbspace()
> > to replace
> > the code block includes a lot of
> > get_parent_directory(), MakePGDirectory(), etc even it
> > is not fixing a bug since pg_mkdir_p() code change seems to be more
> > graceful and simpler.
But I don't agree to this. pg_mkdir_p goes above two-parents up,
which would be unwanted here.
> > Whatever ignore mkdir failure or mkdir_p, I found that these steps seem to
> > be error-prone
> > along with postgre evolving since they are hard to test and also we are
> > not easy to think out
> > various potential bad cases. Is it possible that we should do real
> > checkpoint (flush & update
> > redo lsn) when seeing checkpoint xlogs for these operations? This will
> > slow down standby
> > but master also does this and also these operations are not usual,
> > espeically it seems that it
> > does not slow down wal receiving usually?
That dramatically slows recovery (not replication) if databases
are created and deleted frequently. That wouldn't be acceptable.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center