On Tue, May 14, 2019 at 11:06 AM Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote:
> Hello. > > At Mon, 13 May 2019 17:37:50 +0800, Paul Guo <p...@pivotal.io> wrote in < > caeet0zf9yn4daxyuflzocayyxuff1ms_oqwea+rwv3gha5q...@mail.gmail.com> > > Thanks for the reply. > > > > On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI < > > horiguchi.kyot...@lab.ntt.co.jp> wrote: > > > > > > > > + if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode))) > > > + { > > > > > > This patch is allowing missing source and destination directory > > > even in consistent state. I don't think it is safe. > > > > > > > I do not understand this. Can you elaborate? > > Suppose we were recoverying based on a backup at LSN1 targeting > to LSN3 then it crashed at LSN2, where LSN1 < LSN2 <= LSN3. LSN2 > is called as "consistency point", before where the database is > not consistent. It's because we are applying WAL recored older > than those that were already applied in the second trial. The > same can be said for crash recovery, where LSN1 is the latest > checkpoint ('s redo LSN) and LSN2=LSN3 is the crashed LSN. > > Creation of an existing directory or dropping of a non-existent > directory are apparently inconsistent or "broken" so we should > stop recovery when seeing such WAL records while database is in > consistent state. > This seems to be hard to detect. I thought using invalid_page mechanism long ago, but it seems to be hard to fully detect a dropped tablespace. > > > 2) Fixed dbase_desc(). Now the xlog output looks correct. > > > > > > > > rmgr: Database len (rec/tot): 42/ 42, tx: 486, lsn: > > > > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to > > > > pg_tblspc/16384/PG_12_201904281/16386 > > > > > > > > rmgr: Database len (rec/tot): 34/ 34, tx: 487, lsn: > > > > 0/01638EB8, prev 0/01638E40, desc: DROP dir > > > > pg_tblspc/16384/PG_12_201904281/16386 > > > > > > WAL records don't convey such information. The previous > > > description seems right to me. > > > > > > > 2019-04-17 14:52:14.951 CST [23030] CONTEXT: WAL redo at 0/3011650 for > > Database/CREATE: copy dir 1663/1 to 65546/65547 > > The directories are definitely wrong and misleading. > > The original description is right in the light of how the server > recognizes. The record exactly says that "copy dir 1663/1 to > 65546/65547" and the latter path is converted in filesystem layer > via a symlink. > In either $PG_DATA/pg_tblspc or symlinked real tablespace directory, there is an additional directory like PG_12_201905221 between tablespace oid and database oid. See the directory layout as below, so the directory info in xlog dump output was not correct. $ ls -lh data/pg_tblspc/ total 0 lrwxrwxrwx. 1 gpadmin gpadmin 6 May 27 17:23 16384 -> /tmp/2 $ ls -lh /tmp/2 total 0 drwx------. 3 gpadmin gpadmin 18 May 27 17:24 PG_12_201905221 > > > > > > > Also I'd suggest we should use pg_mkdir_p() in > > > TablespaceCreateDbspace() > > > > > to replace > > > > > the code block includes a lot of > > > > > get_parent_directory(), MakePGDirectory(), etc even it > > > > > is not fixing a bug since pg_mkdir_p() code change seems to be more > > > > > graceful and simpler. > > > > > > But I don't agree to this. pg_mkdir_p goes above two-parents up, > > > which would be unwanted here. > > > > > > I do not understand this also. pg_mkdir_p() is similar to 'mkdir -p'. > > This change just makes the code concise. Though in theory the change is > not > > needed. > > We don't want to create tablespace direcotory after concurrent > DROPing, as the comment just above is saying: > > | * Acquire TablespaceCreateLock to ensure that no DROP TABLESPACE > | * or TablespaceCreateDbspace is running concurrently. > > If the concurrent DROP TABLESPACE destroyed the grand parent > directory, we mustn't create it again. > Yes, this is a good reason to keep the original code. Thanks. By the way, based on your previous test patch I added another test which could easily detect the missing src directory case.