Re: standby recovery fails (tablespace related) (tentative patch and discussion)

Paul Guo Mon, 27 May 2019 06:40:32 -0700

On Tue, May 14, 2019 at 11:06 AM Kyotaro HORIGUCHI <
[email protected]> wrote:


> Hello.
>
> At Mon, 13 May 2019 17:37:50 +0800, Paul Guo <[email protected]> wrote in <
> caeet0zf9yn4daxyuflzocayyxuff1ms_oqwea+rwv3gha5q...@mail.gmail.com>
> > Thanks for the reply.
> >
> > On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI <
> > [email protected]> wrote:
> >
> > >
> > > +      if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
> > > +      {
> > >
> > > This patch is allowing missing source and destination directory
> > > even in consistent state. I don't think it is safe.
> > >
> >
> > I do not understand this. Can you elaborate?
>
> Suppose we were recoverying based on a backup at LSN1 targeting
> to LSN3 then it crashed at LSN2, where LSN1 < LSN2 <= LSN3. LSN2
> is called as "consistency point", before where the database is
> not consistent. It's because we are applying WAL recored older
> than those that were already applied in the second trial. The
> same can be said for crash recovery, where LSN1 is the latest
> checkpoint ('s redo LSN) and LSN2=LSN3 is the crashed LSN.
>
> Creation of an existing directory or dropping of a non-existent
> directory are apparently inconsistent or "broken" so we should
> stop recovery when seeing such WAL records while database is in
> consistent state.
>

This seems to be hard to detect. I thought using invalid_page mechanism
long ago,
but it seems to be hard to fully detect a dropped tablespace.

> > > 2) Fixed dbase_desc(). Now the xlog output looks correct.
> > > >
> > > > rmgr: Database    len (rec/tot):     42/    42, tx:        486, lsn:
> > > > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
> > > > pg_tblspc/16384/PG_12_201904281/16386
> > > >
> > > > rmgr: Database    len (rec/tot):     34/    34, tx:        487, lsn:
> > > > 0/01638EB8, prev 0/01638E40, desc: DROP dir
> > > > pg_tblspc/16384/PG_12_201904281/16386
> > >
> > > WAL records don't convey such information. The previous
> > > description seems right to me.
> > >
> >
> > 2019-04-17 14:52:14.951 CST [23030] CONTEXT:  WAL redo at 0/3011650 for
> > Database/CREATE: copy dir 1663/1 to 65546/65547
> > The directories are definitely wrong and misleading.
>
> The original description is right in the light of how the server
> recognizes. The record exactly says that "copy dir 1663/1 to
> 65546/65547" and the latter path is converted in filesystem layer
> via a symlink.
>

In either $PG_DATA/pg_tblspc or symlinked real tablespace directory,
there is an additional directory like PG_12_201905221 between
tablespace oid and database oid. See the directory layout as below,
so the directory info in xlog dump output was not correct.

$ ls -lh data/pg_tblspc/


total 0


lrwxrwxrwx. 1 gpadmin gpadmin 6 May 27 17:23 16384 -> /tmp/2


$ ls -lh /tmp/2


total 0


drwx------. 3 gpadmin gpadmin 18 May 27 17:24 PG_12_201905221

>
>
> > > > > Also I'd suggest we should use pg_mkdir_p() in
> > > TablespaceCreateDbspace()
> > > > > to replace
> > > > > the code block includes a lot of
> > > > > get_parent_directory(), MakePGDirectory(), etc even it
> > > > > is not fixing a bug since pg_mkdir_p() code change seems to be more
> > > > > graceful and simpler.
> > >
> > > But I don't agree to this. pg_mkdir_p goes above two-parents up,
> > > which would be unwanted here.
> > >
> > > I do not understand this also. pg_mkdir_p() is similar to 'mkdir -p'.
> > This change just makes the code concise. Though in theory the change is
> not
> > needed.
>
> We don't want to create tablespace direcotory after concurrent
> DROPing, as the comment just above is saying:
>
> |  * Acquire TablespaceCreateLock to ensure that no DROP TABLESPACE
> |  * or TablespaceCreateDbspace is running concurrently.
>
> If the concurrent DROP TABLESPACE destroyed the grand parent
> directory, we mustn't create it again.
>

Yes, this is a good reason to keep the original code. Thanks.

By the way, based on your previous test patch I added another test which
could easily detect
the missing src directory case.

Re: standby recovery fails (tablespace related) (tentative patch and discussion)

Reply via email to