Re: TODO item: WAL replay of CREATE TABLESPACE with differing directory structure

2018-02-18 Thread Michael Paquier
On Sun, Feb 18, 2018 at 04:43:38PM -0800, Patrick Krecker wrote:
> On Tue, Feb 13, 2018 at 8:24 PM, Michael Paquier  wrote:
>> Let's be clear here. There is no hard restriction with tablespace paths
>> within the data directory, though you should not do that, and you get a
>> nice warning when trying to do so with CREATE TABLESPACE (see 33cb8ff6).
>> This also causes pg_basebackup to fail.  It is also bad design to create
>> tablespaces within the data directory as those are aimed at making hot
>> paths work on different partitions with different I/O properties.
> 
> Sorry, my language was imprecise here. What I meant is that the
> pg_tablespace directory contains no symlinks when a tablespace
> creation is streamed to a replica, i.e. the data files reside within
> pg_tablespace on the replica.

There is nothing preventing you to do so I think, and base backups
should work properly as basebackup.c just loops through the paths of the
tablespace links, which leads to errors if the links are within the data
folder itself.  I don't think that we would want a mode where CREATE
TABLESPACE does not create a link at recovery as well, be it controlled
by a system-wide GUC or a switch at DDL level.  That's more likely to
trap users by putting hot data on the same partition as the data folder.

> Thank you for the response. I would suggest that we link to it from
> the wiki so as to provide clarification to future readers of the todo
> list.

Good idea!  I have just updated the wiki page with a link to my previous
post.
--
Michael


signature.asc
Description: PGP signature


Re: TODO item: WAL replay of CREATE TABLESPACE with differing directory structure

2018-02-18 Thread Patrick Krecker
On Tue, Feb 13, 2018 at 8:24 PM, Michael Paquier  wrote:
> On Tue, Feb 13, 2018 at 01:44:34PM -0800, Patrick Krecker wrote:
>> I am searching for a way to make a contribution to Postgres and I came
>> across this TODO item (I realize there has been some controversy
>> around the TODO list [1], and I hope that my use of it doesn't spark
>> another discussion about removing it altogether):
>
> Well, it will point out again that TODO items are hard, complicated and
> mostly impossible projects.
>
>> "Allow WAL replay of CREATE TABLESPACE to work when the directory
>> structure on the recovery computer is different from the original"
>>
>> Currently it looks like tablespaces have to live inside the data
>> directory on the replica, notwithstanding administrator intervention
>> by manipulating the tablespace directory with symlinks after (or even
>> before?) it has been created via replay.
>
> Let's be clear here. There is no hard restriction with tablespace paths
> within the data directory, though you should not do that, and you get a
> nice warning when trying to do so with CREATE TABLESPACE (see 33cb8ff6).
> This also causes pg_basebackup to fail.  It is also bad design to create
> tablespaces within the data directory as those are aimed at making hot
> paths work on different partitions with different I/O properties.

Sorry, my language was imprecise here. What I meant is that the
pg_tablespace directory contains no symlinks when a tablespace
creation is streamed to a replica, i.e. the data files reside within
pg_tablespace on the replica.

>> Is the idea behind this task to allow the master to instruct the
>> replica where to put the tablespace on its filesystem, so as to allow
>> it to live outside of the data directory without direct manipulation
>> of the filesystem?
>
> WAL records associated to CREATE TABLESPACE (xl_tblspc_create_rec)
> register the location where a tablespace is located.  The location of a
> tablespace is not saved in the system catalogs, which offers flexibility
> in the way the symlink from pg_tblspc can be handled.  This is where the
> tablespace path remapping of pg_basebackup becomes handy, because you
> can repurpose paths easily when taking a base backup, but this forces
> you to create tablespaces first, and then create standbys.  We have also
> a set of existing problems:
> 1) If a primary and its standby are on the same server and you issue a
> CREATE TABLESPACE, then they would try to write to the same paths.
> 2) How do we design at DDL level a command which allows for specifying
> different paths depending on the node where the recovery happens.
>
> You would need in both cases a sort of ability to define a node name, so
> as for 1) you append the node name to the path and both primary and
> standby can use the same tablespace path, but with different sub-paths.
> And for 2), you can enforce a patch name by defining as well a path
> associated to a node name so as when xl_tblspc_create_rec records are
> replayed at recovery, you know which path to create.  Just designing
> that the right way as its own set of complications.
>
>> If this task is a worthwhile endeavor, I would be happy to take it on.
>> If not, I am open to other ideas :)
>
> This is part of the difficult, perhaps-not-worth doing impossible
> problems.  As a first contribution, you may want something easier.

Thank you for the response. I would suggest that we link to it from
the wiki so as to provide clarification to future readers of the todo
list.

> --
> Michael



Re: TODO item: WAL replay of CREATE TABLESPACE with differing directory structure

2018-02-13 Thread Michael Paquier
On Tue, Feb 13, 2018 at 01:44:34PM -0800, Patrick Krecker wrote:
> I am searching for a way to make a contribution to Postgres and I came
> across this TODO item (I realize there has been some controversy
> around the TODO list [1], and I hope that my use of it doesn't spark
> another discussion about removing it altogether):

Well, it will point out again that TODO items are hard, complicated and
mostly impossible projects.

> "Allow WAL replay of CREATE TABLESPACE to work when the directory
> structure on the recovery computer is different from the original"
> 
> Currently it looks like tablespaces have to live inside the data
> directory on the replica, notwithstanding administrator intervention
> by manipulating the tablespace directory with symlinks after (or even
> before?) it has been created via replay.

Let's be clear here. There is no hard restriction with tablespace paths
within the data directory, though you should not do that, and you get a
nice warning when trying to do so with CREATE TABLESPACE (see 33cb8ff6).
This also causes pg_basebackup to fail.  It is also bad design to create
tablespaces within the data directory as those are aimed at making hot
paths work on different partitions with different I/O properties.

> Is the idea behind this task to allow the master to instruct the
> replica where to put the tablespace on its filesystem, so as to allow
> it to live outside of the data directory without direct manipulation
> of the filesystem?

WAL records associated to CREATE TABLESPACE (xl_tblspc_create_rec)
register the location where a tablespace is located.  The location of a
tablespace is not saved in the system catalogs, which offers flexibility
in the way the symlink from pg_tblspc can be handled.  This is where the
tablespace path remapping of pg_basebackup becomes handy, because you
can repurpose paths easily when taking a base backup, but this forces
you to create tablespaces first, and then create standbys.  We have also
a set of existing problems:
1) If a primary and its standby are on the same server and you issue a
CREATE TABLESPACE, then they would try to write to the same paths.
2) How do we design at DDL level a command which allows for specifying
different paths depending on the node where the recovery happens.

You would need in both cases a sort of ability to define a node name, so
as for 1) you append the node name to the path and both primary and
standby can use the same tablespace path, but with different sub-paths.
And for 2), you can enforce a patch name by defining as well a path
associated to a node name so as when xl_tblspc_create_rec records are
replayed at recovery, you know which path to create.  Just designing
that the right way as its own set of complications.

> If this task is a worthwhile endeavor, I would be happy to take it on.
> If not, I am open to other ideas :)

This is part of the difficult, perhaps-not-worth doing impossible
problems.  As a first contribution, you may want something easier.
--
Michael


signature.asc
Description: PGP signature