Re: TODO item: WAL replay of CREATE TABLESPACE with differing directory structure
On Sun, Feb 18, 2018 at 04:43:38PM -0800, Patrick Krecker wrote: > On Tue, Feb 13, 2018 at 8:24 PM, Michael Paquierwrote: >> Let's be clear here. There is no hard restriction with tablespace paths >> within the data directory, though you should not do that, and you get a >> nice warning when trying to do so with CREATE TABLESPACE (see 33cb8ff6). >> This also causes pg_basebackup to fail. It is also bad design to create >> tablespaces within the data directory as those are aimed at making hot >> paths work on different partitions with different I/O properties. > > Sorry, my language was imprecise here. What I meant is that the > pg_tablespace directory contains no symlinks when a tablespace > creation is streamed to a replica, i.e. the data files reside within > pg_tablespace on the replica. There is nothing preventing you to do so I think, and base backups should work properly as basebackup.c just loops through the paths of the tablespace links, which leads to errors if the links are within the data folder itself. I don't think that we would want a mode where CREATE TABLESPACE does not create a link at recovery as well, be it controlled by a system-wide GUC or a switch at DDL level. That's more likely to trap users by putting hot data on the same partition as the data folder. > Thank you for the response. I would suggest that we link to it from > the wiki so as to provide clarification to future readers of the todo > list. Good idea! I have just updated the wiki page with a link to my previous post. -- Michael signature.asc Description: PGP signature
Re: TODO item: WAL replay of CREATE TABLESPACE with differing directory structure
On Tue, Feb 13, 2018 at 8:24 PM, Michael Paquierwrote: > On Tue, Feb 13, 2018 at 01:44:34PM -0800, Patrick Krecker wrote: >> I am searching for a way to make a contribution to Postgres and I came >> across this TODO item (I realize there has been some controversy >> around the TODO list [1], and I hope that my use of it doesn't spark >> another discussion about removing it altogether): > > Well, it will point out again that TODO items are hard, complicated and > mostly impossible projects. > >> "Allow WAL replay of CREATE TABLESPACE to work when the directory >> structure on the recovery computer is different from the original" >> >> Currently it looks like tablespaces have to live inside the data >> directory on the replica, notwithstanding administrator intervention >> by manipulating the tablespace directory with symlinks after (or even >> before?) it has been created via replay. > > Let's be clear here. There is no hard restriction with tablespace paths > within the data directory, though you should not do that, and you get a > nice warning when trying to do so with CREATE TABLESPACE (see 33cb8ff6). > This also causes pg_basebackup to fail. It is also bad design to create > tablespaces within the data directory as those are aimed at making hot > paths work on different partitions with different I/O properties. Sorry, my language was imprecise here. What I meant is that the pg_tablespace directory contains no symlinks when a tablespace creation is streamed to a replica, i.e. the data files reside within pg_tablespace on the replica. >> Is the idea behind this task to allow the master to instruct the >> replica where to put the tablespace on its filesystem, so as to allow >> it to live outside of the data directory without direct manipulation >> of the filesystem? > > WAL records associated to CREATE TABLESPACE (xl_tblspc_create_rec) > register the location where a tablespace is located. The location of a > tablespace is not saved in the system catalogs, which offers flexibility > in the way the symlink from pg_tblspc can be handled. This is where the > tablespace path remapping of pg_basebackup becomes handy, because you > can repurpose paths easily when taking a base backup, but this forces > you to create tablespaces first, and then create standbys. We have also > a set of existing problems: > 1) If a primary and its standby are on the same server and you issue a > CREATE TABLESPACE, then they would try to write to the same paths. > 2) How do we design at DDL level a command which allows for specifying > different paths depending on the node where the recovery happens. > > You would need in both cases a sort of ability to define a node name, so > as for 1) you append the node name to the path and both primary and > standby can use the same tablespace path, but with different sub-paths. > And for 2), you can enforce a patch name by defining as well a path > associated to a node name so as when xl_tblspc_create_rec records are > replayed at recovery, you know which path to create. Just designing > that the right way as its own set of complications. > >> If this task is a worthwhile endeavor, I would be happy to take it on. >> If not, I am open to other ideas :) > > This is part of the difficult, perhaps-not-worth doing impossible > problems. As a first contribution, you may want something easier. Thank you for the response. I would suggest that we link to it from the wiki so as to provide clarification to future readers of the todo list. > -- > Michael
Re: TODO item: WAL replay of CREATE TABLESPACE with differing directory structure
On Tue, Feb 13, 2018 at 01:44:34PM -0800, Patrick Krecker wrote: > I am searching for a way to make a contribution to Postgres and I came > across this TODO item (I realize there has been some controversy > around the TODO list [1], and I hope that my use of it doesn't spark > another discussion about removing it altogether): Well, it will point out again that TODO items are hard, complicated and mostly impossible projects. > "Allow WAL replay of CREATE TABLESPACE to work when the directory > structure on the recovery computer is different from the original" > > Currently it looks like tablespaces have to live inside the data > directory on the replica, notwithstanding administrator intervention > by manipulating the tablespace directory with symlinks after (or even > before?) it has been created via replay. Let's be clear here. There is no hard restriction with tablespace paths within the data directory, though you should not do that, and you get a nice warning when trying to do so with CREATE TABLESPACE (see 33cb8ff6). This also causes pg_basebackup to fail. It is also bad design to create tablespaces within the data directory as those are aimed at making hot paths work on different partitions with different I/O properties. > Is the idea behind this task to allow the master to instruct the > replica where to put the tablespace on its filesystem, so as to allow > it to live outside of the data directory without direct manipulation > of the filesystem? WAL records associated to CREATE TABLESPACE (xl_tblspc_create_rec) register the location where a tablespace is located. The location of a tablespace is not saved in the system catalogs, which offers flexibility in the way the symlink from pg_tblspc can be handled. This is where the tablespace path remapping of pg_basebackup becomes handy, because you can repurpose paths easily when taking a base backup, but this forces you to create tablespaces first, and then create standbys. We have also a set of existing problems: 1) If a primary and its standby are on the same server and you issue a CREATE TABLESPACE, then they would try to write to the same paths. 2) How do we design at DDL level a command which allows for specifying different paths depending on the node where the recovery happens. You would need in both cases a sort of ability to define a node name, so as for 1) you append the node name to the path and both primary and standby can use the same tablespace path, but with different sub-paths. And for 2), you can enforce a patch name by defining as well a path associated to a node name so as when xl_tblspc_create_rec records are replayed at recovery, you know which path to create. Just designing that the right way as its own set of complications. > If this task is a worthwhile endeavor, I would be happy to take it on. > If not, I am open to other ideas :) This is part of the difficult, perhaps-not-worth doing impossible problems. As a first contribution, you may want something easier. -- Michael signature.asc Description: PGP signature