Re: Avoiding Tablespace path collision for primary and standby

2018-06-20 Thread Craig Ringer
On Thu., 21 Jun. 2018, 04:30 Tom Lane wrote:

> Ashwin Agrawal  writes:
> > Okay just bouncing another approach, how about generating UUID for a
> > postgres instance during initdb and pg_basebackup ?
>
> There's no uuid generation code in core postgres, for excellent reasons
> (lack of portability and lack of failure modes are the main objections).
> This is not different in any meaningful way from the proposal to use
> timestamps, except for being more complicated.


A v4 UUID is just 128 random bits and some simple formatting. So I really
don't understand your concerns about UUID generation.
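
A minimal sketch of the "128 random bits and some simple formatting" point, in Python rather than C for brevity. The function name `uuid4_from_random_bytes` is made up for illustration, and `os.urandom` stands in for whatever random source a real implementation would use; obtaining good random bits is the platform-dependent part, the formatting is not.

```python
import os
import uuid


def uuid4_from_random_bytes(raw: bytes) -> str:
    """Format 16 random bytes as an RFC 4122 version-4 UUID string."""
    if len(raw) != 16:
        raise ValueError("need exactly 16 bytes")
    b = bytearray(raw)
    b[6] = (b[6] & 0x0F) | 0x40  # version 4 in the high nibble of byte 6
    b[8] = (b[8] & 0x3F) | 0x80  # RFC 4122 variant in the top two bits of byte 8
    h = b.hex()
    return f"{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:32]}"


if __name__ == "__main__":
    # os.urandom is an assumption standing in for the random source.
    print(uuid4_from_random_bytes(os.urandom(16)))
```

Only six of the 128 bits are fixed (four for the version, two for the variant); the rest come straight from the random source.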

That said, it can already be handled with tablespace maps in pg_basebackup.
And any new scheme would need to happen in pg_basebackup too, because it
must happen before the tablespaces are copied and the replica is first
started.

I don't see a big concern with some pg_basebackup --gen-unique-tablespaces
option or the like.

UUID would be better than timestamp due to the skew issues discussed
upthread. But personally I'd just take a label argument. pg_basebackup
--tablespace-prefix or the like.
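
A rough sketch of how a label argument could be emulated today with the existing `--tablespace-mapping=OLDDIR=NEWDIR` option of pg_basebackup. The `--tablespace-prefix` option mentioned above does not exist; the helper names and the "append `_<label>` to each location" naming rule below are assumptions made purely for illustration. Paths containing `=` are not handled.

```python
import posixpath


def tablespace_mappings(locations, label):
    """Pair each tablespace location with a sibling directory suffixed by the label."""
    mappings = []
    for old in locations:
        if not posixpath.isabs(old):
            raise ValueError(f"tablespace location must be absolute: {old}")
        old = posixpath.normpath(old)
        mappings.append((old, f"{old}_{label}"))  # hypothetical naming rule
    return mappings


def basebackup_argv(pgdata, locations, label):
    """Build a pg_basebackup argument list with one mapping per tablespace."""
    argv = ["pg_basebackup", "-D", pgdata]
    for old, new in tablespace_mappings(locations, label):
        argv.append(f"--tablespace-mapping={old}={new}")
    return argv


if __name__ == "__main__":
    print(" ".join(basebackup_argv("/data/standby", ["/abc"], "standby1")))
```

The list returned by `basebackup_argv` could be handed to `subprocess.run` once connection options are added; the caller still has to know the primary's tablespace locations up front.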

For non-pg_basebackup uses you have to solve it yourself anyway. Pg doesn't
know if it's just been started as a copy, after all, and it's too late to
move tablespaces then even if we'd do such a thing.


Re: Avoiding Tablespace path collision for primary and standby

2018-06-20 Thread Tom Lane
Ashwin Agrawal  writes:
> Okay just bouncing another approach, how about generating UUID for a
> postgres instance during initdb and pg_basebackup ?

There's no uuid generation code in core postgres, for excellent reasons
(lack of portability and lack of failure modes are the main objections).
This is not different in any meaningful way from the proposal to use
timestamps, except for being more complicated.

regards, tom lane



Re: Avoiding Tablespace path collision for primary and standby

2018-06-20 Thread Ashwin Agrawal
On Wed, Jun 20, 2018 at 10:50 AM Andres Freund  wrote:

>
>
> On June 20, 2018 10:31:05 AM PDT, Ashwin Agrawal wrote:
> >On Wed, Jun 20, 2018 at 9:39 AM Bruce Momjian wrote:
> >
> >> On Fri, May 25, 2018 at 02:17:23PM -0700, Ashwin Agrawal wrote:
> >> >
> >> > On Fri, May 25, 2018 at 7:33 AM, Tom Lane wrote:
> >> >
> >> > Ashwin Agrawal writes:
> >> > > Proposing to create directory with timestamp at time of creating
> >> > > tablespace and create symbolic link to it instead.
> >> >
> >> > I'm skeptical that this solves your problem.  What happens when the
> >> > CREATE TABLESPACE command is replicated to the standby with
> >> > sub-second delay?
> >> >
> >> >
> >> > I thought timestamps have micro-second precision. Are we expecting
> >> > tablespace to be created, wal logged, streamed, and replayed on
> >> > mirror in micro-second ?
> >>
> >> I didn't see anyone answer your question above.  We don't expect
> >> micro-second replay, but clock skew, which Tom Lane mentioned, could
> >> make it appear to be a micro-second replay.
> >>
> >
> >Thanks Bruce for answering. Though I still don't see why clock skew is
> >a problem here. As I think clock skew only happens across machines. On
> >same machine why would it be an issue. Problem is only with same machine,
> >different machines anyways paths don't collide so even if clock skew
> >happens is not a problem. (I understand there may be reservations for
> >putting timestamp in directory path, but clock skew argument is not
> >clear.)
>
> Clock skew happens within machines too. Both because of multi socket
> systems and virtualization systems. Also clock adjustments.
>

Thank you, that helps.

Okay just bouncing another approach, how about generating UUID for a
postgres instance during initdb and pg_basebackup? (Unlike the
`system_identifier` in pg_controldata, it would be stored in a separate,
independent file that is excluded from the pg_basebackup copy and instead
created fresh by pg_basebackup.) It would be read only once during startup
and used in the tablespace path. (I understand generating a UUID may be a
little heavy lifting just for same-node tablespace path collisions, but
having a unique identifier for each postgres instance, primary or standby,
may be useful in the long term for other purposes as well.)


Re: Avoiding Tablespace path collision for primary and standby

2018-06-20 Thread Andres Freund



On June 20, 2018 10:31:05 AM PDT, Ashwin Agrawal  wrote:
>On Wed, Jun 20, 2018 at 9:39 AM Bruce Momjian wrote:
>
>> On Fri, May 25, 2018 at 02:17:23PM -0700, Ashwin Agrawal wrote:
>> >
>> > On Fri, May 25, 2018 at 7:33 AM, Tom Lane wrote:
>> >
>> > Ashwin Agrawal writes:
>> > > Proposing to create directory with timestamp at time of creating
>> > > tablespace and create symbolic link to it instead.
>> >
>> > I'm skeptical that this solves your problem.  What happens when the
>> > CREATE TABLESPACE command is replicated to the standby with
>> > sub-second delay?
>> >
>> >
>> > I thought timestamps have micro-second precision. Are we expecting
>> > tablespace to be created, wal logged, streamed, and replayed on
>> > mirror in micro-second ?
>>
>> I didn't see anyone answer your question above.  We don't expect
>> micro-second replay, but clock skew, which Tom Lane mentioned, could
>> make it appear to be a micro-second replay.
>>
>
>Thanks Bruce for answering. Though I still don't see why clock skew is
>a problem here. As I think clock skew only happens across machines. On
>same machine why would it be an issue. Problem is only with same machine,
>different machines anyways paths don't collide so even if clock skew
>happens is not a problem. (I understand there may be reservations for
>putting timestamp in directory path, but clock skew argument is not
>clear.)

Clock skew happens within machines too. Both because of multi socket systems 
and virtualization systems. Also clock adjustments.

Andres



Re: Avoiding Tablespace path collision for primary and standby

2018-06-20 Thread Ashwin Agrawal
On Wed, Jun 20, 2018 at 9:39 AM Bruce Momjian  wrote:

> On Fri, May 25, 2018 at 02:17:23PM -0700, Ashwin Agrawal wrote:
> >
> > On Fri, May 25, 2018 at 7:33 AM, Tom Lane wrote:
> >
> > Ashwin Agrawal writes:
> > > Proposing to create directory with timestamp at time of creating
> > > tablespace and create symbolic link to it instead.
> >
> > I'm skeptical that this solves your problem.  What happens when the
> > CREATE TABLESPACE command is replicated to the standby with sub-second
> > delay?
> >
> >
> > I thought timestamps have micro-second precision. Are we expecting
> > tablespace to be created, wal logged, streamed, and replayed on mirror
> > in micro-second ?
>
> I didn't see anyone answer your question above.  We don't expect
> micro-second replay, but clock skew, which Tom Lane mentioned, could make
> it appear to be a micro-second replay.
>

Thanks, Bruce, for answering. Though I still don't see why clock skew is a
problem here. As far as I know, clock skew only happens across machines, so
why would it be an issue on the same machine? The problem exists only on the
same machine; on different machines the paths don't collide anyway, so clock
skew there is not a problem. (I understand there may be reservations about
putting a timestamp in the directory path, but the clock skew argument is
not clear.)


Re: Avoiding Tablespace path collision for primary and standby

2018-06-20 Thread Andres Freund
Hi,

On 2018-05-26 10:08:57 -0400, Tom Lane wrote:
> Not sure about the relative-path idea.  Seems like that would create
> a huge temptation to put tablespaces inside the data directory, which
> would force us to deal with that can of worms.

It doesn't seem impossible to normalize the path, and then check for that.
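
One way the "normalize the path, and then check" idea could look, sketched in Python. The function name `is_inside` and the rule that a relative location is resolved against the data directory are assumptions for illustration, not anything PostgreSQL does today.

```python
import os


def is_inside(data_dir: str, location: str) -> bool:
    """Return True if location, once normalized, lies within data_dir."""
    data_real = os.path.realpath(data_dir)
    # Assumption: relative locations are resolved against the data directory.
    # os.path.join discards data_real when location is already absolute.
    loc_real = os.path.realpath(os.path.join(data_real, location))
    # Compare whole path components, so /srv/pgdata2 is not mistaken
    # for something inside /srv/pgdata.
    return os.path.commonpath([data_real, loc_real]) == data_real


if __name__ == "__main__":
    print(is_inside("/srv/pgdata", "../tblspc"))         # outside the data directory
    print(is_inside("/srv/pgdata", "../pgdata/tblspc"))  # resolves back inside it
```

`os.path.realpath` also resolves symbolic links, which matters here because a purely textual normalization would miss a link that points back into the data directory.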


> Also, to the extent that people use tablespaces for what they're
> actually meant to be used for (ie, putting some stuff into a different
> filesystem), I can't see a relative path being helpful.  Admins don't
> go mounting disks at random places in the filesystem tree.

I'm not convinced by that argument. It can certainly make sense to mount
several filesystems relative to a subdirectory. And then there's the
case we're talking about, where you have primary/standby on a single
system. It's not like we'd *force* relative tablespaces...

Greetings,

Andres Freund



Re: Avoiding Tablespace path collision for primary and standby

2018-06-20 Thread Bruce Momjian
On Fri, May 25, 2018 at 02:17:23PM -0700, Ashwin Agrawal wrote:
> 
> On Fri, May 25, 2018 at 7:33 AM, Tom Lane  wrote:
> 
> Ashwin Agrawal  writes:
> > Proposing to create directory with timestamp at time of creating
> tablespace
> > and create symbolic link to it instead.
> 
> I'm skeptical that this solves your problem.  What happens when the CREATE
> TABLESPACE command is replicated to the standby with sub-second delay?
> 
> 
> I thought timestamps have micro-second precision. Are we expecting tablespace
> to be created, wal logged, streamed, and replayed on mirror in micro-second ?

I didn't see anyone answer your question above.  We don't expect
micro-second replay, but clock skew, which Tom Lane mentioned, could make
it appear to be a micro-second replay.
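
A small worked example of that point, assuming a hypothetical directory suffix built from a microsecond timestamp (the format string and all values below are made up). If the wall clock seen at replay time is behind by exactly the replay delay, through skew or a clock adjustment, both sides compute the same suffix even though the replay was not instantaneous.

```python
from datetime import datetime, timedelta, timezone


def path_suffix(wall_clock: datetime) -> str:
    """Hypothetical directory suffix derived from a microsecond timestamp."""
    return wall_clock.strftime("%Y%m%d%H%M%S%f")


# Moment the primary runs CREATE TABLESPACE (illustrative value).
true_create = datetime(2018, 6, 20, 16, 39, 0, 123456, tzinfo=timezone.utc)
replay_delay = timedelta(milliseconds=250)   # real time until the standby replays
standby_skew = timedelta(milliseconds=-250)  # clock seen at replay is 250 ms behind

primary_suffix = path_suffix(true_create)
standby_suffix = path_suffix(true_create + replay_delay + standby_skew)
collides = primary_suffix == standby_suffix

if __name__ == "__main__":
    print(primary_suffix, standby_suffix, collides)
```

An exact match like this is unlikely in practice, but nothing rules it out, which is why a timestamp cannot be treated as a guaranteed-unique identifier.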

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Avoiding Tablespace path collision for primary and standby

2018-05-29 Thread Ashwin Agrawal
On Sat, May 26, 2018 at 7:08 AM, Tom Lane  wrote:

> Thomas Munro  writes:
> > I also wondered about this when trying to figure out how to write a
> > TAP test for recovery testing with tablespaces, for my undo proposal.
> > I was starting to wonder about either allowing relative paths or
> > supporting some kind of variable in the tablespace path that could
> > then be set differently in each cluster's .conf.
>
> Yeah, the configuration-variable solution had occurred to me too.
> I'm not sure how convenient it'd be in practice, but perhaps it
> would be workable.
>

A configuration variable becomes tricky to use for this purpose,
especially given that configuration files get copied by pg_basebackup.
Would the configuration variable be set by some option to pg_basebackup?
Even pg_basebackup itself would need to use the same configuration variable.
(I know pg_basebackup provides a way to specify a different path for existing
tablespaces, but it seems it would still need to use the same static string
for ALL the tablespace paths, given how the linking and directory creation
happen today.)

Also, I'm not sure how a configuration variable would solve the problem;
for instance, changing its value shouldn't block me from accessing
previously created tablespaces.

Since the conflict arises naturally by design, resolving it automatically
in some way would be better than a config-option-based solution.


Re: Avoiding Tablespace path collision for primary and standby

2018-05-26 Thread Tom Lane
Thomas Munro  writes:
> I also wondered about this when trying to figure out how to write a
> TAP test for recovery testing with tablespaces, for my undo proposal.
> I was starting to wonder about either allowing relative paths or
> supporting some kind of variable in the tablespace path that could
> then be set differently in each cluster's .conf.

Yeah, the configuration-variable solution had occurred to me too.
I'm not sure how convenient it'd be in practice, but perhaps it
would be workable.

Not sure about the relative-path idea.  Seems like that would create
a huge temptation to put tablespaces inside the data directory, which
would force us to deal with that can of worms.  Also, to the extent
that people use tablespaces for what they're actually meant to be
used for (ie, putting some stuff into a different filesystem), I can't
see a relative path being helpful.  Admins don't go mounting disks
at random places in the filesystem tree.

regards, tom lane



Re: Avoiding Tablespace path collision for primary and standby

2018-05-25 Thread Michael Paquier
On Sat, May 26, 2018 at 02:10:52PM +1200, Thomas Munro wrote:
> I also wondered about this when trying to figure out how to write a
> TAP test for recovery testing with tablespaces, for my undo proposal.
> I was starting to wonder about either allowing relative paths or
> supporting some kind of variable in the tablespace path that could
> then be set differently in each cluster's .conf.

For now, for tablespace creation with multiple nodes on the same host,
you really come down to just using the tablespace map within pg_basebackup.
I think that this is a difficult problem, as one may want to not use the
same partition space for both primary and standby; hence you would need
to associate a tablespace path with one node using, for example, a node
name set in postgresql.conf, while extending CREATE TABLESPACE to
support this grammar and register the paths for each node in WAL
records.  Using a path that varies depending on the time is not a good
idea in my opinion.
--
Michael




Re: Avoiding Tablespace path collision for primary and standby

2018-05-25 Thread Thomas Munro
On Sat, May 26, 2018 at 9:17 AM, Ashwin Agrawal  wrote:
> To generate uniqueness for the path between primary and standby need to use
> something which is not represented within database. So will be random to
> some degree. Like one can use PORT number of postmaster. As only need to
> generate unique path while creating link during CREATE TABLESPACE.

I also wondered about this when trying to figure out how to write a
TAP test for recovery testing with tablespaces, for my undo proposal.
I was starting to wonder about either allowing relative paths or
supporting some kind of variable in the tablespace path that could
then be set differently in each cluster's .conf.
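
A sketch of what "some kind of variable in the tablespace path" might mean. The `${cluster_name}` placeholder syntax and the function `expand_location` are hypothetical; PostgreSQL has no such expansion. The idea is that the stored location is identical on both nodes while each cluster's configuration supplies a different value.

```python
from string import Template


def expand_location(location_template: str, cluster_settings: dict) -> str:
    """Expand ${name} placeholders using per-cluster settings (hypothetical)."""
    # substitute() raises KeyError if a placeholder has no configured value.
    return Template(location_template).substitute(cluster_settings)


# The same stored location, as it would be replicated to the standby.
stored = "/abc/${cluster_name}/ts1"
primary = expand_location(stored, {"cluster_name": "primary"})
standby = expand_location(stored, {"cluster_name": "standby"})

if __name__ == "__main__":
    print(primary)
    print(standby)
```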

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Avoiding Tablespace path collision for primary and standby

2018-05-25 Thread Ashwin Agrawal
On Fri, May 25, 2018 at 7:33 AM, Tom Lane  wrote:

> Ashwin Agrawal  writes:
> > Proposing to create directory with timestamp at time of creating
> tablespace
> > and create symbolic link to it instead.
>
> I'm skeptical that this solves your problem.  What happens when the CREATE
> TABLESPACE command is replicated to the standby with sub-second delay?
>

I thought timestamps have micro-second precision. Are we expecting the
tablespace to be created, WAL-logged, streamed, and replayed on the mirror
in a micro-second?

> Clock skew is another reason to doubt that timestamp == unique identifier,
> which is essentially what you're assuming here.
>

Generating uniqueness on the same machine is what we care about. On
different machines the problem doesn't exist anyway, so it doesn't matter
whether the clock is skewed or not.


>
> Even if we fixed that, the general idea of including a quasi-random
> component in the directory name seems like it would have a lot of
> unpleasant side effects in terms of reproduceability, testability, etc.
>

Hmm... don't we already, to some degree, create directories/files
with quasi-random numbers such as tablespace OIDs, database OIDs, and
relfilenodes?

To generate uniqueness for the path between primary and standby, we need to
use something which is not represented within the database, so it will be
random to some degree. For example, one could use the PORT number of the
postmaster, since we only need to generate a unique path while creating the
link during CREATE TABLESPACE.


Re: Avoiding Tablespace path collision for primary and standby

2018-05-25 Thread Tom Lane
Ashwin Agrawal  writes:
> Proposing to create directory with timestamp at time of creating tablespace
> and create symbolic link to it instead.

I'm skeptical that this solves your problem.  What happens when the CREATE
TABLESPACE command is replicated to the standby with sub-second delay?
Clock skew is another reason to doubt that timestamp == unique identifier,
which is essentially what you're assuming here.

Even if we fixed that, the general idea of including a quasi-random
component in the directory name seems like it would have a lot of
unpleasant side effects in terms of reproduceability, testability, etc.

regards, tom lane



Avoiding Tablespace path collision for primary and standby

2018-05-25 Thread Ashwin Agrawal
Currently, if primary and standby are set up on the same machine (which is
always the case for development), then for CREATE TABLESPACE xyz LOCATION
'/abc' the primary and mirror both write to the
"/abc/TABLESPACE_VERSION_DIRECTORY" directory. The collision is certainly not
an issue in any production deployment, but solving it for development still
seems extremely helpful.

Proposing to create directory with timestamp at time of creating tablespace
and create symbolic link to it instead. So, would be something like
"/abc/PG_/TABLESPACE_VERSION_DIRECTORY". This helps avoid
collision of primary and standby as timestamps would differ between primary
creating the tablespace and mirror replaying the record for the same.

Another advantage of this scheme is that creating the additional
TABLESPACE_VERSION_DIRECTORY inside can also be eliminated, as even during
pg_upgrade the paths will not collide. That avoids constructing
this additional string part at multiple places in the code for tablespace
access.

Since this is an on-disk change, yes, it may have an impact on existing tools.

Attaching the patch to showcase the proposal. Tested by creating a tablespace
with primary and standby on the same machine; the tablespace test also passes.


adding_timestamp_to_tablespace_path
Description: Binary data