Re: [HACKERS] replication identifier format
On Mon, Jun 23, 2014 at 11:28 AM, Andres Freund wrote:
>> Oh, great. Somehow I missed the fact that that had been addressed. I
>> had assumed that we still needed global identifiers in which case I
>> think they'd need to be 64+ bits (preferably more like 128). If they
>> only need to be locally significant that makes things much better.
>
> Well, I was just talking about the 'short ids' here and how they are
> used in crash recovery/shmem et al. Those indeed don't need to be
> coordinated.
> If you ever use logical decoding on a system that receives changes from
> other systems (cascading replication, multimaster) you'll likely want to
> add the *long* form of that identifier to the output in the output
> plugin so the downstream nodes can identify the source. How one
> specific replication solution deals with coordinating this between
> systems is essentially that suite's problem.

OK.

> The external identifier currently is a 'text' column, so essentially
> unlimited. (Well, I just noticed that the table currently doesn't have a
> toast table assigned, so it's only a couple kb right now, but ...)

OK. I have no clear reason to dislike that.

>> Is there any real reason to add a pg_replication_identifier table, or
>> should we just let individual replication solutions manage the
>> identifiers within their own configuration tables?
>
> I don't think that'd work. During crash recovery the short/internal IDs
> are read from WAL records and need to be unique across *all*
> databases. Since there's no way for different replication solutions or
> even the same to coordinate this across databases (as there's no way to
> add shared relations) it has to be builtin.

That makes sense.

> It's also useful so we can have stuff like the
> 'pg_replication_identifier_progress' view which tells you internal_id,
> external_id, remote_lsn, local_lsn. Just showing the internal ID would
> imo be bad.

OK.

>> I guess one
>> question is: What happens if there are multiple replication solutions
>> in use on a single server? How do they coordinate?
>
> What's your concern here? You're wondering how they can make sure the
> identifiers they create are non-overlapping?

Yeah, I was just thinking that might be why you installed a catalog
table for this, but now I see that there are several other reasons
also.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] replication identifier format
On 2014-06-23 10:45:51 -0400, Robert Haas wrote:
> On Mon, Jun 23, 2014 at 10:11 AM, Andres Freund wrote:
> >> > Why? Users and other systems only ever see the external ID. Everything
> >> > leaving the system is converted to the external form. The short id
> >> > basically is only used in shared memory and in wal records. For both
> >> > using longer strings would be problematic.
> >> >
> >> > In the patch I have the user can actually see them as they're stored in
> >> > pg_replication_identifier, but there should never be a need for that.
> >>
> >> Hmm, so there's no requirement that the short IDs are consistent
> >> across different clusters that are replicating to each other?
> >
> > Nope. That seemed to be a hard requirement in the earlier discussions we
> > had (~2 years ago).
>
> Oh, great. Somehow I missed the fact that that had been addressed. I
> had assumed that we still needed global identifiers in which case I
> think they'd need to be 64+ bits (preferably more like 128). If they
> only need to be locally significant that makes things much better.

Well, I was just talking about the 'short ids' here and how they are
used in crash recovery/shmem et al. Those indeed don't need to be
coordinated.
If you ever use logical decoding on a system that receives changes from
other systems (cascading replication, multimaster) you'll likely want to
add the *long* form of that identifier to the output in the output
plugin so the downstream nodes can identify the source. How one
specific replication solution deals with coordinating this between
systems is essentially that suite's problem.

The external identifier currently is a 'text' column, so essentially
unlimited. (Well, I just noticed that the table currently doesn't have a
toast table assigned, so it's only a couple kb right now, but ...)

> Is there any real reason to add a pg_replication_identifier table, or
> should we just let individual replication solutions manage the
> identifiers within their own configuration tables?

I don't think that'd work. During crash recovery the short/internal IDs
are read from WAL records and need to be unique across *all*
databases. Since there's no way for different replication solutions or
even the same to coordinate this across databases (as there's no way to
add shared relations) it has to be builtin.

It's also useful so we can have stuff like the
'pg_replication_identifier_progress' view which tells you internal_id,
external_id, remote_lsn, local_lsn. Just showing the internal ID would
imo be bad.

> I guess one
> question is: What happens if there are multiple replication solutions
> in use on a single server? How do they coordinate?

What's your concern here? You're wondering how they can make sure the
identifiers they create are non-overlapping?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
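The scheme described in the message above can be sketched roughly as follows. This is only an illustration in Python, not the patch's C code: the class and method names (IdentifierCatalog, get_or_create, advance, progress_view) are hypothetical, the external identifier strings are made up, and reserving internal id 0 is an assumption. It shows one cluster-wide catalog handing out locally unique 16-bit internal ids for arbitrary text identifiers, so that two independent replication solutions cannot collide, plus a progress listing with the four columns named for the view.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# 16-bit internal ids, as discussed in the thread. Reserving 0 as "no origin"
# is an assumption made for this sketch.
MAX_INTERNAL_ID = 0xFFFF


@dataclass
class Progress:
    remote_lsn: int = 0
    local_lsn: int = 0


class IdentifierCatalog:
    """Toy in-memory stand-in for one cluster-wide (shared) catalog."""

    def __init__(self) -> None:
        self._internal_by_external: Dict[str, int] = {}
        self._external_by_internal: Dict[int, str] = {}
        self._progress: Dict[int, Progress] = {}

    def get_or_create(self, external_id: str) -> int:
        """Return the local short id for external_id, allocating if needed."""
        known = self._internal_by_external.get(external_id)
        if known is not None:
            return known
        # Pick the lowest unused id; uniqueness is enforced in one place.
        for candidate in range(1, MAX_INTERNAL_ID + 1):
            if candidate not in self._external_by_internal:
                self._internal_by_external[external_id] = candidate
                self._external_by_internal[candidate] = external_id
                self._progress[candidate] = Progress()
                return candidate
        raise RuntimeError("all 16-bit internal identifiers are in use")

    def advance(self, internal_id: int, remote_lsn: int, local_lsn: int) -> None:
        """Record how far changes from this origin have been applied."""
        self._progress[internal_id] = Progress(remote_lsn, local_lsn)

    def progress_view(self) -> List[Tuple[int, str, int, int]]:
        """Rows of (internal_id, external_id, remote_lsn, local_lsn)."""
        rows = sorted(self._progress.items(), key=lambda item: item[0])
        return [
            (iid, self._external_by_internal[iid], p.remote_lsn, p.local_lsn)
            for iid, p in rows
        ]


# Two unrelated replication solutions register with the same catalog.
catalog = IdentifierCatalog()
a = catalog.get_or_create("solution_a/upstream-1")
b = catalog.get_or_create("solution_b/upstream-1")
catalog.advance(a, remote_lsn=100, local_lsn=250)
for row in catalog.progress_view():
    print(row)
```

Because allocation happens in a single shared place, neither solution needs to know about the other, which is the coordination question raised in the quoted text.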
Re: [HACKERS] replication identifier format
On Mon, Jun 23, 2014 at 10:11 AM, Andres Freund wrote:
>> > Why? Users and other systems only ever see the external ID. Everything
>> > leaving the system is converted to the external form. The short id
>> > basically is only used in shared memory and in wal records. For both
>> > using longer strings would be problematic.
>> >
>> > In the patch I have the user can actually see them as they're stored in
>> > pg_replication_identifier, but there should never be a need for that.
>>
>> Hmm, so there's no requirement that the short IDs are consistent
>> across different clusters that are replicating to each other?
>
> Nope. That seemed to be a hard requirement in the earlier discussions we
> had (~2 years ago).

Oh, great. Somehow I missed the fact that that had been addressed. I
had assumed that we still needed global identifiers in which case I
think they'd need to be 64+ bits (preferably more like 128). If they
only need to be locally significant that makes things much better.

Is there any real reason to add a pg_replication_identifier table, or
should we just let individual replication solutions manage the
identifiers within their own configuration tables? I guess one
question is: What happens if there are multiple replication solutions
in use on a single server? How do they coordinate?

>> If
>> that's the case, that might address my concern, but I'd probably want
>> to go back through the latest patch and think about it a bit more.
>
> I'll send out a new version after I'm finished with the newest atomic
> ops patch.

Sweet. I'm a little backed up right now, but will look at it when able.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] replication identifier format
On 2014-06-23 10:09:49 -0400, Robert Haas wrote:
> On Wed, Jun 18, 2014 at 12:46 PM, Andres Freund wrote:
> > On 2014-06-18 12:36:13 -0400, Robert Haas wrote:
> >> > I actually don't think any of the discussions I was involved in had the
> >> > externally visible version of replication identifiers limited to 16 bits?
> >> > If you are referring to my patch, 16 bits was just the width of the
> >> > *internal* name that should basically never be looked at. User visible
> >> > replication identifiers are always identified by an arbitrary string -
> >> > whose format is determined by the user of the replication identifier
> >> > facility. *BDR* currently stores the system identifier, the database id
> >> > and a name in there - but that's nothing core needs to concern itself
> >> > with.
> >>
> >> I don't think you're going to be able to avoid users needing to know
> >> about those IDs. The configuration table is going to have to be the
> >> same on all nodes, and how are you going to get that set up without
> >> those IDs being user-visible?
> >
> > Why? Users and other systems only ever see the external ID. Everything
> > leaving the system is converted to the external form. The short id
> > basically is only used in shared memory and in wal records. For both
> > using longer strings would be problematic.
> >
> > In the patch I have the user can actually see them as they're stored in
> > pg_replication_identifier, but there should never be a need for that.
>
> Hmm, so there's no requirement that the short IDs are consistent
> across different clusters that are replicating to each other?

Nope. That seemed to be a hard requirement in the earlier discussions we
had (~2 years ago).

> If
> that's the case, that might address my concern, but I'd probably want
> to go back through the latest patch and think about it a bit more.

I'll send out a new version after I'm finished with the newest atomic
ops patch.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] replication identifier format
On Wed, Jun 18, 2014 at 12:46 PM, Andres Freund wrote:
> On 2014-06-18 12:36:13 -0400, Robert Haas wrote:
>> > I actually don't think any of the discussions I was involved in had the
>> > externally visible version of replication identifiers limited to 16 bits?
>> > If you are referring to my patch, 16 bits was just the width of the
>> > *internal* name that should basically never be looked at. User visible
>> > replication identifiers are always identified by an arbitrary string -
>> > whose format is determined by the user of the replication identifier
>> > facility. *BDR* currently stores the system identifier, the database id
>> > and a name in there - but that's nothing core needs to concern itself
>> > with.
>>
>> I don't think you're going to be able to avoid users needing to know
>> about those IDs. The configuration table is going to have to be the
>> same on all nodes, and how are you going to get that set up without
>> those IDs being user-visible?
>
> Why? Users and other systems only ever see the external ID. Everything
> leaving the system is converted to the external form. The short id
> basically is only used in shared memory and in wal records. For both
> using longer strings would be problematic.
>
> In the patch I have the user can actually see them as they're stored in
> pg_replication_identifier, but there should never be a need for that.

Hmm, so there's no requirement that the short IDs are consistent
across different clusters that are replicating to each other? If
that's the case, that might address my concern, but I'd probably want
to go back through the latest patch and think about it a bit more.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] replication identifier format
On 2014-06-18 12:36:13 -0400, Robert Haas wrote:
> > I actually don't think any of the discussions I was involved in had the
> > externally visible version of replication identifiers limited to 16 bits?
> > If you are referring to my patch, 16 bits was just the width of the
> > *internal* name that should basically never be looked at. User visible
> > replication identifiers are always identified by an arbitrary string -
> > whose format is determined by the user of the replication identifier
> > facility. *BDR* currently stores the system identifier, the database id
> > and a name in there - but that's nothing core needs to concern itself
> > with.
>
> I don't think you're going to be able to avoid users needing to know
> about those IDs. The configuration table is going to have to be the
> same on all nodes, and how are you going to get that set up without
> those IDs being user-visible?

Why? Users and other systems only ever see the external ID. Everything
leaving the system is converted to the external form. The short id
basically is only used in shared memory and in wal records. For both
using longer strings would be problematic.

In the patch I have the user can actually see them as they're stored in
pg_replication_identifier, but there should never be a need for that.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
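To make the point in the reply above concrete, here is a small Python sketch of "everything leaving the system is converted to the external form". It is an illustration only: Cluster, WalRecord, log_change and emit are hypothetical names, the external identifier string is invented (it is not BDR's actual format), and real WAL records and output plugins look nothing like this. It shows two clusters assigning different short ids to the same origin while still agreeing on what they send downstream.

```python
from typing import Dict, NamedTuple


class WalRecord(NamedTuple):
    origin_internal_id: int  # short id, only meaningful on the local cluster
    payload: str


class Cluster:
    """Toy model of one cluster with its own private short-id mapping."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._internal_by_external: Dict[str, int] = {}
        self._external_by_internal: Dict[int, str] = {}

    def register(self, external_id: str) -> int:
        """Assign the next local short id to an external identifier."""
        if external_id in self._internal_by_external:
            return self._internal_by_external[external_id]
        internal_id = len(self._internal_by_external) + 1
        self._internal_by_external[external_id] = internal_id
        self._external_by_internal[internal_id] = external_id
        return internal_id

    def log_change(self, external_id: str, payload: str) -> WalRecord:
        """Store only the short id alongside the change."""
        return WalRecord(self.register(external_id), payload)

    def emit(self, record: WalRecord) -> Dict[str, str]:
        """What an output plugin might send: the external form only."""
        origin = self._external_by_internal[record.origin_internal_id]
        return {"origin": origin, "payload": record.payload}


ORIGIN = "node-x/db-1/main"  # made-up external identifier

cluster_a = Cluster("a")
cluster_b = Cluster("b")
cluster_b.register("some-other-origin")  # b has already used short id 1

rec_a = cluster_a.log_change(ORIGIN, "INSERT ...")
rec_b = cluster_b.log_change(ORIGIN, "INSERT ...")

print(rec_a.origin_internal_id, rec_b.origin_internal_id)
print(cluster_a.emit(rec_a) == cluster_b.emit(rec_b))
```

The short ids differ between the two clusters, yet the emitted form is identical, so nothing outside a cluster ever depends on its internal numbering.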