Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020-Jun-24, Andres Freund wrote: > As I said before, I've utilized being able to do both over a single > connection (among others to initialize a logical replica using a base > backup). And I've seen at least one other codebase (developed without my > input) doing so. I really don't understand how you just dismiss this > without any sort of actual argument. Yes, those uses can be fixed to > reconnect with a different replication parameter, but that's code that > needs to be adjusted and it requires adjustments to pg_hba.conf etc. > > And obviously you'd lock out older versions of jdbc, and possibly other > drivers. Well, I had understood that you were talking from a hypothetical position, not that you were already using the thing that way. After these arguments, I agree to leave things alone, and nobody else seems to be arguing in that direction, so I'll mark the open item as closed. Thanks, -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Wed, Jun 24, 2020 at 3:41 PM Alvaro Herrera wrote: > People (specifically the jdbc driver) *are* using this feature in this > way, but they didn't realize they were doing it. It was an accident and > they didn't notice. But you don't know that that's true of everyone using this feature, and even if it were, so what? Breaking a feature that someone didn't know they were using is just as much of a break as breaking a feature someone DID know they were using. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Hi, On 2020-06-24 15:41:14 -0400, Alvaro Herrera wrote: > On 2020-Jun-24, Robert Haas wrote: > > > So really I think this turns on #1: is it plausible > > that people are using this feature, however inadvertent it may be, and > > is it potentially useful? I don't see that anybody's made an argument > > against either of those things. Unless someone can do so, I think we > > shouldn't disable this. > > People (specifically the jdbc driver) *are* using this feature in this > way, but they didn't realize they were doing it. It was an accident and > they didn't notice. As I said before, I've utilized being able to do both over a single connection (among others to initialize a logical replica using a base backup). And I've seen at least one other codebase (developed without my input) doing so. I really don't understand how you just dismiss this without any sort of actual argument. Yes, those uses can be fixed to reconnect with a different replication parameter, but that's code that needs to be adjusted and it requires adjustments to pg_hba.conf etc. And obviously you'd lock out older versions of jdbc, and possibly other drivers. Obviously we should allow more granular permissions here, I don't think anybody is arguing against that. Greetings, Andres Freund
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Wed, 24 Jun 2020 at 15:41, Alvaro Herrera wrote: > On 2020-Jun-24, Robert Haas wrote: > > > So really I think this turns on #1: is it plausible > > that people are using this feature, however inadvertent it may be, and > > is it potentially useful? I don't see that anybody's made an argument > > against either of those things. Unless someone can do so, I think we > > shouldn't disable this. > > People (specifically the jdbc driver) *are* using this feature in this > way, but they didn't realize they were doing it. It was an accident and > they didn't notice. > > Not sure we are using it as much as we accidently did it that way. It would be trivial to fix. That said I think we should fix the security hole this opens and leave the functionality. Dave
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020-Jun-24, Robert Haas wrote: > So really I think this turns on #1: is it plausible > that people are using this feature, however inadvertent it may be, and > is it potentially useful? I don't see that anybody's made an argument > against either of those things. Unless someone can do so, I think we > shouldn't disable this. People (specifically the jdbc driver) *are* using this feature in this way, but they didn't realize they were doing it. It was an accident and they didn't notice. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Wed, Jun 24, 2020 at 1:06 PM Alvaro Herrera wrote: > On 2020-Jun-24, Stephen Frost wrote: > > Doesn't mean it makes sense or that we should be supporting that. What > > we should have is a way to allow administrators to configure a system > > for exactly what they want to allow, and it doesn't seem like we're > > doing that today and therefore we should fix it. This isn't the only > > area we have that issue in. > > The way to do that, for the case under discussion, is to reject using a > logical replication connection for physical replication commands. Reading over this discussion, I see basically three arguments: 1. Andres argues that being able to execute physical replication commands from the same connection as SQL queries is useful, and that people may be relying on it, and that we shouldn't break it without need. 2. Fujii Masao argues that the current situation makes it impossible to write a pg_hba.conf rule that disallows all physical replication connections, because people could get around it by using a logical replication connection for physical replication. 3. Various people argue that it's only accidental that physical replication on a replication=database connection ever worked at all, and therefore we ought to block it. I find argument #1 most convincing, #2 less convincing, and #3 least convincing. In my view, the problem with argument #3 is that just because some feature combination was unintentional doesn't mean it's unuseful or unused. As for #2, suppose someone were to propose a design for logical replication that allowed it to take place without a database connection, so that it could be done with just a regular replication connection. Such a feature would create the same problem Fujii Masao mentions here, but it seems inconceivable that we would for that reason reject it; we make decisions about features based on their usefulness, not their knock-on effects on pg_hba.conf rules. We can always add new kinds of access control restrictions if they are needed; that is a better approach than removing features so that the existing pg_hba.conf facilities can be used to accomplish some particular goal. So really I think this turns on #1: is it plausible that people are using this feature, however inadvertent it may be, and is it potentially useful? I don't see that anybody's made an argument against either of those things. Unless someone can do so, I think we shouldn't disable this. That having been said, I think that the fact that you can execute SQL queries in replication=database connections is horrifying. I really hate that feature. I think it's a bad design, and a bad implementation, and a recipe for tons of bugs. But, blocking physical replication commands on such connections isn't going to solve any of that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020-Jun-24, Stephen Frost wrote: > Doesn't mean it makes sense or that we should be supporting that. What > we should have is a way to allow administrators to configure a system > for exactly what they want to allow, and it doesn't seem like we're > doing that today and therefore we should fix it. This isn't the only > area we have that issue in. The way to do that, for the case under discussion, is to reject using a logical replication connection for physical replication commands. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Greetings, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > On 2020-Jun-24, Kyotaro Horiguchi wrote: > > > In logical replication, a replication role is intended to be > > accessible only to the GRANTed databases. On the other hand the same > > role can create a dead copy of the whole cluster, including > > non-granted databases. > > In other words -- essentially, if you grant replication access to a role > only to a specific database, they can steal the whole cluster. > > I don't see what's so great about that, but apparently people like it. Sure, people who aren't in charge of security I'm sure like the ease of use. Doesn't mean it makes sense or that we should be supporting that. What we should have is a way to allow administrators to configure a system for exactly what they want to allow, and it doesn't seem like we're doing that today and therefore we should fix it. This isn't the only area we have that issue in. Thanks, Stephen signature.asc Description: PGP signature
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020-Jun-24, Kyotaro Horiguchi wrote: > In logical replication, a replication role is intended to be > accessible only to the GRANTed databases. On the other hand the same > role can create a dead copy of the whole cluster, including > non-granted databases. In other words -- essentially, if you grant replication access to a role only to a specific database, they can steal the whole cluster. I don't see what's so great about that, but apparently people like it. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020/06/24 11:56, Kyotaro Horiguchi wrote: At Tue, 23 Jun 2020 10:51:40 +0900, Michael Paquier wrote in On Sun, Jun 21, 2020 at 01:02:34PM -0700, Andres Freund wrote: I still maintain that adding restrictions here is a bad idea. Even disregarding the discussion of running normal queries interspersed, it's useful to be able to both request WAL and receive logical changes over the same connection. E.g. for creating a logical replica by first doing a physical base backup (vastly faster), or fetching WAL for decoding large transactions onto a standby. And I just don't see any reasons to disallow it. There's basically no reduction in complexity by doing so. Yeah, I still stand by the same opinion here to do nothing. I suspect that we have good chances to annoy people and some cases we are overlooking here, that used to work. In logical replication, a replication role is intended to be accessible only to the GRANTed databases. On the other hand the same role can create a dead copy of the whole cluster, including non-granted databases. It seems like a sieve missing a mesh screen. Personally I'd like to disallow physical replication commands when I explicitly reject physical replication connection (i.e., set "host replication user x.x.x.x/x reject") in pg_hba.conf, whether on physical or logical replication connection. I agree that that doesn't harm as far as roles are strictly managed so I don't insist so strongly on inhibiting the behavior. However, the documentation at least needs amendment. +1 Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
At Tue, 23 Jun 2020 10:51:40 +0900, Michael Paquier wrote in > On Sun, Jun 21, 2020 at 01:02:34PM -0700, Andres Freund wrote: > > I still maintain that adding restrictions here is a bad idea. Even > > disregarding the discussion of running normal queries interspersed, it's > > useful to be able to both request WAL and receive logical changes over > > the same connection. E.g. for creating a logical replica by first doing > > a physical base backup (vastly faster), or fetching WAL for decoding > > large transactions onto a standby. > > > > And I just don't see any reasons to disallow it. There's basically no > > reduction in complexity by doing so. > > Yeah, I still stand by the same opinion here to do nothing. I suspect > that we have good chances to annoy people and some cases we are > overlooking here, that used to work. In logical replication, a replication role is intended to be accessible only to the GRANTed databases. On the other hand the same role can create a dead copy of the whole cluster, including non-granted databases. It seems like a sieve missing a mesh screen. I agree that that doesn't harm as far as roles are strictly managed so I don't insist so strongly on inhibiting the behavior. However, the documentation at least needs amendment. https://www.postgresql.org/docs/13/protocol-replication.html To initiate streaming replication, the frontend sends the replication parameter in the startup message. A Boolean value of true (or on, yes, 1) tells the backend to go into physical replication walsender mode, wherein a small set of replication commands, shown below, can be issued instead of SQL statements. Passing database as the value for the replication parameter instructs the backend to go into logical replication walsender mode, connecting to the database specified in the dbname parameter. In logical replication walsender mode, the replication commands shown below as well as normal SQL commands can be issued. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Sun, Jun 21, 2020 at 01:02:34PM -0700, Andres Freund wrote: > I still maintain that adding restrictions here is a bad idea. Even > disregarding the discussion of running normal queries interspersed, it's > useful to be able to both request WAL and receive logical changes over > the same connection. E.g. for creating a logical replica by first doing > a physical base backup (vastly faster), or fetching WAL for decoding > large transactions onto a standby. > > And I just don't see any reasons to disallow it. There's basically no > reduction in complexity by doing so. Yeah, I still stand by the same opinion here to do nothing. I suspect that we have good chances to annoy people and some cases we are overlooking here, that used to work. -- Michael signature.asc Description: PGP signature
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Hi, On 2020-06-21 13:45:36 -0400, Jonathan S. Katz wrote: > The PG13 RMT had a discussion about this thread, and while the initial > crash has been fixed, we decided to re-open the Open Item around whether > we should allow physical replication to be initiated in a logical > replication session. Since this is a long-time issue, this doesn't quite seem like an issue for the RMT? > We anticipate a resolution for PG13, whether it is explicitly > disallowing physical replication from occurring on a logical replication > slot, maintaining the status quo, or something else such that there is > consensus on the approach. s/logical replication slot/logical replication connection/? I still maintain that adding restrictions here is a bad idea. Even disregarding the discussion of running normal queries interspersed, it's useful to be able to both request WAL and receive logical changes over the same connection. E.g. for creating a logical replica by first doing a physical base backup (vastly faster), or fetching WAL for decoding large transactions onto a standby. And I just don't see any reasons to disallow it. There's basically no reduction in complexity by doing so. Greetings, Andres Freund
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Hi, On 6/5/20 11:51 AM, Alvaro Herrera wrote: > On 2020-Jun-05, Dave Cramer wrote: > >> On Thu, 4 Jun 2020 at 19:46, Alvaro Herrera >> wrote: > >>> Ouch ... so they made IDENT in the replication grammar be a trigger to >>> enter the regular grammar. Crazy. No way to put those worms back in >>> the tin now, I guess. >> >> Is that documented ? > > I don't think it is. > >>> It is still my opinion that we should prohibit a logical replication >>> connection from being used to do physical replication. Horiguchi-san, >>> Sawada-san and Masao-san are all of the same opinion. Dave Cramer (of >>> the JDBC team) is not opposed to the change -- he says they're just >>> using it because they didn't realize they should be doing differently. >> >> I think my exact words were >> >> "I don't see this is a valid reason to keep doing something. If it is >> broken then fix it. >> Clients can deal with the change." >> >> in response to: >> >>> Well, I don't really think that we can just break a behavior that >>> exists since 9.4 as you could break applications relying on the >>> existing behavior, and that's also the point of Vladimir upthread. >> >> Which is different than not being opposed to the change. I don't see this >> as broken, and it's quite possible that some of our users are using >> it. > > Apologies for misinterpreting. > >> It certainly needs to be documented > > I'd rather not. The PG13 RMT had a discussion about this thread, and while the initial crash has been fixed, we decided to re-open the Open Item around whether we should allow physical replication to be initiated in a logical replication session. We anticipate a resolution for PG13, whether it is explicitly disallowing physical replication from occurring on a logical replication slot, maintaining the status quo, or something else such that there is consensus on the approach. Thanks, Jonathan signature.asc Description: OpenPGP digital signature
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Thu, Jun 04, 2020 at 11:07:29AM +0900, Michael Paquier wrote: > Are there any objections in fixing the issue first then? As far as I > can see there is no objection to this part, like here: > https://www.postgresql.org/message-id/20200603214448.GA901@alvherre.pgsql Hearing nothing, I have applied this part and fixed the crash to take care of the open item. -- Michael signature.asc Description: PGP signature
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020-Jun-05, Dave Cramer wrote: > On Thu, 4 Jun 2020 at 19:46, Alvaro Herrera > wrote: > > Ouch ... so they made IDENT in the replication grammar be a trigger to > > enter the regular grammar. Crazy. No way to put those worms back in > > the tin now, I guess. > > Is that documented ? I don't think it is. > > It is still my opinion that we should prohibit a logical replication > > connection from being used to do physical replication. Horiguchi-san, > > Sawada-san and Masao-san are all of the same opinion. Dave Cramer (of > > the JDBC team) is not opposed to the change -- he says they're just > > using it because they didn't realize they should be doing differently. > > I think my exact words were > > "I don't see this is a valid reason to keep doing something. If it is > broken then fix it. > Clients can deal with the change." > > in response to: > > > Well, I don't really think that we can just break a behavior that > > exists since 9.4 as you could break applications relying on the > > existing behavior, and that's also the point of Vladimir upthread. > > Which is different than not being opposed to the change. I don't see this > as broken, and it's quite possible that some of our users are using > it. Apologies for misinterpreting. > It certainly needs to be documented I'd rather not. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Thu, 4 Jun 2020 at 19:46, Alvaro Herrera wrote: > On 2020-Jun-04, Andres Freund wrote: > > > postgres[52656][1]=# SELECT 1; > > ┌──┐ > > │ ?column? │ > > ├──┤ > > │1 │ > > └──┘ > > (1 row) > > > > > > I am very much not in love with the way that was implemented, but it's > > there, and it's used as far as I know (cf tablesync.c). > > Ouch ... so they made IDENT in the replication grammar be a trigger to > enter the regular grammar. Crazy. No way to put those worms back in > the tin now, I guess. > Is that documented ? > > It is still my opinion that we should prohibit a logical replication > connection from being used to do physical replication. Horiguchi-san, > Sawada-san and Masao-san are all of the same opinion. Dave Cramer (of > the JDBC team) is not opposed to the change -- he says they're just > using it because they didn't realize they should be doing differently. I think my exact words were "I don't see this is a valid reason to keep doing something. If it is broken then fix it. Clients can deal with the change." in response to: Well, I don't really think that we can just break a behavior that > exists since 9.4 as you could break applications relying on the > existing behavior, and that's also the point of Vladimir upthread. > Which is different than not being opposed to the change. I don't see this as broken, and it's quite possible that some of our users are using it. It certainly needs to be documented Dave
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020-Jun-04, Andres Freund wrote: > postgres[52656][1]=# SELECT 1; > ┌──┐ > │ ?column? │ > ├──┤ > │1 │ > └──┘ > (1 row) > > > I am very much not in love with the way that was implemented, but it's > there, and it's used as far as I know (cf tablesync.c). Ouch ... so they made IDENT in the replication grammar be a trigger to enter the regular grammar. Crazy. No way to put those worms back in the tin now, I guess. It is still my opinion that we should prohibit a logical replication connection from being used to do physical replication. Horiguchi-san, Sawada-san and Masao-san are all of the same opinion. Dave Cramer (of the JDBC team) is not opposed to the change -- he says they're just using it because they didn't realize they should be doing differently. Both Michael P. and you are saying we shouldn't break it because it works today, but there isn't a real use-case for it. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Hi, On 2020-06-04 16:44:53 -0400, Alvaro Herrera wrote: > A logical replication connection cannot run SQL anyway, can it? You can: andres@awork3:~/src/postgresql$ psql 'replication=database' postgres[52656][1]=# IDENTIFY_SYSTEM; ┌─┬──┬┬──┐ │ systemid │ timeline │ xlogpos │ dbname │ ├─┼──┼┼──┤ │ 6821634567571961151 │1 │ 1/D256EC40 │ postgres │ └─┴──┴┴──┘ (1 row) postgres[52656][1]=# SELECT 1; ┌──┐ │ ?column? │ ├──┤ │1 │ └──┘ (1 row) I am very much not in love with the way that was implemented, but it's there, and it's used as far as I know (cf tablesync.c). Greetings, Andres Freund
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020-Jun-04, Michael Paquier wrote: > On Wed, Jun 03, 2020 at 06:33:11PM -0700, Andres Freund wrote: > >> I don't think having a physical replication connection access catalog > >> data directly is a great idea. We already have gadgets like > >> IDENTIFY_SYSTEM for physical replication that can do that, and if you > >> need particular settings you can use SHOW (commit d1ecd539477). If > >> there was a strong need for even more than that, we can add something to > >> the grammar. > > > > Those special case things are a bad idea, and we shouldn't introduce > > more. It's unrealistic that we can ever make that support everything, > > and since we already have to support the database connected thing, I > > don't see the point. > > Let's continue discussing this part as well. A logical replication connection cannot run SQL anyway, can it? it's limited to the replication grammar. So it's not like you can run arbitrary queries to access catalog data. So even if we do need to access the catalogs, we'd have to add stuff to the replication grammar in order to support that. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020-Jun-03, Andres Freund wrote: > On 2020-06-03 18:27:12 -0400, Alvaro Herrera wrote: > > On 2020-Jun-03, Andres Freund wrote: > > > I don't think we should prohibit this. For one, it'd probably break some > > > clients, without a meaningful need. > > > > There *is* a need, namely to keep complexity down. This is quite > > convoluted, it's got a lot of historical baggage because of the way it > > was implemented, and it's very difficult to understand. The greatest > > motive I see is to make this easier to understand, so that it is easier > > to modify and improve in the future. > > That seems like a possibly convincing argument for not introducing the > capability, but doesn't seem strong enough to remove it. This "capability" has never been introduced. The fact that it's there is just an accident. In fact, it's not a capability, since the feature (physical replication) is invoked differently -- namely, using a physical replication connection. JDBC uses a logical replication connection for it only because they never realized that they were supposed to do differently, because we failed to throw the correct error message in the first place. > > I don't think having a physical replication connection access catalog > > data directly is a great idea. We already have gadgets like > > IDENTIFY_SYSTEM for physical replication that can do that, and if you > > need particular settings you can use SHOW (commit d1ecd539477). If > > there was a strong need for even more than that, we can add something to > > the grammar. > > Those special case things are a bad idea, and we shouldn't introduce > more. What special case things? The replication connection has never been supposed to run SQL. That's why we have SHOW in the replication grammar. > It's unrealistic that we can ever make that support everything, > and since we already have to support the database connected thing, I > don't see the point. A logical replication connection is not supposed to be used for physical replication. That's just going to make more bugs appear. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Wed, Jun 03, 2020 at 06:33:11PM -0700, Andres Freund wrote: > On 2020-06-03 18:27:12 -0400, Alvaro Herrera wrote: >> There *is* a need, namely to keep complexity down. This is quite >> convoluted, it's got a lot of historical baggage because of the way it >> was implemented, and it's very difficult to understand. The greatest >> motive I see is to make this easier to understand, so that it is easier >> to modify and improve in the future. > > That seems like a possibly convincing argument for not introducing the > capability, but doesn't seem strong enough to remove it. Especially not > if it was just broken as part of effectively a refactoring, as far as I > understand? Are there any objections in fixing the issue first then? As far as I can see there is no objection to this part, like here: https://www.postgresql.org/message-id/20200603214448.GA901@alvherre.pgsql >> I don't think having a physical replication connection access catalog >> data directly is a great idea. We already have gadgets like >> IDENTIFY_SYSTEM for physical replication that can do that, and if you >> need particular settings you can use SHOW (commit d1ecd539477). If >> there was a strong need for even more than that, we can add something to >> the grammar. > > Those special case things are a bad idea, and we shouldn't introduce > more. It's unrealistic that we can ever make that support everything, > and since we already have to support the database connected thing, I > don't see the point. Let's continue discussing this part as well. -- Michael signature.asc Description: PGP signature
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Hi, On 2020-06-03 18:27:12 -0400, Alvaro Herrera wrote: > On 2020-Jun-03, Andres Freund wrote: > > I don't think we should prohibit this. For one, it'd probably break some > > clients, without a meaningful need. > > There *is* a need, namely to keep complexity down. This is quite > convoluted, it's got a lot of historical baggage because of the way it > was implemented, and it's very difficult to understand. The greatest > motive I see is to make this easier to understand, so that it is easier > to modify and improve in the future. That seems like a possibly convincing argument for not introducing the capability, but doesn't seem strong enough to remove it. Especially not if it was just broken as part of effectively a refactoring, as far as I understand? > > But I think it's also actually quite useful to be able to access > > catalogs before streaming data. You e.g. can look up configuration of > > the primary before streaming WAL. With a second connection that's > > actually harder to do reliably in some cases, because you need to be > > sure that you actually reached the right server (consider a pooler, > > automatic failover etc). > > I don't think having a physical replication connection access catalog > data directly is a great idea. We already have gadgets like > IDENTIFY_SYSTEM for physical replication that can do that, and if you > need particular settings you can use SHOW (commit d1ecd539477). If > there was a strong need for even more than that, we can add something to > the grammar. Those special case things are a bad idea, and we shouldn't introduce more. It's unrealistic that we can ever make that support everything, and since we already have to support the database connected thing, I don't see the point. Greetings, Andres Freund
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020-Jun-02, Michael Paquier wrote: > I can note as well that StartLogicalReplication() moves in this sense > by setting xlogreader to be the one from logical_decoding_ctx once the > decoding context has been created. > > This results in the attached. The extra test from upthread to check > that logical decoding is not allowed in a non-database WAL sender is a > good idea, so I have kept it. I don't particularly disagree with your proposed patch -- in fact, it seems to make things cleaner. It is a little wasteful, but I don't really mind that. It's just some memory, and it's not a significant amount. That said, I would *also* apply Kyotaro's proposed patch to prohibit a physical standby running with a logical slot, if only because that reduces the number of combinations that we need to test and keep our collective heads straight about. Just reject the weird case and have one type of slot for each type of replication. I didn't even think this was at all possible. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020-Jun-03, Andres Freund wrote: > I don't think we should prohibit this. For one, it'd probably break some > clients, without a meaningful need. There *is* a need, namely to keep complexity down. This is quite convoluted, it's got a lot of historical baggage because of the way it was implemented, and it's very difficult to understand. The greatest motive I see is to make this easier to understand, so that it is easier to modify and improve in the future. > But I think it's also actually quite useful to be able to access > catalogs before streaming data. You e.g. can look up configuration of > the primary before streaming WAL. With a second connection that's > actually harder to do reliably in some cases, because you need to be > sure that you actually reached the right server (consider a pooler, > automatic failover etc). I don't think having a physical replication connection access catalog data directly is a great idea. We already have gadgets like IDENTIFY_SYSTEM for physical replication that can do that, and if you need particular settings you can use SHOW (commit d1ecd539477). If there was a strong need for even more than that, we can add something to the grammar. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Hi, On 2020-06-02 14:23:50 +0900, Fujii Masao wrote: > On 2020/06/02 13:24, Michael Paquier wrote: > > On Fri, May 29, 2020 at 06:09:06PM +0900, Masahiko Sawada wrote: > > > Yes. Conversely, if we start logical replication in a physical > > > replication connection (i.g. replication=true) we got an error before > > > staring replication: > > > > > > ERROR: logical decoding requires a database connection > > > > > > I think we can prevent that SEGV in a similar way. > > > > Still unconvinced as this restriction stands for logical decoding > > requiring a database connection but it is not necessarily true now as > > physical replication has less restrictions than a logical one. > > Could you tell me what the benefit for supporting physical replication on > logical rep connection is? If it's only for "undocumented" > backward-compatibility, IMO it's better to reject such "tricky" set up. > But if there are some use cases for that, I'm ok to support that. I don't think we should prohibit this. For one, it'd probably break some clients, without a meaningful need. But I think it's also actually quite useful to be able to access catalogs before streaming data. You e.g. can look up configuration of the primary before streaming WAL. With a second connection that's actually harder to do reliably in some cases, because you need to be sure that you actually reached the right server (consider a pooler, automatic failover etc). Greetings, Andres Freund
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020/06/03 20:33, Dave Cramer wrote: On Wed, 3 Jun 2020 at 01:19, Michael Paquier mailto:mich...@paquier.xyz>> wrote: On Tue, Jun 02, 2020 at 02:23:50PM +0900, Fujii Masao wrote: > On 2020/06/02 13:24, Michael Paquier wrote: >> Still unconvinced as this restriction stands for logical decoding >> requiring a database connection but it is not necessarily true now as >> physical replication has less restrictions than a logical one. > > Could you tell me what the benefit for supporting physical replication on > logical rep connection is? If it's only for "undocumented" > backward-compatibility, IMO it's better to reject such "tricky" set up. > But if there are some use cases for that, I'm ok to support that. Well, I don't really think that we can just break a behavior that exists since 9.4 as you could break applications relying on the existing behavior, and that's also the point of Vladimir upthread. For the back branches, I agree with you. Even if it's undocumented behavior, basically we should not get rid of it from the back branches unless there is very special reason. For v13, if it has no functional merit, I don't think it's so bad to get rid of that undocumented (and maybe not-fully tested) behavior. If there are applications depending it, I think that they can be updated. I don't see this is a valid reason to keep doing something. If it is broken then fix it. Clients can deal with the change. +1 Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Wed, 3 Jun 2020 at 01:19, Michael Paquier wrote: > On Tue, Jun 02, 2020 at 02:23:50PM +0900, Fujii Masao wrote: > > On 2020/06/02 13:24, Michael Paquier wrote: > >> Still unconvinced as this restriction stands for logical decoding > >> requiring a database connection but it is not necessarily true now as > >> physical replication has less restrictions than a logical one. > > > > Could you tell me what the benefit for supporting physical replication on > > logical rep connection is? If it's only for "undocumented" > > backward-compatibility, IMO it's better to reject such "tricky" set up. > > But if there are some use cases for that, I'm ok to support that. > > Well, I don't really think that we can just break a behavior that > exists since 9.4 as you could break applications relying on the > existing behavior, and that's also the point of Vladimir upthread. > I don't see this is a valid reason to keep doing something. If it is broken then fix it. Clients can deal with the change. Dave Cramer https://www.postgres.rocks
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Tue, Jun 02, 2020 at 02:23:50PM +0900, Fujii Masao wrote: > On 2020/06/02 13:24, Michael Paquier wrote: >> Still unconvinced as this restriction stands for logical decoding >> requiring a database connection but it is not necessarily true now as >> physical replication has less restrictions than a logical one. > > Could you tell me what the benefit for supporting physical replication on > logical rep connection is? If it's only for "undocumented" > backward-compatibility, IMO it's better to reject such "tricky" set up. > But if there are some use cases for that, I'm ok to support that. Well, I don't really think that we can just break a behavior that exists since 9.4 as you could break applications relying on the existing behavior, and that's also the point of Vladimir upthread. On top of it, the issue is actually unrelated to if we want to restrict things more or not when starting replication in a WAL sender because the xlogreader creation just needs to happen when starting replication. Now we have a static "fake" one created when a WAL sender process starts, something that it would not need in most cases like answering to a BASE_BACKUP command for example. >> I can note as well that StartLogicalReplication() moves in this sense >> by setting xlogreader to be the one from logical_decoding_ctx once the >> decoding context has been created. >> >> This results in the attached. The extra test from upthread to check >> that logical decoding is not allowed in a non-database WAL sender is a >> good idea, so I have kept it. > > Yes. Also we should add the test to check if physical replication can work > fine even on logical rep connection? I found confusing the use of psql to confirm that it actually works, because we'd just return a protocol-level error in this case with psql bumping on COPY_BOTH and it is not reliable to do just an error message match. Note as well that GetConnection() discards automatically the database name for pg_basebackup and pg_receivewal as well as libpqrcv_connect() for standbys so we cannot use that. Perhaps using psql is better than nothing, but that makes me uncomfortable. -- Michael signature.asc Description: PGP signature
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
At Tue, 2 Jun 2020 13:24:56 +0900, Michael Paquier wrote in > On Fri, May 29, 2020 at 06:09:06PM +0900, Masahiko Sawada wrote: > > Yes. Conversely, if we start logical replication in a physical > > replication connection (i.g. replication=true) we got an error before > > staring replication: > > > > ERROR: logical decoding requires a database connection > > > > I think we can prevent that SEGV in a similar way. > > Still unconvinced as this restriction stands for logical decoding > requiring a database connection but it is not necessarily true now as > physical replication has less restrictions than a logical one. If we deliberately allow physical replication on a database-replication connection, we should revise the documentation that way. On the other hand physical replication has wider access to a database cluster than logical replication. Thus allowing to start physical replication on a logical replication connection could introduce a problem related to privileges. So I think it might be better that physical and logical replication have separate pg_hba lines. Once we explicitly allow physical replication on a logical replication connection in documentation, it would be far harder to change the behavior than now. If we are sure that that cannot be a problem, I don't object the change in documented behavior. > Looking at the code, I think that there is some confusion with the > fake WAL reader used as base reference in InitWalSender() where we > assume that it could only be used in the context of a non-database WAL > sender. However, this initialization happens when the WAL sender > connection is initialized, and what I think this misses is that we > should try to initialize a WAL reader when actually going through a > START_REPLICATION command. At first fake_xlogreader was really a fake one that only provides callback routines, but it should have been changed to a real xlogreader at the time it began to store segment information. In that sense moving to real xlogreader makes sense to me separately from whether we allow physicalrep on logicalrep connections. > I can note as well that StartLogicalReplication() moves in this sense > by setting xlogreader to be the one from logical_decoding_ctx once the > decoding context has been created. > > This results in the attached. The extra test from upthread to check > that logical decoding is not allowed in a non-database WAL sender is a > good idea, so I have kept it. + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), +errmsg("out of memory"))); The same error message is accompanied by a DETAILS in some other places. Don't we need one for this? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On 2020/06/02 13:24, Michael Paquier wrote: On Fri, May 29, 2020 at 06:09:06PM +0900, Masahiko Sawada wrote: Yes. Conversely, if we start logical replication in a physical replication connection (i.g. replication=true) we got an error before staring replication: ERROR: logical decoding requires a database connection I think we can prevent that SEGV in a similar way. Still unconvinced as this restriction stands for logical decoding requiring a database connection but it is not necessarily true now as physical replication has less restrictions than a logical one. Could you tell me what the benefit for supporting physical replication on logical rep connection is? If it's only for "undocumented" backward-compatibility, IMO it's better to reject such "tricky" set up. But if there are some use cases for that, I'm ok to support that. Looking at the code, I think that there is some confusion with the fake WAL reader used as base reference in InitWalSender() where we assume that it could only be used in the context of a non-database WAL sender. However, this initialization happens when the WAL sender connection is initialized, and what I think this misses is that we should try to initialize a WAL reader when actually going through a START_REPLICATION command. I can note as well that StartLogicalReplication() moves in this sense by setting xlogreader to be the one from logical_decoding_ctx once the decoding context has been created. This results in the attached. The extra test from upthread to check that logical decoding is not allowed in a non-database WAL sender is a good idea, so I have kept it. Yes. Also we should add the test to check if physical replication can work fine even on logical rep connection? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Fri, May 29, 2020 at 06:09:06PM +0900, Masahiko Sawada wrote: > Yes. Conversely, if we start logical replication in a physical > replication connection (i.g. replication=true) we got an error before > staring replication: > > ERROR: logical decoding requires a database connection > > I think we can prevent that SEGV in a similar way. Still unconvinced as this restriction stands for logical decoding requiring a database connection but it is not necessarily true now as physical replication has less restrictions than a logical one. Looking at the code, I think that there is some confusion with the fake WAL reader used as base reference in InitWalSender() where we assume that it could only be used in the context of a non-database WAL sender. However, this initialization happens when the WAL sender connection is initialized, and what I think this misses is that we should try to initialize a WAL reader when actually going through a START_REPLICATION command. I can note as well that StartLogicalReplication() moves in this sense by setting xlogreader to be the one from logical_decoding_ctx once the decoding context has been created. This results in the attached. The extra test from upthread to check that logical decoding is not allowed in a non-database WAL sender is a good idea, so I have kept it. -- Michael diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h index d930fe957d..b0f2a6ed43 100644 --- a/src/include/access/xlogreader.h +++ b/src/include/access/xlogreader.h @@ -262,10 +262,6 @@ extern XLogReaderRoutine *LocalXLogReaderRoutine(void); /* Free an XLogReader */ extern void XLogReaderFree(XLogReaderState *state); -/* Initialize supporting structures */ -extern void WALOpenSegmentInit(WALOpenSegment *seg, WALSegmentContext *segcxt, - int segsize, const char *waldir); - /* Position the XLogReader to given record */ extern void XLogBeginRead(XLogReaderState *state, XLogRecPtr RecPtr); #ifdef FRONTEND diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index 5995798b58..cb76be4f46 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -44,6 +44,8 @@ static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr, static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr); static void ResetDecoder(XLogReaderState *state); +static void WALOpenSegmentInit(WALOpenSegment *seg, WALSegmentContext *segcxt, + int segsize, const char *waldir); /* size of the buffer allocated for error message. */ #define MAX_ERRORMSG_LEN 1000 @@ -210,7 +212,7 @@ allocate_recordbuf(XLogReaderState *state, uint32 reclength) /* * Initialize the passed segment structs. */ -void +static void WALOpenSegmentInit(WALOpenSegment *seg, WALSegmentContext *segcxt, int segsize, const char *waldir) { diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 86847cbb54..aeda307c6b 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -130,13 +130,11 @@ bool log_replication_commands = false; bool wake_wal_senders = false; /* - * Physical walsender does not use xlogreader to read WAL, but it does use a - * fake one to keep state. Logical walsender uses a proper xlogreader. Both - * keep the 'xlogreader' pointer to the right one, for the sake of common - * routines. + * xlogreader used for replication. Note that a WAL sender doing physical + * replication does not need xlogreader to read WAL, but it needs one to + * keep a state of its work. */ -static XLogReaderState fake_xlogreader; -static XLogReaderState *xlogreader; +static XLogReaderState *xlogreader = NULL; /* * These variables keep track of the state of the timeline we're currently @@ -285,20 +283,6 @@ InitWalSender(void) /* Initialize empty timestamp buffer for lag tracking. */ lag_tracker = MemoryContextAllocZero(TopMemoryContext, sizeof(LagTracker)); - - /* - * Prepare physical walsender's fake xlogreader struct. Logical walsender - * does this later. - */ - if (!am_db_walsender) - { - xlogreader = _xlogreader; - xlogreader->routine = - *XL_ROUTINE(.segment_open = WalSndSegmentOpen, - .segment_close = wal_segment_close); - WALOpenSegmentInit(>seg, >segcxt, - wal_segment_size, NULL); - } } /* @@ -594,6 +578,18 @@ StartReplication(StartReplicationCmd *cmd) (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("IDENTIFY_SYSTEM has not been run before START_REPLICATION"))); + /* create xlogreader for physical replication */ + xlogreader = + XLogReaderAllocate(wal_segment_size, NULL, + XL_ROUTINE(.segment_open = WalSndSegmentOpen, + .segment_close = wal_segment_close), + NULL); + + if (!xlogreader) + ereport(ERROR, +(errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of memory"))); + /* * We assume
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Fri, 29 May 2020 at 17:57, Kyotaro Horiguchi wrote: > > At Fri, 29 May 2020 16:21:38 +0900, Michael Paquier > wrote in > > On Thu, May 28, 2020 at 06:11:39PM +0900, Kyotaro Horiguchi wrote: > > > Mmm. It is not the proper way to use physical replication and it's > > > totally accidental that that worked (or even it might be a bug). The > > > documentation is saying as the follows, as more-or-less the same for > > > all versions since 9.4. > > > > > > https://www.postgresql.org/docs/13/protocol-replication.html > > > > + if (am_db_walsender) > > + ereport(ERROR, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > +errmsg("cannot initiate physical > > replication on a logical replication connection"))); > > > > I don't agree with this change. The only restriction that we have in > > place now in walsender.c regarding MyDatabaseId not being set is to > > prevent the execution of SQL commands. Note that it is possible to > > start physical replication even if MyDatabaseId is set in a > > replication connection, so you could break cases that have been valid > > until now. > > It donesn't check MyDatabase, but whether the connection parameter > "repliation" is "true" or "database". The documentation is telling > that "replication" should be "true" for a connection that is to be > used for physical replication, and "replication" should literally be > "database" for a connection that is for logical replication. We need > to revise the documentation if we are going to allow physical > replication on a conection with "replication = database". > Yes. Conversely, if we start logical replication in a physical replication connection (i.g. replication=true) we got an error before staring replication: ERROR: logical decoding requires a database connection I think we can prevent that SEGV in a similar way. Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
At Fri, 29 May 2020 16:21:38 +0900, Michael Paquier wrote in > On Thu, May 28, 2020 at 06:11:39PM +0900, Kyotaro Horiguchi wrote: > > Mmm. It is not the proper way to use physical replication and it's > > totally accidental that that worked (or even it might be a bug). The > > documentation is saying as the follows, as more-or-less the same for > > all versions since 9.4. > > > > https://www.postgresql.org/docs/13/protocol-replication.html > > + if (am_db_walsender) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > +errmsg("cannot initiate physical > replication on a logical replication connection"))); > > I don't agree with this change. The only restriction that we have in > place now in walsender.c regarding MyDatabaseId not being set is to > prevent the execution of SQL commands. Note that it is possible to > start physical replication even if MyDatabaseId is set in a > replication connection, so you could break cases that have been valid > until now. It donesn't check MyDatabase, but whether the connection parameter "repliation" is "true" or "database". The documentation is telling that "replication" should be "true" for a connection that is to be used for physical replication, and "replication" should literally be "database" for a connection that is for logical replication. We need to revise the documentation if we are going to allow physical replication on a conection with "replication = database". > I think that we actually should be much more careful with the > initialization of the WAL reader used in the context of a WAL sender > before calling WALRead() and attempting to read a new WAL page. I agree that the initialization can be improved, but the current code is no problem if we don't allow to run both logical and physical replication on a single session. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Thu, May 28, 2020 at 06:11:39PM +0900, Kyotaro Horiguchi wrote: > Mmm. It is not the proper way to use physical replication and it's > totally accidental that that worked (or even it might be a bug). The > documentation is saying as the follows, as more-or-less the same for > all versions since 9.4. > > https://www.postgresql.org/docs/13/protocol-replication.html + if (am_db_walsender) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg("cannot initiate physical replication on a logical replication connection"))); I don't agree with this change. The only restriction that we have in place now in walsender.c regarding MyDatabaseId not being set is to prevent the execution of SQL commands. Note that it is possible to start physical replication even if MyDatabaseId is set in a replication connection, so you could break cases that have been valid until now. I think that we actually should be much more careful with the initialization of the WAL reader used in the context of a WAL sender before calling WALRead() and attempting to read a new WAL page. -- Michael signature.asc Description: PGP signature
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
At Thu, 28 May 2020 09:08:19 -0400, Dave Cramer wrote in > On Thu, 28 May 2020 at 05:11, Kyotaro Horiguchi > wrote: > > Mmm. It is not the proper way to use physical replication and it's > > totally accidental that that worked (or even it might be a bug). The > > documentation is saying as the follows, as more-or-less the same for > > all versions since 9.4. > > > > https://www.postgresql.org/docs/13/protocol-replication.html ... > > > While the documentation does indeed say that there is quite a bit of > additional confusion added by: > > and > START_REPLICATION [ SLOT *slot_name* ] [ PHYSICAL ] *XXX/XXX* [ TIMELINE > *tli* ] > > If we already have a physical replication slot according to the startup > message why do we need to specify it in the START REPLICATION message ? I don't know, but physical replication has worked that way since before the replication slots was introduced so we haven't needed to do so. Physical replication slots are not assumed as more than memorandum for the oldest required WAL segment (and oldest xmin). regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Thu, 28 May 2020 at 05:11, Kyotaro Horiguchi wrote: > Hello, Vladimir. > > At Thu, 28 May 2020 11:57:23 +0300, Vladimir Sitnikov < > sitnikov.vladi...@gmail.com> wrote in > > Kyotaro>It seems to me that that crash means Pgjdbc is initiating a > logical > > Kyotaro>replication connection to start physical replication. > > > > Well, it used to work previously, so it might be a breaking change from > the > > client/application point of view. > > Mmm. It is not the proper way to use physical replication and it's > totally accidental that that worked (or even it might be a bug). The > documentation is saying as the follows, as more-or-less the same for > all versions since 9.4. > > https://www.postgresql.org/docs/13/protocol-replication.html > > > To initiate streaming replication, the frontend sends the replication > > parameter in the startup message. A Boolean value of true (or on, yes, > > 1) tells the backend to go into physical replication walsender mode, > > wherein a small set of replication commands, shown below, can be > > issued instead of SQL statements. > > > > Passing database as the value for the replication parameter instructs > > the backend to go into logical replication walsender mode, connecting > > to the database specified in the dbname parameter. In logical > > replication walsender mode, the replication commands shown below as > > well as normal SQL commands can be issued. > > regards. > > While the documentation does indeed say that there is quite a bit of additional confusion added by: and START_REPLICATION [ SLOT *slot_name* ] [ PHYSICAL ] *XXX/XXX* [ TIMELINE *tli* ] If we already have a physical replication slot according to the startup message why do we need to specify it in the START REPLICATION message ? Dave
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Hello, Vladimir. At Thu, 28 May 2020 11:57:23 +0300, Vladimir Sitnikov wrote in > Kyotaro>It seems to me that that crash means Pgjdbc is initiating a logical > Kyotaro>replication connection to start physical replication. > > Well, it used to work previously, so it might be a breaking change from the > client/application point of view. Mmm. It is not the proper way to use physical replication and it's totally accidental that that worked (or even it might be a bug). The documentation is saying as the follows, as more-or-less the same for all versions since 9.4. https://www.postgresql.org/docs/13/protocol-replication.html > To initiate streaming replication, the frontend sends the replication > parameter in the startup message. A Boolean value of true (or on, yes, > 1) tells the backend to go into physical replication walsender mode, > wherein a small set of replication commands, shown below, can be > issued instead of SQL statements. > > Passing database as the value for the replication parameter instructs > the backend to go into logical replication walsender mode, connecting > to the database specified in the dbname parameter. In logical > replication walsender mode, the replication commands shown below as > well as normal SQL commands can be issued. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Kyotaro>It seems to me that that crash means Pgjdbc is initiating a logical Kyotaro>replication connection to start physical replication. Well, it used to work previously, so it might be a breaking change from the client/application point of view. Vladimir
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
At Thu, 28 May 2020 09:07:04 +0300, Vladimir Sitnikov wrote in > Pgjdbc test suite identified a SIGSEGV in the recent HEAD builds of > PostgreSQL, Ubuntu 14.04.5 LTS > > Here's a call stack: > https://travis-ci.org/github/pgjdbc/pgjdbc/jobs/691794110#L7484 > The crash is consistent, and it reproduces 100% of the cases so far. > > The CI history shows that HEAD was good at 11 May 13:27 UTC, and it became > bad by 19 May 14:00 UTC, > so the regression was introduced somewhere in-between. > > Does that ring any bells? Thanks for the report. It is surely a bug since the server crashes, on the other hand Pgjdbc seems doing bad, too. It seems to me that that crash means Pgjdbc is initiating a logical replication connection to start physical replication. > In case you wonder: > > Program terminated with signal SIGSEGV, Segmentation fault. > #0 XLogSendPhysical () at walsender.c:2762 > 2762 if (!WALRead(xlogreader, > (gdb) #0 XLogSendPhysical () at walsender.c:2762 > SendRqstPtr = 133473640 > startptr = 133473240 > endptr = 133473640 > nbytes = 400 > segno = 1 > errinfo = {wre_errno = 988942240, wre_off = 2, wre_req = -1, > wre_read = -1, wre_seg = {ws_file = 4714224, > ws_segno = 140729887364688, ws_tli = 0}} > __func__ = "XLogSendPhysical" I see the probably the same symptom by the following steps with the current HEAD. psql 'host=/tmp replication=database' =# START_REPLICATION 0/1; Physical replication is not assumed to be started on a logical replication connection. The attached would fix that. The patch adds two tests. One for this case and another for the reverse. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 1f94c98f7459ca8a4942246325815a3e0a91caa4 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 28 May 2020 15:55:30 +0900 Subject: [PATCH v1] Fix crash when starting physical replication on logical connection It is an illegal operation to try starting physical replication on a logical replication session. We should properly warn the client instead of crashing. --- src/backend/replication/walsender.c | 5 + src/test/recovery/t/001_stream_rep.pl | 14 +++--- src/test/recovery/t/006_logical_decoding.pl | 10 +- 3 files changed, 25 insertions(+), 4 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 86847cbb54..7b79c75311 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -589,6 +589,11 @@ StartReplication(StartReplicationCmd *cmd) StringInfoData buf; XLogRecPtr FlushPtr; + if (am_db_walsender) + ereport(ERROR, +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot initiate physical replication on a logical replication connection"))); + if (ThisTimeLineID == 0) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl index 0c316c1808..0b69b7d8d1 100644 --- a/src/test/recovery/t/001_stream_rep.pl +++ b/src/test/recovery/t/001_stream_rep.pl @@ -3,7 +3,7 @@ use strict; use warnings; use PostgresNode; use TestLib; -use Test::More tests => 35; +use Test::More tests => 36; # Initialize master node my $node_master = get_new_node('master'); @@ -18,6 +18,14 @@ my $backup_name = 'my_backup'; # Take backup $node_master->backup($backup_name); +# Check if logical-rep session properly refuses to start physical-rep +my ($ret, $stdout, $stderr) = + $node_master->psql('template1', + qq[START_REPLICATION PHYSICAL 0/1], + replication=>'database'); +ok($stderr =~ /ERROR: cannot initiate physical replication on a logical replication connection/, + "check if physical replication is rejected on logical-rep session"); + # Create streaming standby linking to master my $node_standby_1 = get_new_node('standby_1'); $node_standby_1->init_from_backup($node_master, $backup_name, @@ -94,7 +102,7 @@ sub test_target_session_attrs # The client used for the connection does not matter, only the backend # point does. - my ($ret, $stdout, $stderr) = + ($ret, $stdout, $stderr) = $node1->psql('postgres', 'SHOW port;', extra_params => [ '-d', $connstr ]); is( $status == $ret && $stdout eq $target_node->port, @@ -136,7 +144,7 @@ my $connstr_rep= "$connstr_common replication=1"; my $connstr_db = "$connstr_common replication=database dbname=postgres"; # Test SHOW ALL -my ($ret, $stdout, $stderr) = $node_master->psql( +($ret, $stdout, $stderr) = $node_master->psql( 'postgres', 'SHOW ALL;', on_error_die => 1, extra_params => [ '-d', $connstr_rep ]); diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl index ee05535b1c..eefde8c3f1 100644 --- a/src/test/recovery/t/006_logical_decoding.pl +++ b/src/test/recovery/t/006_logical_decoding.pl @@ -7,7
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Thu, May 28, 2020 at 04:32:22PM +0900, Kyotaro Horiguchi wrote: > I think that's not the case. I think I cause this crash with the > HEAD. I'll post a fix soon. > > Pgjdbc seems initiating physical replication on a logical replication > session. Let me see... Indeed: $ psql "replication=database dbname=postgres" =# START_REPLICATION SLOT test_slot PHYSICAL 0/0; [one later] (gdb) bt #0 XLogSendPhysical () at walsender.c:2762 #1 0x55d5f7803318 in WalSndLoop (send_data=0x55d5f78039c7 ) at walsender.c:2300 #2 0x55d5f7800d70 in StartReplication (cmd=0x55d5f919bc60) at walsender.c:750 #3 0x55d5f78025ad in exec_replication_command (cmd_string=0x55d5f9119a80 "START_REPLICATION SLOT test_slot PHYSICAL 0/0;") at walsender.c:1643 #4 0x55d5f786a7ea in PostgresMain (argc=1, argv=0x55d5f91472c8, dbname=0x55d5f9147210 "ioltas", username=0x55d5f91471f0 "ioltas") at postgres.c:4311 #5 0x55d5f77b4e9c in BackendRun (port=0x55d5f913dcd0) at postmaster.c:4523 #6 0x55d5f77b4606 in BackendStartup (port=0x55d5f913dcd0) at postmaster.c:4215 #7 0x55d5f77b08ad in ServerLoop () at postmaster.c:1727 #8 0x55d5f77b00fc in PostmasterMain (argc=3, argv=0x55d5f9113260) at postmaster.c:1400 #9 0x55d5f76b3736 in main (argc=3, argv=0x55d5f9113260) at main.c:210 No need for the JDBC test suite then. -- Michael signature.asc Description: PGP signature
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
At Thu, 28 May 2020 16:22:33 +0900, Michael Paquier wrote in > On Thu, May 28, 2020 at 09:07:04AM +0300, Vladimir Sitnikov wrote: > > The CI history shows that HEAD was good at 11 May 13:27 UTC, and it became > > bad by 19 May 14:00 UTC, > > so the regression was introduced somewhere in-between. > > > > Does that ring any bells? > > It does, thanks! This would map with 1d374302 or 850196b6 that > reworked this area of the code, so it seems like we are not quite done > with this work yet. Do you still see the problem as of 55ca50d > (today's latest HEAD)? I think that's not the case. I think I cause this crash with the HEAD. I'll post a fix soon. > Also, just wondering.. If I use more or less the same commands as > your travis job I should be able to reproduce the problem with a fresh > JDBC repository, right? Or do you a sub-portion of your regression > tests to run that easily? Pgjdbc seems initiating physical replication on a logical replication session. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Thu, May 28, 2020 at 09:07:04AM +0300, Vladimir Sitnikov wrote: > The CI history shows that HEAD was good at 11 May 13:27 UTC, and it became > bad by 19 May 14:00 UTC, > so the regression was introduced somewhere in-between. > > Does that ring any bells? It does, thanks! This would map with 1d374302 or 850196b6 that reworked this area of the code, so it seems like we are not quite done with this work yet. Do you still see the problem as of 55ca50d (today's latest HEAD)? Also, just wondering.. If I use more or less the same commands as your travis job I should be able to reproduce the problem with a fresh JDBC repository, right? Or do you a sub-portion of your regression tests to run that easily? -- Michael signature.asc Description: PGP signature