Re: [HACKERS] Replication status in logical replication
On Tue, Sep 26, 2017 at 10:36 AM, Vaishnavi Prabakaran wrote:
> Hi,
>
> On Wed, Sep 13, 2017 at 9:59 AM, Daniel Gustafsson wrote:
>>
>> I'm not entirely sure why this was flagged as "Waiting for Author" by the
>> automatic run; the patch applies for me and builds, so I'm resetting it
>> back to "Needs review".
>
> This patch applies and builds cleanly. I did testing with one publisher
> and one subscriber, and confirm that the replication state after
> restarting the server is now "streaming" and not "catchup".
>
> And I don't find any issues with the code, so the patch to me is ready
> for committer; I have marked it as such in the CF entry.

Thank you for reviewing the patch!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication status in logical replication
Hi,

On Wed, Sep 13, 2017 at 9:59 AM, Daniel Gustafsson wrote:
>
> I'm not entirely sure why this was flagged as "Waiting for Author" by the
> automatic run; the patch applies for me and builds, so I'm resetting it
> back to "Needs review".

This patch applies and builds cleanly. I did testing with one publisher
and one subscriber, and confirm that the replication state after
restarting the server is now "streaming" and not "catchup".

And I don't find any issues with the code, so the patch to me is ready
for committer; I have marked it as such in the CF entry.

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.
Re: [HACKERS] Replication status in logical replication
> On 30 May 2017, at 19:55, Peter Eisentraut wrote:
>
> On 5/29/17 22:56, Noah Misch wrote:
>> On Fri, May 19, 2017 at 11:33:48AM +0900, Masahiko Sawada wrote:
>>> On Wed, Apr 12, 2017 at 5:31 AM, Simon Riggs wrote:
>>>> Looks like a bug that we should fix in PG10, with backpatch to 9.4 (or
>>>> as far as it goes).
>>>>
>>>> Objections to commit?
>>>
>>> Seems we still have this issue. Any update or comment on this? Barring
>>> any objections, I'll add this to the open items list so it doesn't get
>>> missed.
>>
>> [Action required within three days. This is a generic notification.]
>>
>> The above-described topic is currently a PostgreSQL 10 open item. Peter,
>> since you committed the patch believed to have created it, you own this
>> open item. If some other commit is more relevant or if this does not
>> belong as a v10 open item, please let us know.
>
> I would ask Simon to go ahead with this patch if he feels comfortable
> with it.
>
> I'm disclaiming this open item, since it's an existing bug from previous
> releases (and I have other open items to focus on).

I'm not entirely sure why this was flagged as "Waiting for Author" by the
automatic run; the patch applies for me and builds, so I'm resetting it
back to "Needs review".

Simon: do you think you will have time to look at this patch in this CF?

cheers ./daniel
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 09/06/17 18:33, Omar Kilani wrote:
> Is there anything people using float datetimes can do that isn't a
> pg_dumpall | pg_restore to do a less painful update?
>
> We have several TB of data still using float datetimes and I'm trying
> to figure out how we can move to 10 (currently on 9.6.x) without
> massive amounts of $ in duplicated hardware or downtime.

I ran into an analogous issue with endianness of PL/Java-defined
datatypes, and ended up devising a procedure [1] for treating the type one
way on output and another way on input, and then constructing an UPDATE
query that just assigns the column to itself using a function that's
essentially the identity, but forces the output-input conversion (and
doesn't look like a simple identity function to the query optimizer, which
otherwise might optimize the whole update away).

While the details of that were specific to PL/Java and byte order,
something similar might work in your case if you can afford some period,
either just before or just after migration, when your normal workload is
blocked, long enough to run such updates on your several TB of data.

Or, if that's too big a chunk of downtime, you might be able to add a
column in advance, of some user-defined type the same width as an integer
datetime, populate that in advance, migrate, then do a quick catalog
switcheroo of the column names and types. I've never done that, so someone
else might have comments on its feasibility or the best way of doing it.

> the exact crash, but the datetime values were either too small or too
> large to fit into the integer datetimes field -- I can retry this if

That does seem important to get the details on. If the integer datetime
column won't represent your stuff (and the too-large or too-small values
are necessary to represent), then some schema change might be necessary.

Or, if the outlying values were sentinel values of some kind (I wonder,
how else would you even have datetimes outside of the int64 range?), maybe
they can be replaced with others that fit.

-Chap

[1]: https://tada.github.io/pljava/use/byteordermigrate.html
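For readers unfamiliar with the trick, the opaque-identity half of the
recipe might look roughly like this (an illustrative sketch with
hypothetical table, column, and function names; the type input/output
switcheroo itself is what [1] describes):

```sql
-- Hypothetical sketch of the opaque-identity rewrite. Declaring the
-- function VOLATILE keeps the planner from inlining or eliding it, so
-- the assignment really forces an output/input round trip on each row.
CREATE FUNCTION force_rewrite(val timestamptz) RETURNS timestamptz
    LANGUAGE sql VOLATILE
    AS $$ SELECT $1 $$;

UPDATE my_table SET ts_col = force_rewrite(ts_col);
```

The point of the VOLATILE marker is purely to defeat the optimizer; the
function body does nothing except pass the value through the type's
output and input conversion.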
Re: [HACKERS] Replication vs. float timestamps is a disaster
Omar Kilani writes:
> Is there anything people using float datetimes can do that isn't a
> pg_dumpall | pg_restore to do a less painful update?

Um, not really. You may be stuck on 9.6 until you can spare the effort to
convert. The physical representations of timestamps are totally different
in the two cases.

> I did attempt a pg_dumpall | pg_restore at one point but for whatever
> reason we had data in tables that integer datetimes fails on (I forget
> the exact crash, but the datetime values were either too small or too
> large to fit into the integer datetimes field -- I can retry this if
> it would be helpful).

I'm pretty sure the minimum values are the same in both cases, to wit
Julian day zero. As for the max, according to the old code comments:

 * The upper limit for dates is 5874897-12-31, which is a bit less than what
 * the Julian-date code can allow. We use that same limit for timestamps when
 * using floating-point timestamps ... For integer timestamps, the upper
 * limit is 294276-12-31.

I would hope that any timestamps you've got beyond 294276 AD are erroneous
data that you need to clean up anyway.

regards, tom lane
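For the curious, the integer-timestamp upper limit quoted above follows
directly from the representation: a back-of-the-envelope check (not
PostgreSQL's actual boundary code) of int64 microseconds counted from the
2000-01-01 epoch lands at roughly the year 294277, i.e. the end of
294276-12-31.

```python
# Rough arithmetic only -- PostgreSQL's real limit also involves its
# Julian-date code, but the order of magnitude comes from int64 microseconds.
EPOCH_YEAR = 2000
USECS_PER_DAY = 86_400 * 10**6
DAYS_PER_YEAR = 365.2425          # mean Gregorian year

max_days = (2**63 - 1) // USECS_PER_DAY        # ~106.75 million days
max_year = EPOCH_YEAR + int(max_days / DAYS_PER_YEAR)
print(max_year)                                # prints 294277
```

So anything stored beyond that range in a float-timestamp database simply
has no int64 representation to convert into.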
Re: [HACKERS] Replication vs. float timestamps is a disaster
Hi,

I know I'm 7 months late to this, but only just read the beta 4 release
notes.

Is there anything people using float datetimes can do that isn't a
pg_dumpall | pg_restore to do a less painful update?

We have several TB of data still using float datetimes and I'm trying to
figure out how we can move to 10 (currently on 9.6.x) without massive
amounts of $ in duplicated hardware or downtime.

I did attempt a pg_dumpall | pg_restore at one point but for whatever
reason we had data in tables that integer datetimes fails on (I forget the
exact crash, but the datetime values were either too small or too large to
fit into the integer datetimes field -- I can retry this if it would be
helpful).

Thanks.

Regards,
Omar

On Mon, Feb 27, 2017 at 5:13 PM, Andres Freund wrote:
> On 2017-02-27 17:00:23 -0800, Joshua D. Drake wrote:
>> On 02/22/2017 02:45 PM, Tom Lane wrote:
>>> Andres Freund writes:
>>>> On 2017-02-22 08:43:28 -0500, Tom Lane wrote:
>>>>> (To be concrete, I'm suggesting dropping --disable-integer-datetimes
>>>>> in HEAD, and just agreeing that in the back branches, use of the
>>>>> replication protocol with float-timestamp servers is not supported
>>>>> and we're not going to bother looking for related bugs there. Given
>>>>> the lack of field complaints, I do not believe anyone cares.)
>>>
>>> What I *am* willing to spend time on is removing float-timestamp code
>>> in HEAD. I've not yet heard anybody speak against doing that (or at
>>> least, nothing I interpreted as a vote against it). If I've not heard
>>> any complaints by tomorrow, I'll get started on that.
>>
>> Rip it out!
>
> Already happened:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6aa17e0ae367afdcea07118e016111af4fa6bc3
Re: [HACKERS] Replication origins and timelines
On 1 June 2017 at 09:23, Andres Freund wrote:
> Even if we decide this is necessary, I *strongly* suggest trying to get
> the existing standby decoding etc wrapped up before starting something
> nontrivial afresh.

Speaking of such, I had a thought about how to sync logical slot state to
physical replicas without requiring all logical downstreams to know about
and be able to connect to all physical replicas. Interested in your
initial reaction.

Basically, enlist the walreceiver's help. Extend the walsender so that in
physical replication mode it periodically writes the upstream's logical
slot state into the stream as a message with the special LSN 0/0. The
walreceiver then uses that to make decoding calls on the downstream to
advance the downstream logical slots to the new confirmed_flush_lsn, or
hands the info off to a helper proc that does it. It could set up a
decoding context and do it via a proper decoding session, discarding
output, and later we could probably optimise that decoding session to do
even less work than ReorderBufferSkip()ing xacts.

The alternative at this point, since we nixed writing logical slot state
to WAL, seems to be a bgworker on the upstream that periodically writes
logical slot state into generic WAL messages in a special table, then
another on the downstream that processes the table and makes appropriate
decoding calls to advance the downstream slot state (safely, not via
directly setting catalog_xmin etc). Which is pretty damn ugly, but has the
advantage that it'd work for PITR restores, unlike the
walsender/walreceiver-based approach. Failover slots in extension-space,
basically.

I'm really, really not sold on all logical downstreams having to know
about and be able to connect to all physical standbys of the upstream to
maintain slots on them. Some kind of solution that runs entirely on the
standby will be needed. It's more a question of whether it's something
built-in, easy, and nice, or some out-of-tree extension.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Replication origins and timelines
On 2017-05-31 21:27:56 -0400, Stephen Frost wrote:
> Craig,
>
> * Craig Ringer (cr...@2ndquadrant.com) wrote:
>> TL;DR: replication origins track LSN without timeline. This is
>> ambiguous when physical failover is present since an LSN alone can
>> now represent more than one state due to timeline forks with
>> promotions. Replication origins should track timelines so we can tell
>> the difference; I propose to patch them accordingly for pg11.
>
> Uh, TL;DR, wow? Why isn't this something which needs to be addressed
> before PG10 can be released?

Huh? Slots aren't present on replicas, ergo there's no way for the whole
issue to occur in v10.

> The further comments in your email seem to state that logical
> replication will just fail if a replica is promoted. While not ideal,
> that might barely reach the point of it being releasable, but turns it
> into a feature that I'd have a really hard time recommending to anyone,

Meh^10

> and are we absolutely sure that there aren't any cases where there might
> be an issue of undetected promotion, leading to the complications which
> you describe?

Yes, unless you manipulate things by hand, copying files around or such.

Greetings,

Andres Freund
Re: [HACKERS] Replication origins and timelines
On 1 June 2017 at 09:36, Andres Freund wrote:
> On 2017-05-31 21:33:26 -0400, Stephen Frost wrote:
>>> This only starts becoming an issue once logical replication slots can
>>> exist on replicas and be maintained to follow the master's slot state.
>>> Which is incomplete in Pg10 (not exposed to users), but I plan to
>>> finish getting it in for pg11, making this a possible issue to be
>>> addressed.
>>
>> Fair enough. I'm disappointed that we ended up with that as the
>> solution for PG10
>
> This has widely been debated, and it's not exactly new that development
> happens incrementally, so I don't have particularly much sympathy for
> that POV.

Yeah. Even if we'd had completed support for decoding on standby, there's
no way we could've implemented the required gymnastics for getting logical
replication to actually use it to support physical failover in pg10, so it
was always going to land in pg11. This is very much a "how do we do it
right when we do do it" topic, not a pg10 issue.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Replication origins and timelines
Craig,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> On 1 June 2017 at 09:27, Stephen Frost wrote:
>> * Craig Ringer (cr...@2ndquadrant.com) wrote:
>>> TL;DR: replication origins track LSN without timeline. This is
>>> ambiguous when physical failover is present since an LSN alone can
>>> now represent more than one state due to timeline forks with
>>> promotions. Replication origins should track timelines so we can tell
>>> the difference; I propose to patch them accordingly for pg11.
>>
>> Uh, TL;DR, wow? Why isn't this something which needs to be addressed
>> before PG10 can be released? I hope I'm missing something that makes
>> the current approach work in PG10, or that there's some reason that
>> this isn't a big deal for PG10, but I'd like a bit of info as to why
>> that's the case, if it is.
>
> In Pg 10, if you promote a physical replica then logical replication
> falls apart entirely and stops working. So there's no corruption
> hazard because it just ... stops.

I see.

> This only starts becoming an issue once logical replication slots can
> exist on replicas and be maintained to follow the master's slot state.
> Which is incomplete in Pg10 (not exposed to users), but I plan to
> finish getting it in for pg11, making this a possible issue to be
> addressed.

Fair enough. I'm disappointed that we ended up with that as the solution
for PG10, but so be it; the main thing is that we avoid any corruption
risk.

Thanks!

Stephen
Re: [HACKERS] Replication origins and timelines
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-05-31 21:33:26 -0400, Stephen Frost wrote:
>>> This only starts becoming an issue once logical replication slots can
>>> exist on replicas and be maintained to follow the master's slot state.
>>> Which is incomplete in Pg10 (not exposed to users), but I plan to
>>> finish getting it in for pg11, making this a possible issue to be
>>> addressed.
>>
>> Fair enough. I'm disappointed that we ended up with that as the
>> solution for PG10
>
> This has widely been debated, and it's not exactly new that development
> happens incrementally, so I don't have particularly much sympathy for
> that POV.

I do understand that, of course, but hadn't quite realized yet that we're
talking only about replication slots on replicas. Apologies for the noise.

Thanks!

Stephen
Re: [HACKERS] Replication origins and timelines
On 2017-05-31 21:33:26 -0400, Stephen Frost wrote:
>> This only starts becoming an issue once logical replication slots can
>> exist on replicas and be maintained to follow the master's slot state.
>> Which is incomplete in Pg10 (not exposed to users), but I plan to
>> finish getting it in for pg11, making this a possible issue to be
>> addressed.
>
> Fair enough. I'm disappointed that we ended up with that as the
> solution for PG10

This has widely been debated, and it's not exactly new that development
happens incrementally, so I don't have particularly much sympathy for
that POV.
Re: [HACKERS] Replication origins and timelines
On 1 June 2017 at 09:23, Andres Freund wrote:
> Hi,
>
> On 2017-06-01 09:12:04 +0800, Craig Ringer wrote:
>> TL;DR: replication origins track LSN without timeline. This is
>> ambiguous when physical failover is present since an LSN alone can
>> now represent more than one state due to timeline forks with
>> promotions. Replication origins should track timelines so we can tell
>> the difference; I propose to patch them accordingly for pg11.
>
> I'm not quite convinced that this should be tracked at the origin level.
> If you fail over physically, shouldn't we also reconfigure logical
> replication?
>
> Even if we decide this is necessary, I *strongly* suggest trying to get
> the existing standby decoding etc wrapped up before starting something
> nontrivial afresh.

Yeah, I'm not thinking of leaping straight to a patch before we've got the
replication-on-standby stuff nailed down. I just wanted to raise early
discussion to make sure it's not entirely the wrong path and/or totally
hopeless for core.

Logical decoding output plugins would need to keep track of the timeline
and send an extra message informing the downstream of a timeline change
whenever they see a new timeline. Or include it in all messages (see:
extra overhead). That's because we don't stop a decoding session when we
hit a timeline boundary and force re-connection. (Nor can we, since at
some point our restart_lsn will be on the old timeline but the first
commits will be on the new timeline.) I'll need to think about if/how the
decoding plugin can reliably do that.

>> Take master A, its physical replica B, and logical decoding client C
>> streaming changes from A. B is lagging. A is at lsn 1/1000, B is only
>> at 1/500. C has replicated from A up to 1/1000, when A fails. We
>> promote B to replace A. Now C connects to B, and requests to resume at
>> LSN 1/1000.
>
> Wouldn't it be better to solve this by querying the new master's
> timeline history, and checking whether the current replay point is
> pre/post fork?

That could work. The decoding client would need to track the last-commit
timeline in its own metadata if we're not letting it put it in the
replication origin. Manageable, if awkward.

Clients would need to know how to fetch and parse timeline history files,
which is an irritating thing for every decoding client that wants to
support failover to have to support. But I guess it's manageable, if not
friendly. And non-Pg-based downstreams would have to do it anyway.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
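For what it's worth, the client-side check discussed here could be
sketched roughly as follows (illustrative Python only, not a real client;
a real one would fetch the file over the replication protocol with the
TIMELINE_HISTORY command, and the function names are invented):

```python
# Sketch of the pre/post-fork check. A timeline history file has lines of
# the form "<parentTLI> <switchpoint LSN> <reason>"; the switchpoint is
# where that parent timeline ends on the new master's history.

def parse_lsn(text: str) -> int:
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def parse_history(history: str):
    """Return [(parent_tli, switchpoint_lsn), ...] from a history file."""
    entries = []
    for line in history.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tli, lsn, _reason = line.split(None, 2)
        entries.append((int(tli), parse_lsn(lsn)))
    return entries

def safe_to_resume(history: str, last_tli: int, last_lsn: int) -> bool:
    """True if the client's replay position is at or before the point
    where its old timeline forked off the new master's history."""
    for tli, switch_lsn in parse_history(history):
        if tli == last_tli:
            return last_lsn <= switch_lsn
    return False  # old timeline is not an ancestor of the new one

# B's history says timeline 1 ended at 1/500; C replayed to 1/1000 on
# timeline 1, so resuming on B would silently lose/duplicate work:
history = "1\t1/500\tno recovery target specified\n"
print(safe_to_resume(history, last_tli=1, last_lsn=parse_lsn("1/1000")))
```

This is exactly the bookkeeping that every non-Pg downstream would have to
reimplement, which is the "irritating" part above.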
Re: [HACKERS] Replication origins and timelines
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-05-31 21:27:56 -0400, Stephen Frost wrote:
>> Uh, TL;DR, wow? Why isn't this something which needs to be addressed
>> before PG10 can be released?
>
> Huh? Slots aren't present on replicas, ergo there's no way for the whole
> issue to occur in v10.

Oh, OK, now that makes more sense; not sure how I missed that.

Thanks!

Stephen
Re: [HACKERS] Replication origins and timelines
On 1 June 2017 at 09:27, Stephen Frost wrote:
> Craig,
>
> * Craig Ringer (cr...@2ndquadrant.com) wrote:
>> TL;DR: replication origins track LSN without timeline. This is
>> ambiguous when physical failover is present since an LSN alone can
>> now represent more than one state due to timeline forks with
>> promotions. Replication origins should track timelines so we can tell
>> the difference; I propose to patch them accordingly for pg11.
>
> Uh, TL;DR, wow? Why isn't this something which needs to be addressed
> before PG10 can be released? I hope I'm missing something that makes
> the current approach work in PG10, or that there's some reason that this
> isn't a big deal for PG10, but I'd like a bit of info as to why that's
> the case, if it is.

In Pg 10, if you promote a physical replica then logical replication falls
apart entirely and stops working. So there's no corruption hazard because
it just ... stops.

This only starts becoming an issue once logical replication slots can
exist on replicas and be maintained to follow the master's slot state.
Which is incomplete in Pg10 (not exposed to users), but I plan to finish
getting it in for pg11, making this a possible issue to be addressed.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Replication origins and timelines
Craig,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> TL;DR: replication origins track LSN without timeline. This is
> ambiguous when physical failover is present since an LSN alone can
> now represent more than one state due to timeline forks with
> promotions. Replication origins should track timelines so we can tell
> the difference; I propose to patch them accordingly for pg11.

Uh, TL;DR, wow? Why isn't this something which needs to be addressed
before PG10 can be released? I hope I'm missing something that makes the
current approach work in PG10, or that there's some reason that this isn't
a big deal for PG10, but I'd like a bit of info as to why that's the case,
if it is.

The further comments in your email seem to state that logical replication
will just fail if a replica is promoted. While not ideal, that might
barely reach the point of it being releasable, but it turns it into a
feature that I'd have a really hard time recommending to anyone. And are
we absolutely sure that there aren't any cases where there might be an
issue of undetected promotion, leading to the complications which you
describe?

Thanks!

Stephen
Re: [HACKERS] Replication origins and timelines
Hi,

On 2017-06-01 09:12:04 +0800, Craig Ringer wrote:
> TL;DR: replication origins track LSN without timeline. This is
> ambiguous when physical failover is present since an LSN alone can
> now represent more than one state due to timeline forks with
> promotions. Replication origins should track timelines so we can tell
> the difference; I propose to patch them accordingly for pg11.

I'm not quite convinced that this should be tracked at the origin level.
If you fail over physically, shouldn't we also reconfigure logical
replication?

Even if we decide this is necessary, I *strongly* suggest trying to get
the existing standby decoding etc wrapped up before starting something
nontrivial afresh.

> Why?
>
> Take master A, its physical replica B, and logical decoding client C
> streaming changes from A. B is lagging. A is at lsn 1/1000, B is only
> at 1/500. C has replicated from A up to 1/1000, when A fails. We
> promote B to replace A. Now C connects to B, and requests to resume at
> LSN 1/1000.

Wouldn't it be better to solve this by querying the new master's timeline
history, and checking whether the current replay point is pre/post fork?
I'm more than a bit doubtful that adding more overhead to every relevant
record is worth it here.

- Andres
[HACKERS] Replication origins and timelines
Hi all,

TL;DR: replication origins track LSN without timeline. This is ambiguous
when physical failover is present since an LSN alone can now represent
more than one state due to timeline forks with promotions. Replication
origins should track timelines so we can tell the difference; I propose to
patch them accordingly for pg11.

When replication origins were introduced, they deliberately left out
tracking of the upstream node's timeline. Logical decoding couldn't follow
a timeline switch anyway, and replicas (still) have no facility for
logical decoding, so everything completely breaks on promotion of a
physical replica.

I'm working on fixing that so that logical decoding and logical
replication integrate properly with physical replication and failover. But
when that works we'll face the same problem in logical rep that timelines
were introduced to solve for physical rep. To prevent undetected
misreplication we'll need to keep track of the timeline of the
last-replicated LSN in our downstream replication origin. So I propose to
add a timeline field to replication origins for pg11.

Why?

Take master A, its physical replica B, and logical decoding client C
streaming changes from A. B is lagging. A is at lsn 1/1000, B is only at
1/500. C has replicated from A up to 1/1000, when A fails. We promote B to
replace A. Now C connects to B, and requests to resume at LSN 1/1000.

If B has since done enough work for its insert position to pass 1/1000, C
will completely skip whatever B did between 1/500 and 1/1000, thinking
(incorrectly) that it already replayed it. And it will have *extra data*
from A from the 1/500 to 1/1000 range that B lost. It'll pick up from B's
1/1000 and try to apply that on top of A's 1/1000 state, potentially
leading to a mangled mess.

In physical rep this would lead to serious data corruption and crashes. In
logical rep it'll most likely lead to conflicts, apply errors,
inconsistent data, broken FKs, etc. It could be drastic, or quite subtle,
depending on app and workload. But we really should still detect it.

To do that, we need to remember that our last replay position was
(1/1000, timeline 1). And when we request to start replay from 1/1000 at
timeline 1 on B, it'll ERROR, telling us that its timeline 1 ends at
1/500. We could still *choose* to continue as if all was well, but by
default we'll detect the error.

But we can't do that unless replication origins on the downstream can
track the timeline.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
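The scenario above can be made concrete with a toy model (hypothetical
code, not PostgreSQL's; LSNs abbreviated to their low halves for
readability). A bare LSN of 1/1000 looks perfectly resumable on the
promoted B, but a (timeline, LSN) pair reveals that the position lies past
the end of timeline 1 on B and triggers the ERROR described:

```python
# On promoted B, timeline 1 forked (ended) at 1/500.
TIMELINE_HISTORY_B = {1: 0x500}   # timeline -> LSN where it ends on B

def check_resume(origin, history):
    """Error out if the origin's position is beyond the fork point of
    its recorded timeline, i.e. includes changes B never had."""
    tli, lsn = origin
    fork_lsn = history.get(tli)
    if fork_lsn is not None and lsn > fork_lsn:
        raise ValueError(f"timeline {tli} ends at {fork_lsn:#x}, "
                         f"cannot resume at {lsn:#x}")

check_resume((1, 0x400), TIMELINE_HISTORY_B)    # fine: before the fork
try:
    check_resume((1, 0x1000), TIMELINE_HISTORY_B)
except ValueError as e:
    print(e)   # prints: timeline 1 ends at 0x500, cannot resume at 0x1000
```

With only the LSN stored in the origin, the first argument to this check
simply isn't available, which is the whole point of the proposal.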
Re: [HACKERS] Replication status in logical replication
On 5/29/17 22:56, Noah Misch wrote:
> On Fri, May 19, 2017 at 11:33:48AM +0900, Masahiko Sawada wrote:
>> On Wed, Apr 12, 2017 at 5:31 AM, Simon Riggs wrote:
>>> Looks like a bug that we should fix in PG10, with backpatch to 9.4 (or
>>> as far as it goes).
>>>
>>> Objections to commit?
>>
>> Seems we still have this issue. Any update or comment on this? Barring
>> any objections, I'll add this to the open items list so it doesn't get
>> missed.
>
> [Action required within three days. This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item. Peter,
> since you committed the patch believed to have created it, you own this
> open item. If some other commit is more relevant or if this does not
> belong as a v10 open item, please let us know.

I would ask Simon to go ahead with this patch if he feels comfortable
with it.

I'm disclaiming this open item, since it's an existing bug from previous
releases (and I have other open items to focus on).

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Replication status in logical replication
On Fri, May 19, 2017 at 11:33:48AM +0900, Masahiko Sawada wrote:
> On Wed, Apr 12, 2017 at 5:31 AM, Simon Riggs wrote:
>> On 22 March 2017 at 02:50, Masahiko Sawada wrote:
>>
>>> When using logical replication, I ran into a situation where the
>>> pg_stat_replication.state is not updated until any wal record is sent
>>> after startup. For example, I set up logical replication with 2
>>> subscribers and restart the publisher server, but I see the following
>>> status for a while (maybe until autovacuum runs).
>> ...
>>
>>> Attached patch fixes this behavior by updating WalSndCaughtUp before
>>> trying to read the next WAL record if already caught up.
>>
>> Looks like a bug that we should fix in PG10, with backpatch to 9.4 (or
>> as far as it goes).
>>
>> Objections to commit?
>
> Seems we still have this issue. Any update or comment on this? Barring
> any objections, I'll add this to the open items list so it doesn't get
> missed.

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this
open item. If some other commit is more relevant or if this does not
belong as a v10 open item, please let us know. Otherwise, please observe
the policy on open item ownership[1] and send a status update within three
calendar days of this message. Include a date for your subsequent status
update. Testers may discover new open items at any time, and I want to
plan to get them all fixed well in advance of shipping v10. Consequently,
I will appreciate your efforts toward speedy resolution. Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] Replication status in logical replication
On Wed, Apr 12, 2017 at 5:31 AM, Simon Riggs wrote:
> On 22 March 2017 at 02:50, Masahiko Sawada wrote:
>
>> When using logical replication, I ran into a situation where the
>> pg_stat_replication.state is not updated until any wal record is sent
>> after startup. For example, I set up logical replication with 2
>> subscribers and restart the publisher server, but I see the following
>> status for a while (maybe until autovacuum runs).
> ...
>
>> Attached patch fixes this behavior by updating WalSndCaughtUp before
>> trying to read the next WAL record if already caught up.
>
> Looks like a bug that we should fix in PG10, with backpatch to 9.4 (or
> as far as it goes).
>
> Objections to commit?

Seems we still have this issue. Any update or comment on this? Barring any
objections, I'll add this to the open items list so it doesn't get missed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Replication status in logical replication
On 22 March 2017 at 02:50, Masahiko Sawada wrote:

> When using logical replication, I ran into a situation where the
> pg_stat_replication.state is not updated until any wal record is sent
> after startup. For example, I set up logical replication with 2
> subscribers and restart the publisher server, but I see the following
> status for a while (maybe until autovacuum runs).
...

> Attached patch fixes this behavior by updating WalSndCaughtUp before
> trying to read the next WAL record if already caught up.

Looks like a bug that we should fix in PG10, with backpatch to 9.4 (or as
far as it goes).

Objections to commit?

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Replication status in logical replication
Hi all,

When using logical replication, I ran into a situation where pg_stat_replication.state is not updated until a WAL record is sent after startup. For example, I set up logical replication with 2 subscribers and restarted the publisher server, but I see the following status for a while (maybe until autovacuum runs).

=# select application_name, state, sent_location, write_location, flush_location, replay_location, sync_state from pg_stat_replication;

 application_name |  state  | sent_location | write_location | flush_location | replay_location | sync_state
------------------+---------+---------------+----------------+----------------+-----------------+------------
 node1            | catchup | 0/16329F8     | 0/16329F8      | 0/16329F8      | 0/16329F8       | potential
 node2            | catchup | 0/16329F8     | 0/16329F8      | 0/16329F8      | 0/16329F8       | async
(2 rows)

It seems that all WAL senders have caught up, but pg_stat_replication.state is still "catchup". The reason for this behavior is that WalSndCaughtUp is updated only in WalSndWaitForWal while logical replication is running, and logical_read_xlog_page always tries to read the next WAL record (i.e. it calls WalSndWaitForWal(targetPagePtr + reqLen)). So WalSndWaitForWal cannot update WalSndCaughtUp until a new WAL record is created after startup and the WAL sender reads it. The attached patch fixes this behavior by updating WalSndCaughtUp before trying to read the next WAL record if we have already caught up.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: logical_repl_caught_up.patch
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 2017-02-27 17:00:23 -0800, Joshua D. Drake wrote: > On 02/22/2017 02:45 PM, Tom Lane wrote: > > Andres Freund writes: > > > On 2017-02-22 08:43:28 -0500, Tom Lane wrote: > > > > (To be concrete, I'm suggesting dropping --disable-integer-datetimes > > > > in HEAD, and just agreeing that in the back branches, use of replication > > > > protocol with float-timestamp servers is not supported and we're not > > > > going to bother looking for related bugs there. Given the lack of field > > > > complaints, I do not believe anyone cares.) > > > > What I *am* willing to spend time on is removing float-timestamp code > > in HEAD. I've not yet heard anybody speak against doing that (or at > > least, nothing I interpreted as a vote against it). If I've not heard > > any complaints by tomorrow, I'll get started on that. > > Rip it out! Already happened: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6aa17e0ae367afdcea07118e016111af4fa6bc3 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 02/22/2017 02:45 PM, Tom Lane wrote: Andres Freund writes: On 2017-02-22 08:43:28 -0500, Tom Lane wrote: (To be concrete, I'm suggesting dropping --disable-integer-datetimes in HEAD, and just agreeing that in the back branches, use of replication protocol with float-timestamp servers is not supported and we're not going to bother looking for related bugs there. Given the lack of field complaints, I do not believe anyone cares.) What I *am* willing to spend time on is removing float-timestamp code in HEAD. I've not yet heard anybody speak against doing that (or at least, nothing I interpreted as a vote against it). If I've not heard any complaints by tomorrow, I'll get started on that. Rip it out! Sincerely, JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. Unless otherwise stated, opinions are my own. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
On Mon, Feb 20, 2017 at 09:07:33AM -0500, Tom Lane wrote:
> The question to be asked is whether there is still anybody out there
> using float timestamps. I'm starting to get dubious about it myself.
> Certainly, no packager that I'm aware of has shipped a float-timestamp
> build since we switched the default in 8.4. Maybe there is somebody
> who's faithfully built a float-timestamp custom build every year so they
> can pg_upgrade in place from their 8.3 installation, but there have got
> to be darn few such people.
>
> As for "proper deprecation period", the documentation has called the
> option deprecated since 8.4:
>
>     --disable-integer-datetimes
>
>         Disable support for 64-bit integer storage for timestamps and
>         intervals, and store datetime values as floating-point numbers
>         instead. Floating-point datetime storage was the default in
>         PostgreSQL releases prior to 8.4, but it is now deprecated,
>         because it does not support microsecond precision for the full
>         range of timestamp values.
>
> I think the strongest reason why we didn't move to kill it sooner was
> that we were not then assuming that every platform had 64-bit ints;
> but we've required that since 9.0.

I agree with removing support for float timestamps in PG 10.

It would be tempting to think we can reuse the bits we stopped using in 9.0 for VACUUM FULL, e.g.:

    #define HEAP_MOVED_OFF 0x4000   /* moved to another place by pre-9.0
                                     * VACUUM FULL; kept for binary
                                     * upgrade support */
    #define HEAP_MOVED_IN  0x8000   /* moved from another place by pre-9.0
                                     * VACUUM FULL; kept for binary
                                     * upgrade support */

However, if users built Postgres 8.4 with integer timestamps and later pg_upgraded, those bits could still be in the data files. Does anyone see a fault in my logic?

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                      Ancient Roman grave inscription +
Re: [HACKERS] Replication vs. float timestamps is a disaster
Andres Freund writes:
> On 2017-02-22 08:43:28 -0500, Tom Lane wrote:
>> (To be concrete, I'm suggesting dropping --disable-integer-datetimes
>> in HEAD, and just agreeing that in the back branches, use of replication
>> protocol with float-timestamp servers is not supported and we're not
>> going to bother looking for related bugs there. Given the lack of field
>> complaints, I do not believe anyone cares.)

> I think we should just fix it in the back branches, and drop the float
> timestamp code in 10 or 11. I.e. 1) and then 4).

I'm not personally interested in touching this issue in the back branches, but if you want to spend time on it, I surely won't stand in your way.

What I *am* willing to spend time on is removing float-timestamp code in HEAD. I've not yet heard anybody speak against doing that (or at least, nothing I interpreted as a vote against it). If I've not heard any complaints by tomorrow, I'll get started on that. I envision the following work plan:

* Reject --disable-integer-datetimes in configure. To avoid breaking existing build scripts unnecessarily, --enable-integer-datetimes will still be accepted.

* Convert HAVE_INT64_TIMESTAMP to a fixed, always-defined symbol. (We shouldn't remove it entirely because that would break third-party code that might be testing it. Perhaps shove it in pg_config_manual.h.)

* Possibly remove the enableIntTimes field from pg_control as well as associated logic in places like pg_controldata. There might be some argument for keeping this, though ... anyone have an opinion? pg_upgrade, at least, would need a bespoke test instead.

* Remove appropriate user documentation.

* Remove all core-code tests for HAVE_INT64_TIMESTAMP, along with the negative sides of those #if tests.

* Change the places in the replication code that declare timestamp variables to declare them as TimestampTz not int64, and adjust nearby comments accordingly. (This will just be code beautification at this point, but it still seems like a good idea.)
regards, tom lane
Re: [HACKERS] Replication vs. float timestamps is a disaster
On Mon, Feb 20, 2017 at 11:58:12AM +0100, Petr Jelinek wrote: > On 20/02/17 08:03, Andres Freund wrote: > > On 2017-02-19 10:49:29 -0500, Tom Lane wrote: > >> Robert Haas writes: > >>> On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane wrote: > Thoughts? Should we double down on trying to make this work according > to the "all integer timestamps" protocol specs, or cut our losses and > change the specs? > >> > >>> I vote for doubling down. It's bad enough that we have so many > >>> internal details that depend on this setting; letting that cascade > >>> into the wire protocol seems like it's just letting the chaos spread > >>> farther and wider. > >> > >> How do you figure that it's not embedded in the wire protocol already? > >> Not only the replicated data for a timestamp column, but also the > >> client-visible binary I/O format, depend on this. I think having some > >> parts of the protocol use a different timestamp format than other parts > >> is simply weird, and as this exercise has shown, it's bug-prone as all > >> get out. > > > > I don't think it's that closely tied together atm. Things like > > pg_basebackup, pg_receivexlog etc should work, without having to match > > timestamp storage. Logical replication, unless your output plugin dumps > > data in binary / "raw" output, also works just fine across the timestamp > > divide. > > > > It doesn't sound that hard to add a SystemToIntTimestamp() function, > > given it only needs to do something if float timestamps are enabled? > > > > It's definitely not hard, we already have > IntegerTimestampToTimestampTz() which does the opposite conversion anyway. > > That being said, I did wonder myself if we should just deprecate float > timestamps as well. +1 for deprecating them. If we need a timestamp(tz) with a wider range, we are getting options we didn't have before for implementing it. Best, David. 
--
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Replication vs. float timestamps is a disaster
Andrew Dunstan writes: > On 02/22/2017 10:21 AM, Jim Nasby wrote: >> Only in the catalog though, not the datums, right? I would think you >> could just change the oid in the catalog the same as you would for a >> table column. > No, in the datums. Yeah, I don't see any way that we could fob off float timestamps to an extension that would be completely transparent at the pg_upgrade level. And even a partial solution would be an enormous amount of fundamentally dead-end work. There has never been any project policy that promises everybody will be able to pg_upgrade until the end of time. I think it's not unreasonable for us to say that anyone still using float timestamps has to go through a dump and reload to get to v10. The original discussion about getting rid of them was ten years ago come May; that seems long enough. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 02/22/2017 10:21 AM, Jim Nasby wrote: > On 2/22/17 9:12 AM, Andres Freund wrote: >>> That would allow an in-place upgrade of >>> a really large cluster. A user would still need to modify their code >>> to use >>> the new type. >>> >>> Put another way: add ability for pg_upgrade to change the type of a >>> field. >>> There might be other uses for that as well. >> Type oids are unfortunately embedded into composite and array type data >> - we can do such changes for columns themselves, but it doesn't work if >> there's any array/composite members containing the to-be-changed type >> that are used as columns. > > Only in the catalog though, not the datums, right? I would think you > could just change the oid in the catalog the same as you would for a > table column. No, in the datums. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 2/22/17 9:12 AM, Andres Freund wrote: That would allow an in-place upgrade of a really large cluster. A user would still need to modify their code to use the new type. Put another way: add ability for pg_upgrade to change the type of a field. There might be other uses for that as well. Type oids are unfortunately embedded into composite and array type data - we can do such changes for columns themselves, but it doesn't work if there's any array/composite members containing the to-be-changed type that are used as columns. Only in the catalog though, not the datums, right? I would think you could just change the oid in the catalog the same as you would for a table column. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 2017-02-22 09:06:38 -0600, Jim Nasby wrote: > On 2/22/17 7:56 AM, Andres Freund wrote: > > It sounded more like Jim suggested a full blown SQL type, given that he > > replied to my concern about the possible need for a deprecation period > > due to pg_upgrade concerns. To be useful for that, we'd need a good > > chunk of magic, so all existing uses of timestamp[tz] are replaced with > > floattimestamp[tz], duplicate some code, add implicit casts, and accept > > that composites/arrays won't be fixed. That sounds like a fair amount > > of work to me, and we'd still have no way to remove the code without > > causing pain. > > Right, but I was thinking more in line with just providing the type (as an > extension, perhaps not even in core) and making it possible for pg_upgrade > to switch fields over to that type. Isn't the above what you'd need to do that? > That would allow an in-place upgrade of > a really large cluster. A user would still need to modify their code to use > the new type. > > Put another way: add ability for pg_upgrade to change the type of a field. > There might be other uses for that as well. Type oids are unfortunately embedded into composite and array type data - we can do such changes for columns themselves, but it doesn't work if there's any array/composite members containing the to-be-changed type that are used as columns. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 2/22/17 7:56 AM, Andres Freund wrote: On 2017-02-22 08:43:28 -0500, Tom Lane wrote: Andres Freund writes: On 2017-02-22 00:10:35 -0600, Jim Nasby wrote: I wounder if a separate "floatstamp" data type might fit the bill there. It might not be completely seamless, but it would be binary compatible. I don't really see what'd that solve. Seems to me this is a different name for what I already tried in <27694.1487456...@sss.pgh.pa.us>. It would be much better than doing nothing, IMO, but it would still leave lots of opportunities for mistakes. It sounded more like Jim suggested a full blown SQL type, given that he replied to my concern about the possible need for a deprecation period due to pg_upgrade concerns. To be useful for that, we'd need a good chunk of magic, so all existing uses of timestamp[tz] are replaced with floattimestamp[tz], duplicate some code, add implicit casts, and accept that composites/arrays won't be fixed. That sounds like a fair amount of work to me, and we'd still have no way to remove the code without causing pain. Right, but I was thinking more in line with just providing the type (as an extension, perhaps not even in core) and making it possible for pg_upgrade to switch fields over to that type. That would allow an in-place upgrade of a really large cluster. A user would still need to modify their code to use the new type. Put another way: add ability for pg_upgrade to change the type of a field. There might be other uses for that as well. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> While I'm generally not one to vote for dropping backwards-compatibility >> features, I have to say that I find #4 the most attractive of these >> options. It would result in getting rid of boatloads of under-tested >> code, whereas #2 would really just add more, and #3 at best maintains >> the status quo complexity-wise. > +1. BTW, I did a bit of digging in the archives, and found a couple of previous discussions around this: https://www.postgresql.org/message-id/flat/1178476313.18303.164.camel%40goldbach https://www.postgresql.org/message-id/flat/1287334597.8516.372.camel%40jdavis as well as numerous discussions of specific bugs associated with float timestamps, eg this isn't even the first time we've hit a float-timestamp replication bug: https://www.postgresql.org/message-id/flat/CAHGQGwFbqTDBrw95jek6_RvYG2%3DE-6o0HOpbeEcP6oWHJTLkUw%40mail.gmail.com In one or another of those threads, I opined that we'd have to keep float timestamps available for as long as we were supporting pg_upgrade from 8.3. However, I think that that argument hasn't really stood the test of time, because of the lack of packagers shipping builds with float timestamps turned on. There may be some number of installations that still have float timestamps active, but it's got to be a tiny number. And the list of reasons not to like float timestamps is very long. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> While I'm generally not one to vote for dropping backwards-compatibility
> features, I have to say that I find #4 the most attractive of these
> options. It would result in getting rid of boatloads of under-tested
> code, whereas #2 would really just add more, and #3 at best maintains
> the status quo complexity-wise.

+1.

Thanks!

Stephen
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 2017-02-22 08:43:28 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2017-02-22 00:10:35 -0600, Jim Nasby wrote: > >> I wounder if a separate "floatstamp" data type might fit the bill there. It > >> might not be completely seamless, but it would be binary compatible. > > > I don't really see what'd that solve. > > Seems to me this is a different name for what I already tried in > <27694.1487456...@sss.pgh.pa.us>. It would be much better than doing > nothing, IMO, but it would still leave lots of opportunities for mistakes. It sounded more like Jim suggested a full blown SQL type, given that he replied to my concern about the possible need for a deprecation period due to pg_upgrade concerns. To be useful for that, we'd need a good chunk of magic, so all existing uses of timestamp[tz] are replaced with floattimestamp[tz], duplicate some code, add implicit casts, and accept that composites/arrays won't be fixed. That sounds like a fair amount of work to me, and we'd still have no way to remove the code without causing pain. > 1. Just patch the already-identified float-vs-int-timestamp bugs in as > localized a fashion as possible, and hope that there aren't any more and > that we never introduce any more. I find this hope foolish :-(, > especially in view of the fact that what started this discussion is a > newly-introduced bug of this ilk. Well, the newly introduced bug was just a copy of the existing code. I'm far less negative about this than you - we're not dealing with a whole lot of code here. I don't get why it'd be foolish to verify the limited amount of code dealing with replication protocol timestamp. > 4. Give up on float timestamps altogether. > > While I'm generally not one to vote for dropping backwards-compatibility > features, I have to say that I find #4 the most attractive of these > options. 
It would result in getting rid of boatloads of under-tested > code, whereas #2 would really just add more, and #3 at best maintains > the status quo complexity-wise. > (To be concrete, I'm suggesting dropping --disable-integer-datetimes > in HEAD, and just agreeing that in the back branches, use of replication > protocol with float-timestamp servers is not supported and we're not > going to bother looking for related bugs there. Given the lack of field > complaints, I do not believe anyone cares.) I think we should just fix it in the back branches, and drop the float timestamp code in 10 or 11. I.e. 1) and then 4). Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
Andres Freund writes:
> On 2017-02-22 00:10:35 -0600, Jim Nasby wrote:
>> I wounder if a separate "floatstamp" data type might fit the bill there. It
>> might not be completely seamless, but it would be binary compatible.

> I don't really see what'd that solve.

Seems to me this is a different name for what I already tried in <27694.1487456...@sss.pgh.pa.us>. It would be much better than doing nothing, IMO, but it would still leave lots of opportunities for mistakes.

To review the bidding a bit, it seems to me we've got four possible ways of dealing with this issue:

1. Just patch the already-identified float-vs-int-timestamp bugs in as localized a fashion as possible, and hope that there aren't any more and that we never introduce any more. I find this hope foolish :-(, especially in view of the fact that what started this discussion is a newly-introduced bug of this ilk.

2. Introduce some new notation, perhaps <27694.1487456...@sss.pgh.pa.us> plus new "sendtimestamp" and "recvtimestamp" functions, to try to provide some compiler-assisted support for getting it right.

3. Give up on "protocol timestamps are always integer style".

4. Give up on float timestamps altogether.

While I'm generally not one to vote for dropping backwards-compatibility features, I have to say that I find #4 the most attractive of these options. It would result in getting rid of boatloads of under-tested code, whereas #2 would really just add more, and #3 at best maintains the status quo complexity-wise.

I think it was clear from the day we switched to integer timestamps as the default that the days of float timestamps were numbered, and that we were only going to continue to support them as long as it wasn't costing a lot of effort. We have now reached a point at which it's clear that continuing to support them will have direct and significant costs. I say it's time to pull the plug.
(To be concrete, I'm suggesting dropping --disable-integer-datetimes in HEAD, and just agreeing that in the back branches, use of replication protocol with float-timestamp servers is not supported and we're not going to bother looking for related bugs there. Given the lack of field complaints, I do not believe anyone cares.)

regards, tom lane
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 2017-02-22 00:10:35 -0600, Jim Nasby wrote: > On 2/20/17 5:04 AM, Andres Freund wrote: > > On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote: > > > That being said, I did wonder myself if we should just deprecate float > > > timestamps as well. > > > > I think we need a proper deprecation period for that, given that the > > conversion away will be painful for pg_upgrade using people with big > > clusters. So I think we should fix this regardless... :( > > I wounder if a separate "floatstamp" data type might fit the bill there. It > might not be completely seamless, but it would be binary compatible. I don't really see what'd that solve. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 2/20/17 5:04 AM, Andres Freund wrote:
> On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote:
>> That being said, I did wonder myself if we should just deprecate float
>> timestamps as well.
>
> I think we need a proper deprecation period for that, given that the
> conversion away will be painful for pg_upgrade using people with big
> clusters. So I think we should fix this regardless... :(

I wonder if a separate "floatstamp" data type might fit the bill there. It might not be completely seamless, but it would be binary compatible.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 2/21/17 4:52 PM, James Cloos wrote: "TL" == Tom Lane writes: TL> The question to be asked is whether there is still anybody out there TL> using float timestamps. Gentoo's ebuild includes: $(use_enable !pg_legacytimestamp integer-datetimes) \ FWIW, last time I looked it was also an option in FreeBSD's ports, though I think it's defaulted to int since forever ago (like, 7.4 era). -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
> "TL" == Tom Lane writes: TL> The question to be asked is whether there is still anybody out there TL> using float timestamps. Gentoo's ebuild includes: $(use_enable !pg_legacytimestamp integer-datetimes) \ meaning that by default --enable-integer-datetimes is passed to configure, but if the pg_legacytimestamp use flag is set, then --disable-integer-datetimes is passed instead. They document it as: Use double precision floating-point numbers instead of 64-bit integers for timestamp storage. Ie, w/o any kind of deprecation notice. I don't know how many (how few?) add pg_legacytimestamp to USE when merging postgresql. But it is still available as of 9.6 and also with their live build of git://git.postgresql.org/git/postgresql.git. -JimC -- James Cloos OpenPGP: 0x997A9F17ED7DAEA6 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
Robert Haas writes: > On Mon, Feb 20, 2017 at 7:37 PM, Tom Lane wrote: >> The question to be asked is whether there is still anybody out there >> using float timestamps. I'm starting to get dubious about it myself. > I'm wondering if it has any effect that pg_config.h.win32 says > /* Define to 1 if you want 64-bit integer timestamp and interval support. >(--enable-integer-datetimes) */ > /* #undef USE_INTEGER_DATETIMES */ > Whereas pg_config.h.win32 says: > /* Define to 1 if you want 64-bit integer timestamp and interval support. >(--enable-integer-datetimes) */ > #define USE_INTEGER_DATETIMES 1 Er, what? For me, grep finds src/include/pg_config.h.in: 836: #undef USE_INTEGER_DATETIMES src/include/pg_config.h.win32: 630: /* #undef USE_INTEGER_DATETIMES */ > It looks like it was commit 2169e42bef9db7e0bdd1bea00b81f44973ad83c8 > that enabled integer datetimes by default, but that commit seems to > not to have touched the Windows build scripts. Commit > fcf053d7829f2d83829256153e856f9a36c83ffd changed MSVC over to use > integer datetimes by default, but I'm not clear if there's any build > environment where we rely on config.h.win32 but not Solution.pm? Any such build would find itself without a defined value of BLCKSZ, to mention just one of the settings that get appended by Solution.pm. It does look like we expect pg_config.h.win32 to be usable standalone for libpq-only Windows compiles, but it would never work for building the backend. (I dunno whether the libpq-only scripts still work at all or have bitrotted, but it's irrelevant for the question at hand.) > If not, what exactly is pg_config.h.win32 for and to what degree does it > need to be in sync with pg_config.h.in? The list of differences > appears to be far more extensive than the header comment at the top of > pg_config.h.win32 would lead one to believe. Yeah, I think the maintenance of pg_config.h.win32 has been a bit more haphazard than one could wish. 
regards, tom lane
Re: [HACKERS] Replication vs. float timestamps is a disaster
On Mon, Feb 20, 2017 at 7:37 PM, Tom Lane wrote: > The question to be asked is whether there is still anybody out there > using float timestamps. I'm starting to get dubious about it myself. > Certainly, no packager that I'm aware of has shipped a float-timestamp > build since we switched the default in 8.4. Maybe there is somebody > who's faithfully built a float-timestamp custom build every year so they > can pg_upgrade in place from their 8.3 installation, but there have got > to be darn few such people. I'm wondering if it has any effect that pg_config.h.win32 says /* Define to 1 if you want 64-bit integer timestamp and interval support. (--enable-integer-datetimes) */ /* #undef USE_INTEGER_DATETIMES */ Whereas pg_config.h.win32 says: /* Define to 1 if you want 64-bit integer timestamp and interval support. (--enable-integer-datetimes) */ #define USE_INTEGER_DATETIMES 1 It looks like it was commit 2169e42bef9db7e0bdd1bea00b81f44973ad83c8 that enabled integer datetimes by default, but that commit seems to not to have touched the Windows build scripts. Commit fcf053d7829f2d83829256153e856f9a36c83ffd changed MSVC over to use integer datetimes by default, but I'm not clear if there's any build environment where we rely on config.h.win32 but not Solution.pm? If not, what exactly is pg_config.h.win32 for and to what degree does it need to be in sync with pg_config.h.in? The list of differences appears to be far more extensive than the header comment at the top of pg_config.h.win32 would lead one to believe. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication vs. float timestamps is a disaster
Petr Jelinek writes: > It's definitely not hard, we already have > IntegerTimestampToTimestampTz() which does the opposite conversion anyway. It's not the functions that are hard, it's making sure that you have used them in the correct places, and declared relevant variables with the appropriate types. Evidence on the ground is that that is very hard; I have little faith that we'll get it right without compiler support. regards, tom lane
Re: [HACKERS] Replication vs. float timestamps is a disaster
Andres Freund writes:
> On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote:
>> That being said, I did wonder myself if we should just deprecate float
>> timestamps as well.
> I think we need a proper deprecation period for that, given that the
> conversion away will be painful for pg_upgrade using people with big
> clusters. So I think we should fix this regardless... :(

The question to be asked is whether there is still anybody out there using float timestamps. I'm starting to get dubious about it myself. Certainly, no packager that I'm aware of has shipped a float-timestamp build since we switched the default in 8.4. Maybe there is somebody who's faithfully built a float-timestamp custom build every year so they can pg_upgrade in place from their 8.3 installation, but there have got to be darn few such people.

As for a "proper deprecation period", the documentation has called the option deprecated since 8.4:

    --disable-integer-datetimes
        Disable support for 64-bit integer storage for timestamps and
        intervals, and store datetime values as floating-point numbers
        instead.  Floating-point datetime storage was the default in
        PostgreSQL releases prior to 8.4, but it is now deprecated,
        because it does not support microsecond precision for the full
        range of timestamp values.

I think the strongest reason why we didn't move to kill it sooner was that we were not then assuming that every platform had 64-bit ints; but we've required that since 9.0.

regards, tom lane
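The documentation's precision claim is easy to demonstrate: float timestamps store seconds since 2000-01-01 in a C double, whose 52-bit mantissa resolves microseconds near the epoch but not at the far ends of the timestamp range, whereas an int64 microsecond count is always exact. A quick sketch with illustrative values (not taken from the thread):

```python
# Float timestamps store TimestampTz as a C double counting seconds
# since 2000-01-01; integer timestamps use an int64 microsecond count.
near_epoch = 5.4e8    # roughly year 2017, in seconds since 2000
far_future = 9.0e12   # far end of the supported timestamp range

# Near the epoch a double still resolves a microsecond step...
print(near_epoch + 1e-6 != near_epoch)   # True
# ...but far away the spacing between adjacent doubles exceeds 1e-6 s,
# so adding a microsecond rounds back to the same value.
print(far_future + 1e-6 == far_future)   # True: the microsecond vanishes
```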
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 20/02/17 12:04, Andres Freund wrote:
> On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote:
>> That being said, I did wonder myself if we should just deprecate float
>> timestamps as well.
>
> I think we need a proper deprecation period for that, given that the
> conversion away will be painful for pg_upgrade using people with big
> clusters. So I think we should fix this regardless... :(

That's a good point. Attached should fix the logical replication problems. I am not quite sure if there is anything in physical that needs changing. I opted for GetCurrentIntegerTimestamp() in the reply code as that's the same coding walreceiver uses.

-- 
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 142cd99..dd5bdcc 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -46,7 +46,7 @@ logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
 
 	/* fixed fields */
 	pq_sendint64(out, txn->final_lsn);
-	pq_sendint64(out, txn->commit_time);
+	pq_sendint64(out, TimestampTzToIntegerTimestamp(txn->commit_time));
 	pq_sendint(out, txn->xid, 4);
 }
 
@@ -60,7 +60,7 @@ logicalrep_read_begin(StringInfo in, LogicalRepBeginData *begin_data)
 	begin_data->final_lsn = pq_getmsgint64(in);
 	if (begin_data->final_lsn == InvalidXLogRecPtr)
 		elog(ERROR, "final_lsn not set in begin message");
-	begin_data->committime = pq_getmsgint64(in);
+	begin_data->committime = IntegerTimestampToTimestampTz(pq_getmsgint64(in));
 	begin_data->xid = pq_getmsgint(in, 4);
 }
 
@@ -82,7 +82,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
 	/* send fields */
 	pq_sendint64(out, commit_lsn);
 	pq_sendint64(out, txn->end_lsn);
-	pq_sendint64(out, txn->commit_time);
+	pq_sendint64(out, TimestampTzToIntegerTimestamp(txn->commit_time));
 }
 
 /*
@@ -100,7 +100,7 @@ logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
 	/* read fields */
 	commit_data->commit_lsn = pq_getmsgint64(in);
 	commit_data->end_lsn = pq_getmsgint64(in);
-	commit_data->committime = pq_getmsgint64(in);
+	commit_data->committime = IntegerTimestampToTimestampTz(pq_getmsgint64(in));
 }
 
 /*
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 0b19fec..225ea4c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1183,7 +1183,7 @@ send_feedback(XLogRecPtr recvpos, bool force, bool requestReply)
 	pq_sendint64(reply_message, recvpos);	/* write */
 	pq_sendint64(reply_message, flushpos);	/* flush */
 	pq_sendint64(reply_message, writepos);	/* apply */
-	pq_sendint64(reply_message, now);	/* sendTime */
+	pq_sendint64(reply_message, GetCurrentIntegerTimestamp());	/* sendTime */
 	pq_sendbyte(reply_message, requestReply);	/* replyRequested */
 
 	elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write %X/%X, flush %X/%X",
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 9b4c012..1dd469d 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -1749,6 +1749,20 @@ IntegerTimestampToTimestampTz(int64 timestamp)
 #endif
 
 /*
+ * TimestampTzToIntegerTimestamp -- convert a native format timestamp to int64
+ *
+ * When compiled with --enable-integer-datetimes, this is implemented as a
+ * no-op macro.
+ */
+#ifndef HAVE_INT64_TIMESTAMP
+int64
+TimestampTzToIntegerTimestamp(TimestampTz timestamp)
+{
+	return timestamp * USECS_PER_SEC;
+}
+#endif
+
+/*
  * GetSQLCurrentTimestamp -- implements CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(n)
  */
 TimestampTz
diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h
index 21651b1..765fa81 100644
--- a/src/include/utils/timestamp.h
+++ b/src/include/utils/timestamp.h
@@ -108,9 +108,11 @@ extern bool TimestampDifferenceExceeds(TimestampTz start_time,
 #ifndef HAVE_INT64_TIMESTAMP
 extern int64 GetCurrentIntegerTimestamp(void);
 extern TimestampTz IntegerTimestampToTimestampTz(int64 timestamp);
+extern int64 TimestampTzToIntegerTimestamp(TimestampTz timestamp);
 #else
 #define GetCurrentIntegerTimestamp() GetCurrentTimestamp()
 #define IntegerTimestampToTimestampTz(timestamp) (timestamp)
+#define TimestampTzToIntegerTimestamp(timestamp) (timestamp)
 #endif
 
 extern TimestampTz time_t_to_timestamptz(pg_time_t tm);
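On a float-timestamp build, the patch's conversion pair amounts to multiplying and dividing by USECS_PER_SEC. A hypothetical Python model of that round trip (function names mirror the C functions; the sample value is illustrative and chosen to be exactly representable in binary):

```python
USECS_PER_SEC = 1_000_000

def timestamptz_to_integer_timestamp(ts_seconds: float) -> int:
    # Float-timestamp build: native TimestampTz is seconds since 2000-01-01.
    # The C function's double -> int64 conversion truncates, as int() does here.
    return int(ts_seconds * USECS_PER_SEC)

def integer_timestamp_to_timestamptz(ts_usecs: int) -> float:
    # Opposite direction: wire microseconds back to native seconds.
    return ts_usecs / USECS_PER_SEC

wire = timestamptz_to_integer_timestamp(123.5)
print(wire)                                    # 123500000
print(integer_timestamp_to_timestamptz(wire))  # 123.5
```

On an integer-timestamp build both functions collapse to no-op macros, which is why the header patch only declares them under #ifndef HAVE_INT64_TIMESTAMP.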
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 2017-02-20 11:58:12 +0100, Petr Jelinek wrote: > That being said, I did wonder myself if we should just deprecate float > timestamps as well. I think we need a proper deprecation period for that, given that the conversion away will be painful for pg_upgrade using people with big clusters. So I think we should fix this regardless... :(
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 20/02/17 08:03, Andres Freund wrote: > On 2017-02-19 10:49:29 -0500, Tom Lane wrote: >> Robert Haas writes: >>> On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane wrote: Thoughts? Should we double down on trying to make this work according to the "all integer timestamps" protocol specs, or cut our losses and change the specs? >> >>> I vote for doubling down. It's bad enough that we have so many >>> internal details that depend on this setting; letting that cascade >>> into the wire protocol seems like it's just letting the chaos spread >>> farther and wider. >> >> How do you figure that it's not embedded in the wire protocol already? >> Not only the replicated data for a timestamp column, but also the >> client-visible binary I/O format, depend on this. I think having some >> parts of the protocol use a different timestamp format than other parts >> is simply weird, and as this exercise has shown, it's bug-prone as all >> get out. > > I don't think it's that closely tied together atm. Things like > pg_basebackup, pg_receivexlog etc should work, without having to match > timestamp storage. Logical replication, unless your output plugin dumps > data in binary / "raw" output, also works just fine across the timestamp > divide. > > It doesn't sound that hard to add a SystemToIntTimestamp() function, > given it only needs to do something if float timestamps are enabled? > It's definitely not hard, we already have IntegerTimestampToTimestampTz() which does the opposite conversion anyway. That being said, I did wonder myself if we should just deprecate float timestamps as well. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 2017-02-19 10:49:29 -0500, Tom Lane wrote: > Robert Haas writes: > > On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane wrote: > >> Thoughts? Should we double down on trying to make this work according > >> to the "all integer timestamps" protocol specs, or cut our losses and > >> change the specs? > > > I vote for doubling down. It's bad enough that we have so many > > internal details that depend on this setting; letting that cascade > > into the wire protocol seems like it's just letting the chaos spread > > farther and wider. > > How do you figure that it's not embedded in the wire protocol already? > Not only the replicated data for a timestamp column, but also the > client-visible binary I/O format, depend on this. I think having some > parts of the protocol use a different timestamp format than other parts > is simply weird, and as this exercise has shown, it's bug-prone as all > get out. I don't think it's that closely tied together atm. Things like pg_basebackup, pg_receivexlog etc should work, without having to match timestamp storage. Logical replication, unless your output plugin dumps data in binary / "raw" output, also works just fine across the timestamp divide. It doesn't sound that hard to add a SystemToIntTimestamp() function, given it only needs to do something if float timestamps are enabled? Regards, Andres
Re: [HACKERS] Replication vs. float timestamps is a disaster
On Sun, Feb 19, 2017 at 9:19 PM, Tom Lane wrote: > Robert Haas writes: >> On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane wrote: >>> Thoughts? Should we double down on trying to make this work according >>> to the "all integer timestamps" protocol specs, or cut our losses and >>> change the specs? > >> I vote for doubling down. It's bad enough that we have so many >> internal details that depend on this setting; letting that cascade >> into the wire protocol seems like it's just letting the chaos spread >> farther and wider. > > How do you figure that it's not embedded in the wire protocol already? > Not only the replicated data for a timestamp column, but also the > client-visible binary I/O format, depend on this. I think having some > parts of the protocol use a different timestamp format than other parts > is simply weird, and as this exercise has shown, it's bug-prone as all > get out. You've got a point, but if we don't make the replication protocol consistently use integers, then every client that cares about those fields has to be able to handle either format, or at least detect that it got a different format than it was expecting and fail cleanly. How's that going to work? Also, I feel like there's some difference between the data stored in a cluster and the metadata which describes the cluster. Making the interpretation of the metadata depend on the cluster configuration feels circular, like using non-ASCII characters in the name of an encoding. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
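To see why a client can't "handle either format", consider decoding the sendTime field of a primary keepalive message. The layout below follows the documented streaming-replication protocol (the bytes inside the CopyData wrapper); the helper name is made up for illustration:

```python
import struct
from datetime import datetime, timedelta, timezone

PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)

def decode_keepalive(payload: bytes):
    """Decode a primary keepalive payload: Byte1('k'), Int64 walEnd,
    Int64 sendTime, Byte1 replyRequested."""
    kind, wal_end, send_time, reply_requested = struct.unpack('>cqqB', payload)
    assert kind == b'k'
    # The spec says sendTime is microseconds since 2000-01-01.  A
    # float-timestamp server sends something else entirely in those 8 bytes,
    # and the client has no way to tell which flavor it received.
    return wal_end, PG_EPOCH + timedelta(microseconds=send_time)

payload = struct.pack('>cqqB', b'k', 0, 86_400 * 1_000_000, 0)
print(decode_keepalive(payload)[1])   # 2000-01-02 00:00:00+00:00
```

Nothing in the message itself announces the server's timestamp build, which is Robert's objection: the client would have to guess or fail unpredictably.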
Re: [HACKERS] Replication vs. float timestamps is a disaster
Robert Haas writes: > On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane wrote: >> Thoughts? Should we double down on trying to make this work according >> to the "all integer timestamps" protocol specs, or cut our losses and >> change the specs? > I vote for doubling down. It's bad enough that we have so many > internal details that depend on this setting; letting that cascade > into the wire protocol seems like it's just letting the chaos spread > farther and wider. How do you figure that it's not embedded in the wire protocol already? Not only the replicated data for a timestamp column, but also the client-visible binary I/O format, depend on this. I think having some parts of the protocol use a different timestamp format than other parts is simply weird, and as this exercise has shown, it's bug-prone as all get out. > Also, I wonder if we could consider deprecating and removing > --disable-integer-datetimes at some point. Seems like a different conversation. Although given the lack of replication bug reports so far, maybe nobody is using --disable-integer-datetimes anymore. Certainly, fixing these bugs by removing the --disable-integer-datetimes option would be a lot less painful than trying to make it actually work per protocol spec. regards, tom lane
Re: [HACKERS] Replication vs. float timestamps is a disaster
On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane wrote: > Thoughts? Should we double down on trying to make this work according > to the "all integer timestamps" protocol specs, or cut our losses and > change the specs? I vote for doubling down. It's bad enough that we have so many internal details that depend on this setting; letting that cascade into the wire protocol seems like it's just letting the chaos spread farther and wider. Also, I wonder if we could consider deprecating and removing --disable-integer-datetimes at some point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Replication vs. float timestamps is a disaster
Both the streaming replication and logical replication areas of the code are, approximately, utterly broken when !HAVE_INT64_TIMESTAMP. (The fact that "make check-world" passes anyway is an indictment of the quality of the regression tests.) I started poking around in this area after Thomas Munro reported that pg_recvlogical.c no longer even compiles with --disable-integer-datetimes, but the problems go much deeper than that.

replication/logical/worker.c's send_feedback() is sending a backend timestamp directly as sendTime. This is wrong per the protocol spec, which says the field is an integer timestamp. I'm not sure if it's OK to just replace

-	pq_sendint64(reply_message, now);	/* sendTime */
+	pq_sendint64(reply_message, GetCurrentIntegerTimestamp());	/* sendTime */

... is it important for the sent timestamp to exactly match the timestamp that was used earlier in the function for deciding whether to send or not? Should we instead try to convert the earlier logic to use integer timestamps?

As for logical replication, logicalrep_write_begin and logicalrep_write_commit are likewise sending TimestampTz values using pq_sendint64, and this is much harder to patch locally, since the values were taken from WAL records that contain TimestampTz. Although the sent values are nonsensical according to the protocol spec, logicalrep_read_begin and logicalrep_read_commit read them with pq_getmsgint64 and assign the results back to TimestampTz, which means that things sort of appear to work as long as you don't mind losing all sub-second precision in the timestamps. (But a decoder that was written to spec would think the values are garbage.)

I'm inclined to think that at least for logical replication, we're going to have to give up the protocol spec wording that says that timestamps are integers, and instead admit that they are dependent on --disable-integer-datetimes. Maybe we should do the same in the streaming replication protocol, because this area is going to be a very fertile source of bugs for the foreseeable future if we don't. It's not like replication between --enable-integer-datetimes and --disable-integer-datetimes hosts has any chance of being useful anyway.

Thoughts? Should we double down on trying to make this work according to the "all integer timestamps" protocol specs, or cut our losses and change the specs?

regards, tom lane
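The sub-second loss Tom describes can be modeled in a few lines (a sketch of the assumed float-timestamp semantics, not code from the tree): the sender's implicit double-to-int64 conversion keeps only whole seconds, and the reader turns those whole seconds back into a double.

```python
# Float-timestamp build: commit_time is a C double, seconds since 2000-01-01.
commit_time = 123.456789

# pq_sendint64(out, txn->commit_time): implicit double -> int64 truncation
# (C truncates toward zero, like Python's int() for positive values).
on_the_wire = int(commit_time)   # 123

# Reader assigns pq_getmsgint64()'s result back to a double TimestampTz.
received = float(on_the_wire)    # 123.0 -- the .456789 seconds are gone
```

A spec-conforming decoder, by contrast, would interpret 123 as 123 *microseconds* past the epoch, which is the "garbage" case in the parenthetical above.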
Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down
On 13 January 2017 at 10:17, Ants Aasma wrote: > On 5 Jan 2017 2:54 a.m., "Craig Ringer" wrote: >> Ants, do you think you'll have a chance to convert your shell script >> test into a TAP test in src/test/recovery? >> >> Simon has said he would like to commit this fix. I'd personally be >> happier if it had a test to go with it. >> >> You could probably just add to src/test/recovery/t/001 which now >> contains my additions for hot standby. > > Do you feel the test in the attached patch is enough or would you like > to see anything else covered? That looks good, thanks for the patch. Applying in next few hours, barring objections; then backpatching code (without tests). -- Simon Riggs    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Replication/backup defaults
On Wed, Jan 11, 2017 at 2:09 AM, Michael Paquier wrote: > On Wed, Jan 11, 2017 at 10:06 AM, David Steele > wrote: > > On 1/10/17 3:06 PM, Stephen Frost wrote: > >> * Magnus Hagander (mag...@hagander.net) wrote: > >>> On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas > wrote: > > > I may be outvoted, but I'm still not in favor of changing the default > wal_level. That caters only to people who lack sufficient foresight > to know that they need a replica before the system becomes so critical > that they can't bounce it to update the configuration. > >>> > >>> True. But the current level only caters to those people who run large > ETL > >>> jobs without doing any tuning on their system (at least none that would > >>> require a restart), or another one of the fairly specific workloads. > >>> > >>> And as I keep re-iterating, it's not just about replicas, it's also > about > >>> the ability to make proper backups. Which is a pretty fundamental > feature. > >>> > >>> I do think you are outvoted, yes :) At least that's the result of my > >>> tallying up the people who have spoken out on the thread. > >> > >> I tend to agree with Magnus on this, being able to perform an online > >> backup is pretty darn important. > > > > Agreed and +1. > > +1'ing. > I've pushed this. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down
On 5 Jan 2017 2:54 a.m., "Craig Ringer" wrote:
> Ants, do you think you'll have a chance to convert your shell script
> test into a TAP test in src/test/recovery?
>
> Simon has said he would like to commit this fix. I'd personally be
> happier if it had a test to go with it.
>
> You could probably just add to src/test/recovery/t/001 which now
> contains my additions for hot standby.

Do you feel the test in the attached patch is enough or would you like to see anything else covered?

Regards, Ants Aasma

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index c6b54ec..0ab936f 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1157,7 +1157,10 @@ XLogWalRcvSendReply(bool force, bool requestReply)
  * in case they don't have a watch.
  *
  * If the user disables feedback, send one final message to tell sender
- * to forget about the xmin on this standby.
+ * to forget about the xmin on this standby. We also send this message
+ * on first connect because a previous connection might have set xmin
+ * on a replication slot. (If we're not using a slot it's harmless to
+ * send a feedback message explicitly setting InvalidTransactionId).
  */
 static void
 XLogWalRcvSendHSFeedback(bool immed)
@@ -1167,7 +1170,8 @@ XLogWalRcvSendHSFeedback(bool immed)
 	uint32		nextEpoch;
 	TransactionId xmin;
 	static TimestampTz sendTime = 0;
-	static bool master_has_standby_xmin = false;
+	/* initially true so we always send at least one feedback message */
+	static bool master_has_standby_xmin = true;
 
 	/*
 	 * If the user doesn't want status to be reported to the master, be sure
@@ -1192,14 +1196,17 @@ XLogWalRcvSendHSFeedback(bool immed)
 	}
 
 	/*
-	 * If Hot Standby is not yet active there is nothing to send. Check this
-	 * after the interval has expired to reduce number of calls.
+	 * If Hot Standby is not yet accepting connections there is nothing to
+	 * send. Check this after the interval has expired to reduce number of
+	 * calls.
+	 *
+	 * Bailing out here also ensures that we don't send feedback until we've
+	 * read our own replication slot state, so we don't tell the master to
+	 * discard needed xmin or catalog_xmin from any slots that may exist
+	 * on this replica.
 	 */
 	if (!HotStandbyActive())
-	{
-		Assert(!master_has_standby_xmin);
 		return;
-	}
 
 	/*
 	 * Make the expensive call to get the oldest xmin once we are certain
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index eef512d..7fb2e9e 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 22;
+use Test::More tests => 24;
 
 # Initialize master node
 my $node_master = get_new_node('master');
@@ -161,3 +161,22 @@ is($catalog_xmin, '', 'non-cascaded slot xmin still null with hs_feedback reset');
 
 ($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
 is($xmin, '', 'cascaded slot xmin null with hs feedback reset');
 is($catalog_xmin, '', 'cascaded slot xmin still null with hs_feedback reset');
+
+diag "re-enabling hot_standby_feedback and disabling while stopped";
+$node_standby_2->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = on;');
+$node_standby_2->reload;
+
+$node_master->safe_psql('postgres', qq[INSERT INTO tab_int VALUES (11000);]);
+replay_check();
+
+$node_standby_2->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = off;');
+$node_standby_2->stop;
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+isnt($xmin, '', 'cascaded slot xmin non-null with postgres shut down');
+
+# Xmin from a previous run should be cleared on startup.
+$node_standby_2->start;
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+is($xmin, '', 'cascaded slot xmin reset after startup with hs feedback reset');
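The crux of the walreceiver change can be modeled as a tiny state machine (a sketch of the logic, not the C code): starting `master_has_standby_xmin` as true forces exactly one "forget my xmin" message after a restart, even when feedback is off.

```python
def hs_feedback_messages(hot_standby_feedback: bool, n_wakeups: int):
    """Simulate XLogWalRcvSendHSFeedback over n wakeups; return sent xmins."""
    # Initially true so we always send at least one feedback message,
    # clearing any xmin a previous connection left on a replication slot.
    master_has_standby_xmin = True
    sent = []
    for _ in range(n_wakeups):
        if not hot_standby_feedback:
            if master_has_standby_xmin:
                sent.append(None)   # InvalidTransactionId: reset the xmin
                master_has_standby_xmin = False
            continue
        sent.append(42)             # some live xmin (made-up value)
        master_has_standby_xmin = True
    return sent

print(hs_feedback_messages(False, 5))   # [None]: one reset, then silence
print(hs_feedback_messages(True, 2))    # [42, 42]
```

With the old `= false` initialization, the `hot_standby_feedback = off` case would send nothing at all, leaving a stale xmin on the upstream slot, which is exactly what the new TAP test checks.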
Re: [HACKERS] Replication/backup defaults
On Wed, Jan 11, 2017 at 10:06 AM, David Steele wrote: > On 1/10/17 3:06 PM, Stephen Frost wrote: >> * Magnus Hagander (mag...@hagander.net) wrote: >>> On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas wrote: > I may be outvoted, but I'm still not in favor of changing the default wal_level. That caters only to people who lack sufficient foresight to know that they need a replica before the system becomes so critical that they can't bounce it to update the configuration. >>> >>> True. But the current level only caters to those people who run large ETL >>> jobs without doing any tuning on their system (at least none that would >>> require a restart), or another one of the fairly specific workloads. >>> >>> And as I keep re-iterating, it's not just about replicas, it's also about >>> the ability to make proper backups. Which is a pretty fundamental feature. >>> >>> I do think you are outvoted, yes :) At least that's the result of my >>> tallying up the people who have spoken out on the thread. >> >> I tend to agree with Magnus on this, being able to perform an online >> backup is pretty darn important. > Agreed and +1. +1'ing. -- Michael
Re: [HACKERS] Replication/backup defaults
On 1/10/17 3:06 PM, Stephen Frost wrote: > * Magnus Hagander (mag...@hagander.net) wrote: >> On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas wrote: >>> I may be outvoted, but I'm still not in favor of changing the default >>> wal_level. That caters only to people who lack sufficient foresight >>> to know that they need a replica before the system becomes so critical >>> that they can't bounce it to update the configuration. >> >> True. But the current level only caters to those people who run large ETL >> jobs without doing any tuning on their system (at least none that would >> require a restart), or another one of the fairly specific workloads. >> >> And as I keep re-iterating, it's not just about replicas, it's also about >> the ability to make proper backups. Which is a pretty fundamental feature. >> >> I do think you are outvoted, yes :) At least that's the result of my >> tallying up the people who have spoken out on the thread. > > I tend to agree with Magnus on this, being able to perform an online > backup is pretty darn important. Agreed and +1. -- -David da...@pgmasters.net
Re: [HACKERS] Replication/backup defaults
Greetings, * Magnus Hagander (mag...@hagander.net) wrote: > On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas wrote: > > > On Mon, Jan 9, 2017 at 11:02 AM, Peter Eisentraut > > wrote: > > > On 1/9/17 7:44 AM, Magnus Hagander wrote: > > >> So based on that, I suggest we go ahead and make the change to make both > > >> the values 10 by default. And that we do that now, because that lets us > > >> get it out through more testing on different platforms, so that we catch > > >> issues earlier on if they do arise. > > > > > > Sounds good. > > > > I may be outvoted, but I'm still not in favor of changing the default > > wal_level. That caters only to people who lack sufficient foresight > > to know that they need a replica before the system becomes so critical > > that they can't bounce it to update the configuration. > > True. But the current level only caters to those people who run large ETL > jobs without doing any tuning on their system (at least none that would > require a restart), or another one of the fairly specific workloads. > > And as I keep re-iterating, it's not just about replicas, it's also about > the ability to make proper backups. Which is a pretty fundamental feature. > > I do think you are outvoted, yes :) At least that's the result of my > tallying up the people who have spoken out on the thread. I tend to agree with Magnus on this, being able to perform an online backup is pretty darn important. Thanks! Stephen
Re: [HACKERS] Replication/backup defaults
On Tue, Jan 10, 2017 at 8:03 PM, Robert Haas wrote: > On Mon, Jan 9, 2017 at 11:02 AM, Peter Eisentraut > wrote: > > On 1/9/17 7:44 AM, Magnus Hagander wrote: > >> So based on that, I suggest we go ahead and make the change to make both > >> the values 10 by default. And that we do that now, because that lets us > >> get it out through more testing on different platforms, so that we catch > >> issues earlier on if they do arise. > > > > Sounds good. > > I may be outvoted, but I'm still not in favor of changing the default > wal_level. That caters only to people who lack sufficient foresight > to know that they need a replica before the system becomes so critical > that they can't bounce it to update the configuration. > True. But the current level only caters to those people who run large ETL jobs without doing any tuning on their system (at least none that would require a restart), or another one of the fairly specific workloads. And as I keep re-iterating, it's not just about replicas, it's also about the ability to make proper backups. Which is a pretty fundamental feature. I do think you are outvoted, yes :) At least that's the result of my tallying up the people who have spoken out on the thread. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Replication/backup defaults
On Mon, Jan 9, 2017 at 11:02 AM, Peter Eisentraut wrote: > On 1/9/17 7:44 AM, Magnus Hagander wrote: >> So based on that, I suggest we go ahead and make the change to make both >> the values 10 by default. And that we do that now, because that lets us >> get it out through more testing on different platforms, so that we catch >> issues earlier on if they do arise. > > Sounds good. I may be outvoted, but I'm still not in favor of changing the default wal_level. That caters only to people who lack sufficient foresight to know that they need a replica before the system becomes so critical that they can't bounce it to update the configuration. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Replication/backup defaults
On 1/9/17 7:44 AM, Magnus Hagander wrote: > So based on that, I suggest we go ahead and make the change to make both > the values 10 by default. And that we do that now, because that lets us > get it out through more testing on different platforms, so that we catch > issues earlier on if they do arise. Sounds good. > It would be interesting to find out why it's limited as well, of course, > but I don't think we need to wait for that. After some testing and searching for documentation, it seems that at least the BSD platforms have a very low default SEMMNS setting (apparently 60, which leads to max_connections=30). -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Replication/backup defaults
On Sun, Jan 8, 2017 at 2:19 AM, Jim Nasby wrote: > On 1/5/17 2:50 PM, Tomas Vondra wrote: > >> Ultimately, the question is whether the number of people running into >> "Hey, I can't take pg_basebackup or setup a standby with the default >> config!" is higher or lower than number of people running into "Hey, >> CREATE TABLE + COPY is slower now!" >> > > I'm betting it's way higher. Loads of folks use Postgres and never do any > kind of ETL. I'm willing to say "the majority". > That is not to say there are no other cases benefiting from those >> optimizations, but we're talking about the default value - we're not >> removing the wal_level=minimal. >> > > This would be a non-issue if we provided example configs for a few > different workloads. Obviously those would never be optimal either, but > they *would* show users what settings they should immediately look at > changing in their environment. It might also be worthwhile to provide a section in the docs just saying "these are the parameters you probably want to look at for workload " rather than an actual example configuration. Including a short sentence or two about why. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Replication/backup defaults
On Sat, Jan 7, 2017 at 7:57 PM, Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 1/7/17 6:23 AM, Magnus Hagander wrote: > > In the build farm, I have found 6 critters that do not end up with > the > > 100/128MB setting: sidewinder, curculio, coypu, brolga, lorikeet, > > opossum. I wonder what limitations initdb is bumping against. > > > > > > Since you looked at the data -- they did not end up with 100, but what's > > the lowest they did end up with? Did they go all the way down to 10? > > They all ended up on 30 or 40. > > The only documented way this could happen is if the semaphore > configuration does not allow enough room, but this would have to be a > very particular setting on all these quite different boxes. > > So based on that, I suggest we go ahead and make the change to make both the values 10 by default. And that we do that now, because that lets us get it out through more testing on different platforms, so that we catch issues earlier on if they do arise. It would be interesting to find out why it's limited as well, of course, but I don't think we need to wait for that. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Replication/backup defaults
On 1/5/17 2:50 PM, Tomas Vondra wrote: Ultimately, the question is whether the number of people running into "Hey, I can't take pg_basebackup or setup a standby with the default config!" is higher or lower than number of people running into "Hey, CREATE TABLE + COPY is slower now!" I'm betting it's way higher. Loads of folks use Postgres and never do any kind of ETL. That is not to say there are no other cases benefiting from those optimizations, but we're talking about the default value - we're not removing the wal_level=minimal. This would be a non-issue if we provided example configs for a few different workloads. Obviously those would never be optimal either, but they *would* show users what settings they should immediately look at changing in their environment. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication/backup defaults
On 1/7/17 6:23 AM, Magnus Hagander wrote: > In the build farm, I have found 6 critters that do not end up with the > 100/128MB setting: sidewinder, curculio, coypu, brolga, lorikeet, > opossum. I wonder what limitations initdb is bumping against. > > > Since you looked at the data -- they did not end up with 100, but what's > the lowest they did end up with? Did they go all the way down to 10? They all ended up on 30 or 40. The only documented way this could happen is if the semaphore configuration does not allow enough room, but this would have to be a very particular setting on all these quite different boxes. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Replication/backup defaults
On Sat, Jan 7, 2017 at 1:27 AM, Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 1/5/17 12:01 PM, Andres Freund wrote: > > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote: > >> I also suggest making the defaults for both 20 instead of 10. That > >> leaves enough room that almost nobody ever has to change them, whereas > >> 10 can be a bit tight for some not-outrageous installations (8 standbys > >> plus backup?). > > > > I'm afraid we need to add initdb integration / testing for those. I mean > > we have initdb test down to 10 connections to deal with limited > > resources... > > Those initdb defaults were last touched in 2005, before the use of > System V shared memory was reduced to a minimum. It might be worth > revisiting that. The only way to end up with a low number of connection > slots would seem to be a very low semaphore configuration. > > In the build farm, I have found 6 critters that do not end up with the > 100/128MB setting: sidewinder, curculio, coypu, brolga, lorikeet, > opossum. I wonder what limitations initdb is bumping against. > > Since you looked at the data -- they did not end up with 100, but what's the lowest they did end up with? Did they go all the way down to 10? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Replication/backup defaults
On 1/5/17 12:01 PM, Andres Freund wrote: > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote: >> I also suggest making the defaults for both 20 instead of 10. That >> leaves enough room that almost nobody ever has to change them, whereas >> 10 can be a bit tight for some not-outrageous installations (8 standbys >> plus backup?). > > I'm afraid we need to add initdb integration / testing for those. I mean > we have initdb test down to 10 connections to deal with limited > resources... Those initdb defaults were last touched in 2005, before the use of System V shared memory was reduced to a minimum. It might be worth revisiting that. The only way to end up with a low number of connection slots would seem to be a very low semaphore configuration. In the build farm, I have found 6 critters that do not end up with the 100/128MB setting: sidewinder, curculio, coypu, brolga, lorikeet, opossum. I wonder what limitations initdb is bumping against. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication/backup defaults
On 1/5/17 4:56 PM, Michael Banck wrote: >> You can't actually change the other two without changing wal_level. > That actually goes both ways: I recently saw a server not start cause we > were experimenting with temporarily setting wal_level to minimal for > initial bulk loading, but did not reduce max_wal_senders back to zero. > So it failed at startup with 'FATAL: WAL streaming (max_wal_senders > > 0) requires wal_level "replica" or "logical"'. I think that was the point: You can't change the default of max_wal_senders without also changing the default of wal_level. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication/backup defaults
On Mon, Jan 02, 2017 at 10:21:41AM +0100, Magnus Hagander wrote: > On Mon, Jan 2, 2017 at 10:17 AM, Simon Riggs wrote: > > On 31 December 2016 at 15:00, Magnus Hagander wrote: > > > max_wal_senders=10 > > > max_replication_slots=20 [...] > > > wal_level=replica > > > > This is more problematic because it changes behaviours. > > You can't actually change the other two without changing wal_level. That actually goes both ways: I recently saw a server not start cause we were experimenting with temporarily setting wal_level to minimal for initial bulk loading, but did not reduce max_wal_senders back to zero. So it failed at startup with 'FATAL: WAL streaming (max_wal_senders > 0) requires wal_level "replica" or "logical"'. I don't want to hijack this thread, but I wonder whether the name "*max*_wal_senders" really conveys that dependence on wal_level (there's no comment to that end in the postgresql.conf sample) and/or whether maybe the admin should just be notified that WAL streaming is turned off cause wal_level < 'replica'? Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
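The failure mode Michael describes is easy to reproduce. A hypothetical postgresql.conf excerpt (illustrative only, not from the reported server) for the bulk-loading experiment he mentions:

```ini
# postgresql.conf -- hypothetical excerpt
wal_level = minimal        # temporarily lowered for initial bulk loading
max_wal_senders = 5        # left over from the replication setup

# With this combination the server refuses to start:
#   FATAL:  WAL streaming (max_wal_senders > 0) requires
#           wal_level "replica" or "logical"
#
# Either set max_wal_senders = 0 while wal_level is minimal,
# or keep wal_level = replica and forgo the bulk-load optimizations.
```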
Re: [HACKERS] Replication/backup defaults
On 01/05/2017 05:37 PM, Stephen Frost wrote: Tomas, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: On 01/05/2017 02:23 PM, Magnus Hagander wrote: It's easy enough to construct a benchmark specifically to show the difference, but of any actual "normal workload" for it. Typically the optimization applies to things like bulk loading, which typically never done alone and does not lend itself to that type of benchmarking very easily. Not sure if I understand correctly what you're saying. You're saying that although it'd be easy to construct a benchmark showing significant performance impact, it won't represent a common workload. Correct? I think he's saying that it's not very easy to construct a good example of typical bulk-loading workloads using just pgbench. Bulk loading certainly happens with PG and I don't think we'll make very many friends if we break optimizations when wal_level is set to minimal like those you get using: BEGIN; CREATE TABLE x (c1 int); COPY x FROM STDIN; COMMIT; or: BEGIN; TRUNCATE x; COPY x FROM STDIN; COMMIT; Changing the wal_level from 'minimal' to 'replica' or 'logical' with such a benchmark is going to make the WAL go from next-to-nothing to size-of-database. Sure, I do know how to construct such workloads - and it's trivial even with pgbench custom scripts. The question is whether such workloads are common or not. Most importantly, no one is proposing to break the optimizations, but changing the defaults - users relying on the optimizations are free to switch back to wal_level=minimal if needed. > One doesn't typically *just* do bulk loads, however, often it's a bulk load into a table and then the contents of that table are merged with another table or perhaps joined to it to produce some report or something along those lines. 
In many of those cases, our more-recently added capability to have UNLOGGED tables will work, but not all (in particular, it can be very handy to load everything in using the above technique and then switch the wal_level to replica, which avoids having to have the bulk of the data sent through WAL, something you can't avoid if you want to turn an unlogged table into a logged one). Ultimately, the question is whether the number of people running into "Hey, I can't take pg_basebackup or setup a standby with the default config!" is higher or lower than number of people running into "Hey, CREATE TABLE + COPY is slower now!" I haven't seen many systems relying on such load optimizations, for a number of reasons: 1) The important/critical systems usually have replicas, so are inherently incompatible with wal_level=minimal. 2) The batch jobs usually don't truncate the main table, but load the increment into a temporary/unlogged table first, then merge it into the main one. That is not to say there are no other cases benefiting from those optimizations, but we're talking about the default value - we're not removing the wal_level=minimal. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down
On 5 Jan 2017 2:54 a.m., "Craig Ringer" wrote: On 2 January 2017 at 22:24, Craig Ringer wrote: > > > On 2 Jan. 2017 20:20, "Simon Riggs" wrote: > > On 21 December 2016 at 13:23, Simon Riggs wrote: > >> Fix it up and I'll commit. Thanks for the report. > > I was hoping for some more effort from Ants to correct this. > > I'll commit Craig's new tests for hs feedback before this, so it can > go in with a Tap test, so I imagine we're about a week away from > commit on this. > > > I posted a revised version of Ants's patch that passes the shell script > test. > > A TAP test would likely make sense though, I agree. Ants, do you think you'll have a chance to convert your shell script test into a TAP test in src/test/recovery? Simon has said he would like to commit this fix. I'd personally be happier if it had a test to go with it. You could probably just add to src/test/recovery/t/001 which now contains my additions for hot standby. I'm travelling right now, but I should be able to give it a shot next week. Regards, Ants Aasma
Re: [HACKERS] Replication/backup defaults
On 2017-01-05 09:12:49 -0800, Andres Freund wrote: > On 2017-01-05 18:08:36 +0100, Magnus Hagander wrote: > > On Thu, Jan 5, 2017 at 6:01 PM, Andres Freund wrote: > > > > > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote: > > > > I also suggest making the defaults for both 20 instead of 10. That > > > > leaves enough room that almost nobody ever has to change them, whereas > > > > 10 can be a bit tight for some not-outrageous installations (8 standbys > > > > plus backup?). > > > > > > I'm afraid we need to add initdb integration / testing for those. I mean > > > we have initdb test down to 10 connections to deal with limited > > > resources... > > > > > > > If we make both 10 by default we should be OK though, no? > > I'm a bit doubtful about that. On the other hand, we've increased > max_parallel_workers without anybody complaining. Err, I mean max_worker_processes. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication/backup defaults
On 2017-01-05 18:08:36 +0100, Magnus Hagander wrote: > On Thu, Jan 5, 2017 at 6:01 PM, Andres Freund wrote: > > > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote: > > > I also suggest making the defaults for both 20 instead of 10. That > > > leaves enough room that almost nobody ever has to change them, whereas > > > 10 can be a bit tight for some not-outrageous installations (8 standbys > > > plus backup?). > > > > I'm afraid we need to add initdb integration / testing for those. I mean > > we have initdb test down to 10 connections to deal with limited > > resources... > > > > If we make both 10 by default we should be OK though, no? I'm a bit doubtful about that. On the other hand, we've increased max_parallel_workers without anybody complaining. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication/backup defaults
On Thu, Jan 5, 2017 at 6:01 PM, Andres Freund wrote: > On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote: > > I also suggest making the defaults for both 20 instead of 10. That > > leaves enough room that almost nobody ever has to change them, whereas > > 10 can be a bit tight for some not-outrageous installations (8 standbys > > plus backup?). > > I'm afraid we need to add initdb integration / testing for those. I mean > we have initdb test down to 10 connections to deal with limited > resources... > If we make both 10 by default we should be OK though, no? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Replication/backup defaults
On 2017-01-05 08:38:32 -0500, Peter Eisentraut wrote: > I also suggest making the defaults for both 20 instead of 10. That > leaves enough room that almost nobody ever has to change them, whereas > 10 can be a bit tight for some not-outrageous installations (8 standbys > plus backup?). I'm afraid we need to add initdb integration / testing for those. I mean we have initdb test down to 10 connections to deal with limited resources... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication/backup defaults
Tomas, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On 01/05/2017 02:23 PM, Magnus Hagander wrote: > >It's easy enough to construct a benchmark specifically to show the > >difference, but of any actual "normal workload" for it. Typically the > >optimization applies to things like bulk loading, which typically never > >done alone and does not lend itself to that type of benchmarking very > >easily. > > Not sure if I understand correctly what you're saying. You're saying > that although it'd be easy to construct a benchmark showing > significant performance impact, it won't represent a common > workload. Correct? I think he's saying that it's not very easy to construct a good example of typical bulk-loading workloads using just pgbench. Bulk loading certainly happens with PG and I don't think we'll make very many friends if we break optimizations when wal_level is set to minimal like those you get using: BEGIN; CREATE TABLE x (c1 int); COPY x FROM STDIN; COMMIT; or: BEGIN; TRUNCATE x; COPY x FROM STDIN; COMMIT; Changing the wal_level from 'minimal' to 'replica' or 'logical' with such a benchmark is going to make the WAL go from next-to-nothing to size-of-database. One doesn't typically *just* do bulk loads, however, often it's a bulk load into a table and then the contents of that table are merged with another table or perhaps joined to it to produce some report or something along those lines. In many of those cases, our more-recently added capability to have UNLOGGED tables will work, but not all (in particular, it can be very handy to load everything in using the above technique and then switch the wal_level to replica, which avoids having to have the bulk of the data sent through WAL, something you can't avoid if you want to turn an unlogged table into a logged one). Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Replication/backup defaults
On 01/05/2017 02:23 PM, Magnus Hagander wrote: On Thu, Jan 5, 2017 at 12:44 AM, Tomas Vondra mailto:tomas.von...@2ndquadrant.com>> wrote: On 01/03/2017 11:56 PM, Tomas Vondra wrote: Hi, ... I'll push results for larger ones once those tests complete (possibly tomorrow). I just pushed additional results (from the additional scales) to the git repositories. On the larger (16/32-cores) machine with 2x e5-2620, the results look like this scale minimal replica logical - 100 23968 24393 24393 100023412 23656 23794 15283 53205197 and on the smaller one (i5-2500k with 4 cores) I got this: scale minimal replica logical - 50 5884 58965873 400 5324 53425478 1000 5341 54395425 The scales were chosen so that the smallest one fits into shared buffers, the medium exceeds shared buffers but still fits into RAM, and the largest scale exceeds RAM. The results seem to confirm that for this workload (regular pgbench), there's very little difference between the different WAL levels. Actually, the 'replica' seems a tad faster than 'minimal', but the difference may be easily due to noise. I've also looked at the amount of WAL actually produced, by doing pgbench runs throttled to the same throughput, and counting the number of archived WAL segments & running pg_xlogdump. Interestingly enough, those two metrics differ quite a bit - for example for scale 1000 (on the 32-core machine), the 2h runs produced these number of WAL segments: minimal: 5515 (88.2GB) replica: 5587 (89.4GB) logical: 6058 (96.9GB) so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump, the WAL amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of only ~1.5% between minimal and logical. The values are also much lower than raw WAL size, so I assume it's because pg_xlogdump ignores some extra overhead, present in the segments. Moreover, the sequential nature of WAL writes means even the +10% is not a big deal (unless it results in saturating the bandwidth, but running on >90% is a bad idea anyway). 
If you are using log archiving, it also means your log archive grows by 10% (well, 8% assuming it was 9.8% on top of 0, not on top of replica). ... and that the standby has to chew through the additional 10% of WAL. We already have standbys that occasionally struggle to keep up with the master, and adding more load won't make them happy (even if just 10%). My conclusion from these results is that using 'wal_level=replica' by default seems fine. Perhaps even wal_level=logical would be OK, but that's probably a too big step for 10.0. I think it sounds like 'replica' is the safe default. If we can make it possible to go replica<->logical without a restart, that makes it easy enough to increase it if necessary, and the default still applies to most people (most people take backups, most people probably don't do logical replication). My thoughts, exactly. Any ideas how to construct a plausible workload where the differences are significantly larger? Running the tests on non-SSD storage might also be useful. It's easy enough to construct a benchmark specifically to show the difference, but of any actual "normal workload" for it. Typically the optimization applies to things like bulk loading, which typically never done alone and does not lend itself to that type of benchmarking very easily. Not sure if I understand correctly what you're saying. You're saying that although it'd be easy to construct a benchmark showing significant performance impact, it won't represent a common workload. Correct? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication/backup defaults
On 1/4/17 2:44 PM, Peter Eisentraut wrote: > On 1/4/17 9:46 AM, Magnus Hagander wrote: >> How about we default max_replication_slots to -1, which means to use the >> same value as max_wal_senders? > >> But you don't necessarily want to adjust them together, do you? They are >> both capped by max_connections, but I don't think they have any other >> direct relation between each other? > > I think the most usual case is that you use approximately one > replication slot per wal sender slot. So it would be a good default to > make them equal. Well, let's worry about that later. I think everyone is now in agreement with your original change proposal. My suggestion would be to make the defaults of max_wal_senders and max_replication_slots the same, so we don't open an opportunity for ongoing discussions about why they are different and how different they should be. Ideally, we would make max_replication_slots go away at some point similar to my suggestion above. I also suggest making the defaults for both 20 instead of 10. That leaves enough room that almost nobody ever has to change them, whereas 10 can be a bit tight for some not-outrageous installations (8 standbys plus backup?). Your patch looks OK, documentation changes look good. I wouldn't be surprised if we found another place or two that will need updating, but that is not a big deal. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication/backup defaults
On Thu, Jan 5, 2017 at 12:44 AM, Tomas Vondra wrote: > On 01/03/2017 11:56 PM, Tomas Vondra wrote: > >> Hi, >> >> ... > >> I'll push results for larger ones once those tests complete (possibly >> tomorrow). >> >> > I just pushed additional results (from the additional scales) to the git > repositories. On the larger (16/32-cores) machine with 2x e5-2620, the > results look like this > >scale minimal replica logical > - >100 23968 24393 24393 >100023412 23656 23794 >15283 53205197 > > and on the smaller one (i5-2500k with 4 cores) I got this: > >scale minimal replica logical > - >50 5884 58965873 >400 5324 53425478 >1000 5341 54395425 > > The scales were chosen so that the smallest one fits into shared buffers, > the medium exceeds shared buffers but still fits into RAM, and the largest > scale exceeds RAM. > > The results seem to confirm that for this workload (regular pgbench), > there's very little difference between the different WAL levels. Actually, > the 'replica' seems a tad faster than 'minimal', but the difference may be > easily due to noise. > > I've also looked at the amount of WAL actually produced, by doing pgbench > runs throttled to the same throughput, and counting the number of archived > WAL segments & running pg_xlogdump. Interestingly enough, those two metrics > differ quite a bit - for example for scale 1000 (on the 32-core machine), > the 2h runs produced these number of WAL segments: > >minimal: 5515 (88.2GB) >replica: 5587 (89.4GB) >logical: 6058 (96.9GB) > > so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump, the WAL > amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of only ~1.5% > between minimal and logical. The values are also much lower than raw WAL > size, so I assume it's because pg_xlogdump ignores some extra overhead, > present in the segments. 
Moreover, the sequential nature of WAL writes > means even the +10% is not a big deal (unless it results in saturating the > bandwidth, but running on >90% is a bad idea anyway). > If you are using log archiving, it also means your log archive grows by 10% (well, 8% assuming it was 9.8% on top of 0, not on top of replica). > > My conclusion from these results is that using 'wal_level=replica' by > default seems fine. Perhaps even wal_level=logical would be OK, but that's > probably a too big step for 10.0. > I think it sounds like 'replica' is the safe default. If we can make it possible to go replica<->logical without a restart, that makes it easy enough to increase it if necessary, and the default still applies to most people (most people take backups, most people probably don't do logical replication). > Any ideas how to construct a plausible workload where the differences are > significantly larger? Running the tests on non-SSD storage might also be > useful. > > It's easy enough to construct a benchmark specifically to show the difference, but of any actual "normal workload" for it. Typically the optimization applies to things like bulk loading, which typically never done alone and does not lend itself to that type of benchmarking very easily. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Replication/backup defaults
On 01/03/2017 11:56 PM, Tomas Vondra wrote:

> Hi,
>
> ...
>
> I'll push results for larger ones once those tests complete (possibly tomorrow).

I just pushed additional results (from the additional scales) to the git repositories. On the larger (16/32-cores) machine with 2x e5-2620, the results look like this:

    scale    minimal    replica    logical
   ---------------------------------------
      100      23968      24393      24393
     1000      23412      23656      23794
    10000       5283       5320       5197

and on the smaller one (i5-2500k with 4 cores) I got this:

    scale    minimal    replica    logical
   ---------------------------------------
       50       5884       5896       5873
      400       5324       5342       5478
     1000       5341       5439       5425

The scales were chosen so that the smallest one fits into shared buffers, the medium exceeds shared buffers but still fits into RAM, and the largest scale exceeds RAM.

The results seem to confirm that for this workload (regular pgbench), there's very little difference between the different WAL levels. Actually, the 'replica' seems a tad faster than 'minimal', but the difference may be easily due to noise.

I've also looked at the amount of WAL actually produced, by doing pgbench runs throttled to the same throughput, and counting the number of archived WAL segments & running pg_xlogdump. Interestingly enough, those two metrics differ quite a bit - for example for scale 1000 (on the 32-core machine), the 2h runs produced these numbers of WAL segments:

    minimal: 5515 (88.2GB)
    replica: 5587 (89.4GB)
    logical: 6058 (96.9GB)

so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump, the WAL amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of only ~1.5% between minimal and logical. The values are also much lower than raw WAL size, so I assume it's because pg_xlogdump ignores some extra overhead, present in the segments. Moreover, the sequential nature of WAL writes means even the +10% is not a big deal (unless it results in saturating the bandwidth, but running on >90% is a bad idea anyway).

My conclusion from these results is that using 'wal_level=replica' by default seems fine.
Perhaps even wal_level=logical would be OK, but that's probably a too big step for 10.0. Any ideas how to construct a plausible workload where the differences are significantly larger? Running the tests on non-SSD storage might also be useful. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
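The archived-segment figures above can be cross-checked with a little arithmetic: with the default 16 MB WAL segment size, segment counts convert directly into volume, and the relative overheads fall out of the ratios (a quick sketch; the 16 MB segment size and decimal-GB rounding are assumptions about how the numbers were derived):

```python
SEG_MB = 16  # default WAL segment size

segments = {"minimal": 5515, "replica": 5587, "logical": 6058}

for level, n in segments.items():
    gb = n * SEG_MB / 1000  # decimal GB, matching the quoted rounding
    pct = 100 * (n / segments["minimal"] - 1)
    # reproduces 88.2/89.4/96.9 GB and the ~1.3% / ~9.8% overheads
    print(f"{level}: {n} segments = {gb:.1f} GB ({pct:+.1f}% vs minimal)")
```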
Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down
On 2 January 2017 at 22:24, Craig Ringer wrote: > > > On 2 Jan. 2017 20:20, "Simon Riggs" wrote: > > On 21 December 2016 at 13:23, Simon Riggs wrote: > >> Fix it up and I'll commit. Thanks for the report. > > I was hoping for some more effort from Ants to correct this. > > I'll commit Craig's new tests for hs feedback before this, so it can > go in with a Tap test, so I imagine we're about a week away from > commit on this. > > > I posted a revised version of Ants's patch that passes the shell script > test. > > A TAP test would likely make sense though, I agree. Ants, do you think you'll have a chance to convert your shell script test into a TAP test in src/test/recovery? Simon has said he would like to commit this fix. I'd personally be happier if it had a test to go with it. You could probably just add to src/test/recovery/t/001 which now contains my additions for hot standby. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Replication/backup defaults
On 3 January 2017 at 12:34, Michael Paquier wrote: > On Mon, Jan 2, 2017 at 10:55 PM, Simon Riggs wrote: >> In the hope of making things better in 10.0, I remove my objection. If >> people want to use wal_level = minimal they can restart their server >> and they can find that out in the release notes. >> >> Should we set wal_level = replica or wal_level = logical as the >> default for 10.0? > > replica sounds like a better default to me as most users use at least > archiving. Logical decoding is still fresh though, and its use is not > that wide. Have there been any study on its performance impact > compared to replica by the way? Magnus' arguments should also be applied to wal_level = logical since users will be surprised if they cannot use the logical replication features we are adding as a main feature of 10.0. Why go through the same pain again? And if preventing their use is acceptable for the user, we should treat it as a performance feature to reduce the wal_level. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication/backup defaults
On 1/4/17 9:46 AM, Magnus Hagander wrote: > How about we default max_replication_slots to -1, which means to use the > same value as max_wal_senders? > But you don't necessarily want to adjust them together, do you? They are > both capped by max_connections, but I don't think they have any other > direct relation between each other? I think the most usual case is that you use approximately one replication slot per wal sender slot. So it would be a good default to make them equal. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication/backup defaults
On 12/31/16 10:00 AM, Magnus Hagander wrote:
> max_wal_senders=10
> max_replication_slots=20

How about we default max_replication_slots to -1, which means to use the
same value as max_wal_senders?

I think this would address the needs of 99% of users. If we do like you
suggest, there are going to be very many users who forget to adjust
these two values together, and very many who will do it but will be
puzzled and annoyed by it. Since we're now pushing the use of
replication slots even more (your pg_basebackup change, upcoming logical
replication), I think this could be a major source of misconfigurations.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
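[Editor's note: for illustration, a postgresql.conf sketch of the defaults discussed in this subthread. The `-1` semantics for max_replication_slots are only a proposal made in this message, not an existing setting; nothing below is shipped PostgreSQL configuration.]

```
# Sketch of the proposed 10.0 defaults from this thread (illustrative only)
wal_level = replica
max_wal_senders = 10
max_replication_slots = -1    # proposed: -1 would mean "same as max_wal_senders"
```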
Re: [HACKERS] Replication/backup defaults
On Wed, Jan 4, 2017 at 3:43 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 12/31/16 10:00 AM, Magnus Hagander wrote:
>> max_wal_senders=10
>> max_replication_slots=20
>
> How about we default max_replication_slots to -1, which means to use
> the same value as max_wal_senders?
>
> I think this would address the needs of 99% of users. If we do like you
> suggest, there are going to be very many users who forget to adjust
> these two values together, and very many who will do it but will be
> puzzled and annoyed by it. Since we're now pushing the use of
> replication slots even more (your pg_basebackup change, upcoming
> logical replication), I think this could be a major source of
> misconfigurations.

But you don't necessarily want to adjust them together, do you? They are
both capped by max_connections, but I don't think they have any other
direct relation between each other?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Replication/backup defaults
On 01/03/2017 01:34 PM, Michael Paquier wrote:
> On Mon, Jan 2, 2017 at 10:55 PM, Simon Riggs wrote:
>> In the hope of making things better in 10.0, I remove my objection. If
>> people want to use wal_level = minimal they can restart their server
>> and they can find that out in the release notes.
>>
>> Should we set wal_level = replica or wal_level = logical as the
>> default for 10.0?
>
> replica sounds like a better default to me as most users use at least
> archiving. Logical decoding is still fresh though, and its use is not
> that wide. Have there been any study on its performance impact
> compared to replica by the way?

I've just posted results for some benchmarks I'm running. Those are some
simple pgbench tests, nothing special, and according to those results
the performance impact (logical vs. replica) is negligible. I can run
additional tests with other workloads, of course.

While we can probably construct workloads where the difference is
measurable, I'm not sure performance impact is the main concern here. As
you point out, we have had 'replica' since 9.0 effectively, while
logical is much newer, so perhaps there are some hidden bugs? It'd be
embarrassing to pick 'logical' and hurt everyone, even if they don't get
any benefit from wal_level=logical.

So +1 to 'replica' and allowing switching to 'logical' without restart.
That should not be extremely difficult, as the main culprit seems to be
max_wal_senders/max_replication_slots requiring shared memory. But with
'replica' we already have those enabled/allocated, unlike when switching
from 'minimal' to 'replica'.

regards

--
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Replication/backup defaults
Hi,

On 12/31/2016 04:00 PM, Magnus Hagander wrote:
> Cycling back to this topic again, but this time at the beginning of a
> CF. Here's an actual patch to change:
>
> wal_level=replica
> max_wal_senders=10
> max_replication_slots=20
>
> Based on feedback from last year
> (https://www.postgresql.org/message-id/CABUevEwfV7zDutescm2PHGvsJdYA0RWHFMTRGhwrJPGgSbzZDQ%40mail.gmail.com):
>
> There were requests for benchmarks of performance difference. Tomas has
> promised to run a couple of benchmarks on his standard benchmarking
> setups to give numbers on that. Thanks Tomas, please pipe in with your
> results when you have them!

As promised, I'm running some benchmarks, and I have some early results
to report. And perhaps we can discuss whether we need to test some
additional workloads.

I'm 100% on board with the idea that we should switch to a wal_level
which allows taking backups or setting up a streaming replica, as long
as it does not cause severe performance regression in common workloads.

It'd be trivial to construct workloads demonstrating the optimizations
in wal_level=minimal (e.g. initial loads doing CREATE TABLE + COPY +
CREATE INDEX in a single transaction), but that would be mostly
irrelevant I guess. Instead, I've decided to run a regular pgbench
TPC-B-like workload on a bunch of different scales, and measure
throughput + some xlog stats with each of the three wal_level options.

Note: I tweaked the code a bit to allow archiving with "minimal" WAL
level, to allow computing WAL stats on the archived segments (instead of
keeping all segments in the data directory).

As usual, I'm running it on two machines - a small old one (i5-2500k box
with 4 cores and 8GB of RAM) and a new one (2x e5-2620v4 with 16/32
cores, 64GB of RAM). Both machines have SSD-based storage. The clusters
on both machines were reasonably tuned, see 'settings.log' for each run.
The tests are fairly long, covering multiple checkpoints etc. In other
words, the results should be fairly stable.
The scripts/results/stats/configs are available here:

* https://bitbucket.org/tvondra/wal-levels-e2620-v4/src
* https://bitbucket.org/tvondra/wal-levels-i5/src

So far I only have results for the smallest data sets (50 on i5 and 100
on e5), which easily fit into shared_buffers in both cases, and the
numbers (tps) look like this:

                minimal    replica    standby
    i5-2500k       5884       5896       5873
    e5-2620v4     23968      24393      24259

So the performance penalty of the replica/standby WAL levels on this
workload is pretty much non-existent - for the larger machine those
levels are actually a tad faster than 'minimal', but the difference is
within 2% (so might easily be noise).

I'll push results for the larger ones once those tests complete
(possibly tomorrow).

regards

--
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
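[Editor's note: a quick sanity check of the "within 2%" claim, computed from the tps numbers reported above. This throwaway script is purely illustrative and is not part of Tomas's benchmark kit.]

```python
# Relative throughput difference vs. wal_level=minimal, from the tps
# numbers above; a positive delta means that level was faster than minimal.
results = {
    "i5-2500k":  {"minimal": 5884,  "replica": 5896,  "standby": 5873},
    "e5-2620v4": {"minimal": 23968, "replica": 24393, "standby": 24259},
}

for machine, tps in results.items():
    base = tps["minimal"]
    for level in ("replica", "standby"):
        delta = 100.0 * (tps[level] - base) / base
        print(f"{machine:10s} {level}: {delta:+.2f}%")
```

Every delta indeed stays inside the +/-2% band mentioned in the message.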
Re: [HACKERS] Replication/backup defaults
On Mon, Jan 2, 2017 at 10:55 PM, Simon Riggs wrote:
> In the hope of making things better in 10.0, I remove my objection. If
> people want to use wal_level = minimal they can restart their server
> and they can find that out in the release notes.
>
> Should we set wal_level = replica or wal_level = logical as the
> default for 10.0?

replica sounds like a better default to me as most users use at least
archiving. Logical decoding is still fresh though, and its use is not
that wide. Have there been any study on its performance impact compared
to replica by the way?

--
Michael
Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down
On 2 Jan. 2017 20:20, "Simon Riggs" wrote:

> On 21 December 2016 at 13:23, Simon Riggs wrote:
>> Fix it up and I'll commit. Thanks for the report.
>
> I was hoping for some more effort from Ants to correct this.
>
> I'll commit Craig's new tests for hs feedback before this, so it can go
> in with a Tap test, so I imagine we're about a week away from commit on
> this.

I posted a revised version of Ants's patch that passes the shell script
test. A TAP test would likely make sense though, I agree.
Re: [HACKERS] Replication/backup defaults
On 2 January 2017 at 09:39, Magnus Hagander wrote:
> The conclusion has been that our defaults should really allow people to
> take backups of their systems, and they currently don't.
>
> Making things run faster is tuning, and people should expect to do that
> if they need things to run faster. But being able to make a backup is
> pretty fundamental.

In the hope of making things better in 10.0, I remove my objection. If
people want to use wal_level = minimal they can restart their server and
they can find that out in the release notes.

Should we set wal_level = replica or wal_level = logical as the default
for 10.0?

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down
On 21 December 2016 at 13:23, Simon Riggs wrote:
> Fix it up and I'll commit. Thanks for the report.

I was hoping for some more effort from Ants to correct this.

I'll commit Craig's new tests for hs feedback before this, so it can go
in with a Tap test, so I imagine we're about a week away from commit on
this.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Replication/backup defaults
On 2 January 2017 at 09:48, Simon Riggs wrote:
> I'm willing to assist in a project to allow changing wal_level online
> in this release. Please let's follow that path.

wal_level looks like one of the easier ones to change without a server
restart. There are actions to take in either direction, up or down. My
initial thoughts on the pseudocode would be...

    reset wal_level so all new transactions see that value

    /* actions after setting new value */
    if (old_wal_level < new_wal_level)    /* going up */
        get list of running transactions (perhaps only those using no-WAL-opt)
    else                                  /* coming down */
    {
        if (old_wal_level == logical)
            disconnect logical replication and disallow logical slots
        if (new_wal_level == minimal)
            disconnect streaming replication and disallow physical slots
    }

    wait for a checkpoint (fast checkpoint if no other transactions active)
    if (list)
        wait for list of running xacts to complete
    wait for a checkpoint (fast checkpoint if no other transactions active)
    XLogReportParameters()

So it looks easier to go up than down, which is good since that is the
important direction.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
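[Editor's note: to make the up/down ordering in the pseudocode above concrete, here is a toy model of the transition sequence. Purely illustrative Python; the function name and step labels are hypothetical, and nothing here is PostgreSQL code.]

```python
# Toy model of the wal_level transition ordering from the pseudocode;
# each step is returned as a label rather than actually performed.
LEVELS = ["minimal", "replica", "logical"]

def transition_steps(old, new):
    steps = [f"set wal_level = {new} for all new transactions"]
    going_up = LEVELS.index(old) < LEVELS.index(new)
    if going_up:
        steps.append("collect running xacts (those using the no-WAL optimization)")
    else:
        if old == "logical":
            steps.append("disconnect logical replication, disallow logical slots")
        if new == "minimal":
            steps.append("disconnect streaming replication, disallow physical slots")
    steps.append("checkpoint (fast if idle)")
    if going_up:
        steps.append("wait for collected xacts to complete")
        steps.append("checkpoint (fast if idle)")
    steps.append("XLogReportParameters()")
    return steps

# Going up only needs to wait out old transactions; coming down must
# also disconnect consumers that depend on the higher level.
print(transition_steps("minimal", "replica"))
print(transition_steps("logical", "minimal"))
```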
Re: [HACKERS] Replication/backup defaults
On 2017-01-02 10:31:28 +0000, Simon Riggs wrote:
> We must listen to feedback, not just try to blast through it.

Not agreeing with your priorities isn't "blasting through feedback".
Re: [HACKERS] Replication/backup defaults
On 2 January 2017 at 10:13, Andres Freund wrote:
> On 2017-01-02 11:05:05 +0100, Magnus Hagander wrote:
>> My claim here is that a lot *fewer* people have come to expect this
>> performance optimization, than would (quite reasonably) expect that
>> backups should work on a system without taking it down for restart to
>> reconfigure it to support that.
>
> +1
>
> As evidenced by the fact that a large fraction of those optimizations
> are actually currently entirely broken. Without anybody noticing for
> years:
> http://archives.postgresql.org/message-id/20150702220524.GA9392%40svana.org

No, the optimization works, but there is a bug in it that makes it
unsafe, not the same thing as entirely broken. That clearly needs to be
fixed, but it does not prevent the performance benefit, so that argument
is invalid.

We must listen to feedback, not just try to blast through it.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Replication/backup defaults
On 2017-01-02 11:05:05 +0100, Magnus Hagander wrote:
> My claim here is that a lot *fewer* people have come to expect this
> performance optimization, than would (quite reasonably) expect that
> backups should work on a system without taking it down for restart to
> reconfigure it to support that.

+1

As evidenced by the fact that a large fraction of those optimizations
are actually currently entirely broken. Without anybody noticing for
years:
http://archives.postgresql.org/message-id/20150702220524.GA9392%40svana.org

Greetings,

Andres Freund
Re: [HACKERS] Replication/backup defaults
On Mon, Jan 2, 2017 at 10:48 AM, Simon Riggs wrote:
> On 2 January 2017 at 09:39, Magnus Hagander wrote:
>> Please do submit a patch for it.
>
> The way this is supposed to go is someone submits a patch and they
> receive feedback, then act on that feedback. If I was able to get away
> with deflecting all review comments with a simple "you fix it if you
> don't like it" there would be considerably more patches with my name
> on it accepted, but probably no further forward in real terms because
> of the loose ends it creates.

Fair enough. It's just that people keep saying that this is easy, and
have said so for a long time, but nobody has written a patch for it.

> In this case, simply changing the default will remove a whole class of
> performance optimization that we have educated people to expect. I'm
> sorry to point this out but removing that will cause many real changes
> for people's systems that we will not be thanked for, even though I
> understand your reasoning and wish the same goals to be achieved.

My claim here is that a lot *fewer* people have come to expect this
performance optimization, than would (quite reasonably) expect that
backups should work on a system without taking it down for restart to
reconfigure it to support that. I run into that all the time. I hear
complaints about that all the time. I have not heard a single user
complain about performance loss after enabling backups.

And how many people that rely on this optimization don't do any *other*
optimization on their system *anyway*, that would cause them to require
a restart anyway? It's not like we're taking away their ability to
enable the optimization, it's just not on by default.

> I'm willing to assist in a project to allow changing wal_level online
> in this release. Please let's follow that path.

Sure thing, I will be happy to help test and review such a patch. I will
still argue that the *default* should be wal_level=replica though.
Because once we have such a patch, it's trivial to re-enable this performance optimization (at the cost of backups and replication). //Magnus
Re: [HACKERS] Replication/backup defaults
On 2 January 2017 at 09:39, Magnus Hagander wrote:
> Please do submit a patch for it.

The way this is supposed to go is someone submits a patch and they
receive feedback, then act on that feedback. If I was able to get away
with deflecting all review comments with a simple "you fix it if you
don't like it" there would be considerably more patches with my name on
it accepted, but probably no further forward in real terms because of
the loose ends it creates.

In this case, simply changing the default will remove a whole class of
performance optimization that we have educated people to expect. I'm
sorry to point this out but removing that will cause many real changes
for people's systems that we will not be thanked for, even though I
understand your reasoning and wish the same goals to be achieved.

I'm willing to assist in a project to allow changing wal_level online in
this release. Please let's follow that path.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Replication/backup defaults
On Mon, Jan 2, 2017 at 10:32 AM, Simon Riggs wrote:
> On 2 January 2017 at 09:21, Magnus Hagander wrote:
>> On Mon, Jan 2, 2017 at 10:17 AM, Simon Riggs wrote:
>>> On 31 December 2016 at 15:00, Magnus Hagander wrote:
>>>> Cycling back to this topic again, but this time at the beginning of
>>>> a CF.
>>>>
>>>> Here's an actual patch to change:
>>>>
>>>> max_wal_senders=10
>>>> max_replication_slots=20
>>>
>>> +1
>>>
>>> If that doesn't fly, it seems easy enough to introduce a
>>> "min_reserve_limit" GUC that defaults to 10 that gives a lower bound
>>> on the amount of memory we reserve for many of those shmem allocs;
>>> that can be set to 0 for people that want the old behaviour. Then we
>>> can allow changes up to the memory limit without a restart.
>>>
>>>> wal_level=replica
>>>
>>> This is more problematic because it changes behaviours.
>>
>> You can't actually change the other two without changing wal_level.
>
> You could, but we currently disallow it.

I always assumed we disallowed it because we would have to write actual
code to make it safe.

>>> A more useful approach would be to bring all the things needed to
>>> enable replication into one ALTER SYSTEM command, so people have just
>>> one thing they need to execute and it will work out the details and
>>> locations for you. That way we can maintain the defaults yet make it
>>> easier to enable in a useful way.
>>
>> Sure, that would be great - the core being the ability to change
>> these things without a restart. But I would argue for not letting
>> perfection get in the way of progress, and do this anyway. I doubt
>> there is any way the bigger change is going to get done for 10 at
>> this point, so we should give people the ability to do backups off a
>> default installation already.
>
> We could fairly easily change wal_level without restart; it's been
> discussed for many years.
> The problem from my perspective is that you're immediately turning off
> the performance benefits for initial bulk loads.

We've had this discussion many times over. Please see for example the
thread I referenced. The conclusion has been that our defaults should
really allow people to take backups of their systems, and they currently
don't.

Making things run faster is tuning, and people should expect to do that
if they need things to run faster. But being able to make a backup is
pretty fundamental.

> Arguing how that isn't a problem looks at least as time consuming as
> fixing the problem.

Please do submit a patch for it. I don't know exactly what's involved in
that part, I just know that people have been complaining about this at
least since 9.0 was released so our track record of actually fixing it
isn't very good.

I'm not arguing that it's not a problem, btw. I'm arguing that until we
can solve the problem, we're much better off letting people do backups
and set up things like replication than optimizing for a use case that
many never hit.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Replication/backup defaults
On 2 January 2017 at 09:21, Magnus Hagander wrote:
> On Mon, Jan 2, 2017 at 10:17 AM, Simon Riggs wrote:
>> On 31 December 2016 at 15:00, Magnus Hagander wrote:
>>> Cycling back to this topic again, but this time at the beginning of
>>> a CF.
>>>
>>> Here's an actual patch to change:
>>>
>>> max_wal_senders=10
>>> max_replication_slots=20
>>
>> +1
>>
>> If that doesn't fly, it seems easy enough to introduce a
>> "min_reserve_limit" GUC that defaults to 10 that gives a lower bound
>> on the amount of memory we reserve for many of those shmem allocs;
>> that can be set to 0 for people that want the old behaviour. Then we
>> can allow changes up to the memory limit without a restart.
>>
>>> wal_level=replica
>>
>> This is more problematic because it changes behaviours.
>
> You can't actually change the other two without changing wal_level.

You could, but we currently disallow it.

>> A more useful approach would be to bring all the things needed to
>> enable replication into one ALTER SYSTEM command, so people have just
>> one thing they need to execute and it will work out the details and
>> locations for you. That way we can maintain the defaults yet make it
>> easier to enable in a useful way.
>
> Sure, that would be great - the core being the ability to change these
> things without a restart. But I would argue for not letting perfection
> get in the way of progress, and do this anyway. I doubt there is any
> way the bigger change is going to get done for 10 at this point, so we
> should give people the ability to do backups off a default
> installation already.

We could fairly easily change wal_level without restart; it's been
discussed for many years.

The problem from my perspective is that you're immediately turning off
the performance benefits for initial bulk loads.

Arguing how that isn't a problem looks at least as time consuming as
fixing the problem.
--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services