Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Wed, Mar 10, 2010 at 6:29 AM, Josh Berkus j...@agliodbs.com wrote:
> Then I increased vacuum_defer_cleanup_age to 10, which represents about 5 minutes of transactions on the test system. This eliminated all query cancels for the reporting query, which takes an average of 10s. Next is a database bloat test, but I'll need to do that on a system with more free space than my laptop.

Note that this will be heavily dependent on the use case. If you have one of those counter records that keeps being updated and gets cleaned up by HOT whenever the page fills up, then you need to allow HOT to clean it up before it overflows the page, or else it'll bloat the table and require a real vacuum. I think that means that a vacuum_defer_cleanup of up to about 100 or so (it depends on the width of your counter record) might be reasonable as a general suggestion, but anything higher will depend on understanding the specific system.

Another use case that might surprise people who are accustomed to the current behaviour is massive updates. This is the main really pessimal use case left in Postgres -- ideally they wouldn't bloat the table at all, but currently they double the size of the table. People may be accustomed to the idea that they can then run vacuum and that will limit the bloat to 50%, assuming they have no (other) long-lived transactions. With vacuum_defer_cleanup that will no longer be true. It will be as if you always have a query lasting n transactions in your system at all times.

-- greg

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
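[Editor's note: Greg's "as if you always have a query lasting n transactions" framing suggests how to size this setting. A minimal sketch of the arithmetic, assuming a made-up transaction rate -- measure your own, for example by sampling txid_current() on the master over a known interval:]

```python
# Convert a desired wall-clock deferral window into a
# vacuum_defer_cleanup_age value, which is measured in xids.
# The rate below is an assumed example figure, not a recommendation.

xids_per_second = 200         # assumed write-transaction rate on the master
target_window_seconds = 300   # how long standby queries should be protected

defer_cleanup_age = xids_per_second * target_window_seconds
print(defer_cleanup_age)      # number of xids of cleanup to defer
```

[Run backwards, the same arithmetic shows why a value around 100 only helps a very lightly used system: at 200 xids/second it protects a standby query for half a second.]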
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On 3/10/10 3:38 AM, Greg Stark wrote:
> I think that means that a vacuum_defer_cleanup of up to about 100 or so (it depends on the width of your counter record) might be reasonable as a general suggestion but anything higher will depend on understanding the specific system.

100 wouldn't be useful at all. It would increase bloat without doing anything about query cancel except on a very lightly used system.

> With vacuum_defer_cleanup that will no longer be true. It will be as if you always have a query lasting n transactions in your system at all times.

Yep, but until we get XID-publish-to-master working in 9.1, I think it's probably the best we can do. At least it's no *worse* than having a long-running query on the master at all times.

--Josh Berkus
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Wed, Mar 10, 2010 at 3:29 PM, Josh Berkus j...@agliodbs.com wrote:
> I've been playing with vacuum_defer_cleanup_age in reference to the query cancel problem. It really seems to me that this is the way forward in terms of dealing with query cancel for normal operation rather than wal_standby_delay, or maybe in combination with it.

Why isn't vacuum_defer_cleanup_age listed on postgresql.conf.sample? Though I also tried to test the effect of it, I was unable to find it in the conf file.

Regards,

-- Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Fujii Masao wrote:
>> I've been playing with vacuum_defer_cleanup_age in reference to the query cancel problem. It really seems to me that this is the way forward in terms of dealing with query cancel for normal operation rather than wal_standby_delay, or maybe in combination with it.
> Why isn't vacuum_defer_cleanup_age listed on postgresql.conf.sample? Though I also tried to test the effect of it, I was unable to find it in the conf file.

I asked about that last week and for some reason Simon didn't want it added, and Greg Smith was going to get this corrected. Greg?

-- Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB  http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
> Why isn't vacuum_defer_cleanup_age listed on postgresql.conf.sample? Though I also tried to test the effect of it, I was unable to find it in the conf file.

Using it has some bugs we need to clean up, apparently.

--Josh Berkus
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
All,

I've been playing with vacuum_defer_cleanup_age in reference to the query cancel problem. It really seems to me that this is the way forward in terms of dealing with query cancel for normal operation rather than wal_standby_delay, or maybe in combination with it.

As a first test, I set up a deliberately pathological situation with pgbench and a wal_standby_delay of 1 second. This allowed me to trigger query cancel on a relatively simple reporting query; in fact, to make it impossible to complete. Then I increased vacuum_defer_cleanup_age to 10, which represents about 5 minutes of transactions on the test system. This eliminated all query cancels for the reporting query, which takes an average of 10s.

Next is a database bloat test, but I'll need to do that on a system with more free space than my laptop.

--Josh Berkus
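[Editor's note: for anyone reproducing this kind of test, the settings involved look roughly like the fragment below, with values mirroring the ones described above. The thread writes "wal_standby_delay"; the GUC that shipped in 9.0 is max_standby_delay, so check the spelling against your build.]

```
# standby's postgresql.conf -- deliberately pathological, per the test above
max_standby_delay = 1            # seconds; low enough to force query cancels

# master's postgresql.conf -- the mitigation being tested
vacuum_defer_cleanup_age = 10    # xids; sized against the master's xid churn
```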
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Bruce Momjian wrote:
>> 'max_standby_delay = -1' is really only a reasonable idea if you are absolutely certain all queries are going to be short, which we can't dismiss as an unfounded use case so it has value. I would expect you have to also combine it with a matching reasonable statement_timeout to enforce that expectation to make that situation safer.
> Well, as you stated in your blog, you are going to have one of these downsides:
>   o  master bloat
>   o  delayed recovery
>   o  cancelled queries
> Right now you can't choose master bloat, but you can choose the other two. I think that is acceptable for 9.0, assuming the other two don't have the problems that Tom foresees.

I was wrong. You can choose master bloat with vacuum_defer_cleanup_age, but only crudely, because it is measured in xids and the master defers no matter what queries are running on the slave, and there is still the possibility of query cancel for long queries.

-- Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB  http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Greg Smith wrote:
>> I assumed they would set max_standby_delay = -1 and be happy.
> The admin in this situation might be happy until the first time the primary fails and a failover is forced, at which point there is an unbounded amount of recovery data to apply that was stuck waiting behind whatever long-running queries were active. I don't know if you've ever watched what happens to a pre-8.2 cold standby when you start it up with hundreds or thousands of backed-up WAL files to process before the server can start, but it's not a fast process. I watched a production 8.1 standby get 4000 files behind once due to an archive_command bug, and it's not something I'd ever like to chew my nails off to again. If your goal was HA and you're trying to bring up the standby, the server is down the whole time that's going on.
>
> This is why no admin who prioritizes HA would consider 'max_standby_delay = -1' a reasonable setting, and those are the sort of users Joachim's example was discussing. It only takes one rogue query that runs for a long time to make the standby so far behind it's useless for HA purposes. And you also have to ask yourself: if recovery is halted while waiting for this query to run, how stale is the data on the standby getting? That's true for any large setting for this parameter, but using -1 for the unlimited setting also gives the maximum possible potential for such staleness.
>
> 'max_standby_delay = -1' is really only a reasonable idea if you are absolutely certain all queries are going to be short, which we can't dismiss as an unfounded use case, so it has value. I would expect you have to also combine it with a matching reasonable statement_timeout to enforce that expectation to make that situation safer.

Well, as you stated in your blog, you are going to have one of these downsides:

  o  master bloat
  o  delayed recovery
  o  cancelled queries

Right now you can't choose master bloat, but you can choose the other two.
I think that is acceptable for 9.0, assuming the other two don't have the problems that Tom foresees. Our documentation should probably just come out and state that clearly.

-- Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB  http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On 3/2/10 10:30 AM, Bruce Momjian wrote:
> Right now you can't choose master bloat, but you can choose the other two. I think that is acceptable for 9.0, assuming the other two don't have the problems that Tom foresees.

Actually, if vacuum_defer_cleanup_age can be used, master bloat is an option. Hopefully I'll get some time for serious testing this weekend.

--Josh Berkus
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Bruce Momjian wrote:
>> Right now you can't choose master bloat, but you can choose the other two. I think that is acceptable for 9.0, assuming the other two don't have the problems that Tom foresees.
> I was wrong. You can choose master bloat with vacuum_defer_cleanup_age, but only crudely because it is measured in xids and the master defers no matter what queries are running on the slave...

OK with you finding the situation acceptable, so long as it's an informed decision. From how you're writing about this, I'm comfortable you (and everybody else still involved here) have absorbed the issues enough that we're all talking about the same thing now. Since there are a couple of ugly user-space hacks possible for prioritizing master bloat, and nobody is stepping up to work on resolving this via my suggestion involving better SR integration, it seems to me the heated discussion of code changes has come to a resolution of sorts that I (and Simon, just checked) can live with.

Sounds like we have three action paths here:

-Tom already said he was planning a tour through the HS/SR code; I wanted that to happen with him aware of this issue.
-Josh will continue doing his testing, also better informed about this particular soft spot.
-I'll continue test-case construction for the problems here there are still concerns about (pathological max_standby_delay and b-tree split issues being the top two on that list), and keep sharing particularly interesting ones here to help everyone else's testing.

If it turns out any of those paths leads to a must-fix problem that doesn't have an acceptable solution, at least the idea of this as a plan B is both documented and more widely understood than when I started ringing this particular bell.
I just updated the Open Items list: http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items to officially put myself on the hook for the following HS-related documentation items that have come up recently, aiming to get them all wrapped up before or during early beta:

-Update Hot Standby documentation: clearly explain relationships between the 3 major setup trade-offs, buffer cleanup lock, notes on which queries are killed once max_standby_delay is reached, measuring XID churn on master for setting vacuum_defer_cleanup_age
-Clean up archive_command docs related to recent /bin/true addition. Given that's where I expect people who run into the pg_stop_backup warning message recently added will end up, noting its value for escaping from that particular case might be useful too.

To finish airing my personal 9.0 TODO list now that I've gone this far, I'm also still working on completing the following patches that initial versions have been submitted of; I was close to finishing both before getting side-tracked onto this larger issue:

-pgbench 4000 scale bug fix: http://archives.postgresql.org/message-id/4b621ba3.7090...@2ndquadrant.com
-Improving the logging/error reporting/no timestamp issues in pg_standby re-raised recently by Selena: http://archives.postgresql.org/message-id/2b5e566d1001250945oae17be8n6317f827e3bd7...@mail.gmail.com

If nobody else claims them as something they're working on first, I suspect I'll then move onto building some of the archiver UI improvements discussed most recently as part of the pg_stop_backup does not complete thread, despite Heikki having crushed my dreams of a simple solution to those by pointing out the shared memory limitation involved.

-- Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.us
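[Editor's note: the XID-churn measurement mentioned above can be done with nothing more than two samples of the xid counter. A sketch; note that txid_current() consumes an xid itself, which is negligible noise at any realistic rate:]

```sql
-- Run on the master, twice, a known interval apart:
SELECT now(), txid_current();

-- (difference in txid_current values) / (difference in seconds)
-- gives xids/second, which converts a desired grace period in
-- seconds into a vacuum_defer_cleanup_age value in xids.
```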
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Mon, 2010-03-01 at 12:04 -0800, Josh Berkus wrote:
> does anyone dispute his analysis? Simon?

No dispute. I think I've discussed this before.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Mon, 2010-03-01 at 14:43 -0500, Tom Lane wrote:
> Speaking of which, does the current HS+SR code have a provision to force the standby to stop tracking WAL and come up live, even when there's more WAL available?

Yes, trigger file.

-- Simon Riggs www.2ndQuadrant.com
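[Editor's note: the trigger-file mechanism is configured on the standby; when the named file appears, the standby stops applying WAL and comes up live. A sketch of the 9.0-era recovery.conf -- the path here is an example, not a convention:]

```
# recovery.conf on the standby
standby_mode = 'on'
trigger_file = '/var/lib/pgsql/failover.trigger'
```

Promotion is then a matter of creating that file, e.g. `touch /var/lib/pgsql/failover.trigger`.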
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Sun, 2010-02-28 at 16:56 +0100, Joachim Wieland wrote:
> Now let's take a look at both scenarios from the administrators' point of view:

Well argued, agree with all of your points.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Greg Stark wrote:
> On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus j...@agliodbs.com wrote:
>> I don't think that defer_cleanup_age is a long-term solution. But we need *a* solution which does not involve delaying 9.0.
> So I think the primary solution currently is to raise max_standby_age. However there is a concern with max_standby_age. If you set it to, say, 300s, then run a 300s query on the slave which causes the slave to fall 299s behind. Now you start a new query on the slave -- it gets a snapshot based on the point in time that the slave is currently at. If it hits a conflict it will only have 1s to finish before the conflict causes the query to be cancelled.
>
> In short, in the current setup I think there is no safe value of max_standby_age which will prevent query cancellations, short of -1. If the slave has a constant stream of queries and always has at least one concurrent query running then it's possible that the slave will run continuously max_standby_age-epsilon behind the master and cancel queries left and right, regardless of how large max_standby_age is.
>
> To resolve this I think you would have to introduce some chance for the slave to catch up. Something like refusing to use a snapshot older than max_standby_age/2 and instead waiting until the existing queries finish and the slave gets a chance to catch up and see a more recent snapshot. The problem is that this would result in very unpredictable and variable response times from the slave. A single long-lived query could cause replay to pause for a big chunk of max_standby_age and prevent any new query from starting.

That is a good point. I have added the attached documentation patch to mention that max_standby_delay increases the master/slave inconsistency, and not to use it for xid-keepalive connections.
-- Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB  http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do

Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.256
diff -c -c -r1.256 config.sgml
*** doc/src/sgml/config.sgml	27 Feb 2010 14:46:05 -	1.256
--- doc/src/sgml/config.sgml	2 Mar 2010 21:03:14 -
***************
*** 1869,1875 ****
        this parameter makes sense only during replication, so when
        performing an archive recovery to recover from data loss a very high
        parameter setting or -1 which means wait forever is recommended.
!       The default is 30 seconds.
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
       </para>
--- 1869,1876 ----
        this parameter makes sense only during replication, so when
        performing an archive recovery to recover from data loss a very high
        parameter setting or -1 which means wait forever is recommended.
!       The default is 30 seconds. Increasing this parameter can delay
!       master server changes from appearing on the standby.
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
       </para>
Index: doc/src/sgml/high-availability.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/high-availability.sgml,v
retrieving revision 1.52
diff -c -c -r1.52 high-availability.sgml
*** doc/src/sgml/high-availability.sgml	27 Feb 2010 09:29:20 -	1.52
--- doc/src/sgml/high-availability.sgml	2 Mar 2010 21:03:14 -
***************
*** 1410,1416 ****
    that the primary and standby nodes are linked via the WAL, so the cleanup
    situation is no different from the case where the query ran on the primary
    node itself.  And you are still getting the benefit of off-loading the
!   execution onto the standby.
    </para>

    <para>
--- 1410,1418 ----
    that the primary and standby nodes are linked via the WAL, so the cleanup
    situation is no different from the case where the query ran on the primary
    node itself.  And you are still getting the benefit of off-loading the
!   execution onto the standby.  <varname>max_standby_delay</> should
!   not be used in this case because delayed WAL files might already
!   contain entries that invalidate the current snapshot.
    </para>

    <para>
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Greg Stark wrote:
> On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus j...@agliodbs.com wrote:
>> I don't think that defer_cleanup_age is a long-term solution. But we need *a* solution which does not involve delaying 9.0.
> So I think the primary solution currently is to raise max_standby_age. However there is a concern with max_standby_age. If you set it to, say, 300s. Then run a 300s query on the slave which causes the slave to fall 299s behind. Now you start a new query on the slave -- it gets a snapshot based on the point in time that the slave is currently at. If it hits a conflict it will only have 1s to finish before the conflict causes the query to be cancelled.
>
> In short in the current setup I think there is no safe value of max_standby_age which will prevent query cancellations short of -1. If the slave has a constant stream of queries and always has at least one concurrent query running then it's possible that the slave will run continuously max_standby_age-epsilon behind the master and cancel queries left and right, regardless of how large max_standby_age is.

This is sobering. I have added the attached documentation so at least this odd behavior is documented.

-- Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB  http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do

Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.257
diff -c -c -r1.257 config.sgml
*** doc/src/sgml/config.sgml	2 Mar 2010 21:18:59 -	1.257
--- doc/src/sgml/config.sgml	2 Mar 2010 23:34:38 -
***************
*** 1862,1879 ****
      <listitem>
       <para>
        When server acts as a standby, this parameter specifies a wait policy
!       for queries that conflict with data changes being replayed by recovery.
        If a conflict should occur the server will delay up to this number
!       of seconds before it begins trying to resolve things less amicably, as
!       described in <xref linkend="hot-standby-conflict">.  Typically,
!       this parameter makes sense only during replication, so when
!       performing an archive recovery to recover from data loss a very high
!       parameter setting or -1 which means wait forever is recommended.
!       The default is 30 seconds. Increasing this parameter can delay
!       master server changes from appearing on the standby.
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
       </para>
      </listitem>
     </varlistentry>
--- 1862,1892 ----
      <listitem>
       <para>
        When server acts as a standby, this parameter specifies a wait policy
!       for applying WAL entries that conflict with active queries.
        If a conflict should occur the server will delay up to this number
!       of seconds before it cancels conflicting queries, as
!       described in <xref linkend="hot-standby-conflict">.
!       Typically, this parameter is used only during replication.
!       The default is 30 seconds.
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
       </para>
+      <para>
+       A high value makes query cancel less likely, and -1
+       causes the standby to wait forever for a conflicting query to
+       complete.  Increasing this parameter might delay master server
+       changes from appearing on the standby.
+      </para>
+      <para>
+       While it is tempting to believe that <varname>max_standby_delay</>
+       is the maximum number of seconds a query can run before
+       cancellation is possible, this is not true.  When a long-running
+       query ends, there is a finite time required to apply backlogged
+       WAL logs.  If a second long-running query appears before the
+       WAL has caught up, the snapshot taken by the second query will
+       allow significantly less than <varname>max_standby_delay</>
+       before query cancellation is possible.
+      </para>
      </listitem>
     </varlistentry>
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Greg Smith wrote:
> Bruce Momjian wrote:
>> Right now you can't choose master bloat, but you can choose the other two. I think that is acceptable for 9.0, assuming the other two don't have the problems that Tom foresees. I was wrong. You can choose master bloat with vacuum_defer_cleanup_age, but only crudely because it is measured in xids and the master defers no matter what queries are running on the slave...
> OK with you finding the situation acceptable, so long as it's an informed decision. From how you're writing about this, I'm comfortable

Well, consider that in November we were not even sure SR or HS would be in 9.0. We got them both, so if it is a little kludgy in 9.0, that's OK. We are much farther along than I ever expected.

-- Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB  http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On 2/28/10 7:00 PM, Greg Smith wrote:
> The main problem with setting vacuum_defer_cleanup_age high isn't showing it works, it's a pretty simple bit of code. It's when you recognize that it penalizes all cleanup all the time, whether or not the standby is actually executing a long-running query, that you note the second level of pain in increasing it. Returning to the idea of how is this different from a site already in production?, it may very well be the case that a site that sets vacuum_defer_cleanup_age high enough to support off-peak batch reporting cannot tolerate how that will impact vacuums during their peak time of day. The XID export implementation sidesteps that issue by only making the vacuum delay increase when queries that require it are running, turning this back into a standard what's the best time of day to run my big reports? issue that people understand how to cope with already.

I don't think that defer_cleanup_age is a long-term solution. But we need *a* solution which does not involve delaying 9.0.

And I think we can measure bloat in a pgbench test, no? When I get a chance, I'll run one for a couple hours and see the difference that cleanup_age makes.

--Josh Berkus
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Josh Berkus wrote:
> And I think we can measure bloat in a pgbench test, no? When I get a chance, I'll run one for a couple hours and see the difference that cleanup_age makes.

The test case I attached at the start of this thread runs just the UPDATE to the tellers table. Running something similar that focuses just on UPDATEs to the pgbench_accounts table, without the rest of the steps done by the standard test, is the fastest route to bloat. The standard test will do it too; it just does a lot of extra stuff (SELECT, INSERT) that doesn't impact results, so it wastes some resources compared to a targeted bloater script.

-- Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.us
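[Editor's note: a targeted bloater along those lines can be sketched as a custom pgbench script. The file name is hypothetical; \setrandom is the pgbench meta-command syntax of that era, and the 100000 assumes a scale-1 database:]

```
-- bloat.sql: UPDATE-only workload against pgbench_accounts
\setrandom aid 1 100000
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;
```

Run with something like `pgbench -n -f bloat.sql -c 8 -T 3600` (-n skips the vacuum that would otherwise undo the bloat being measured).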
[HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus j...@agliodbs.com wrote:
> I don't think that defer_cleanup_age is a long-term solution. But we need *a* solution which does not involve delaying 9.0.

So I think the primary solution currently is to raise max_standby_age. However there is a concern with max_standby_age. If you set it to, say, 300s, then run a 300s query on the slave which causes the slave to fall 299s behind. Now you start a new query on the slave -- it gets a snapshot based on the point in time that the slave is currently at. If it hits a conflict it will only have 1s to finish before the conflict causes the query to be cancelled.

In short, in the current setup I think there is no safe value of max_standby_age which will prevent query cancellations, short of -1. If the slave has a constant stream of queries and always has at least one concurrent query running, then it's possible that the slave will run continuously max_standby_age-epsilon behind the master and cancel queries left and right, regardless of how large max_standby_age is.

To resolve this I think you would have to introduce some chance for the slave to catch up. Something like refusing to use a snapshot older than max_standby_age/2 and instead waiting until the existing queries finish and the slave gets a chance to catch up and see a more recent snapshot. The problem is that this would result in very unpredictable and variable response times from the slave. A single long-lived query could cause replay to pause for a big chunk of max_standby_age and prevent any new query from starting.

Does anyone see any way to guarantee that the slave gets a chance to replay and new snapshots will become visible without freezing out new queries for extended periods of time?

-- greg
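[Editor's note: the arithmetic in Greg's scenario, spelled out; the numbers are the ones from the email.]

```python
# With max_standby_age = 300s, a 300s query blocks replay and leaves
# the slave 299s behind. A query starting at that moment takes its
# snapshot at the slave's *current* replay position, so a conflict can
# cancel it as soon as total lag reaches the limit.

max_standby_age = 300        # seconds
lag_when_query_starts = 299  # seconds the slave is already behind

grace_period = max_standby_age - lag_when_query_starts
print(grace_period)          # the new query gets 1s of protection, not 300s
```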
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
> So I think the primary solution currently is to raise max_standby_age. However there is a concern with max_standby_age. If you set it to, say, 300s. Then run a 300s query on the slave which causes the slave to fall 299s behind. Now you start a new query on the slave -- it gets a snapshot based on the point in time that the slave is currently at. If it hits a conflict it will only have 1s to finish before the conflict causes the query to be cancelled.

Completely aside from that, how many users are going to be happy with a slave server which is constantly 5 minutes behind?

--Josh Berkus
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Mon, Mar 1, 2010 at 7:21 PM, Josh Berkus j...@agliodbs.com wrote:
> Completely aside from that, how many users are going to be happy with a slave server which is constantly 5 minutes behind?

Uhm, well, all the ones who are happy with our current warm standby setup, for one? And all the ones who are looking for a standby reporting server rather than a high-availability DR site. For what it's worth, Oracle has an option to have your standby intentionally hold back n minutes behind, and I've seen that set to 5 minutes.

-- greg
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Greg Stark wrote:
> On Mon, Mar 1, 2010 at 7:21 PM, Josh Berkus j...@agliodbs.com wrote:
>> Completely aside from that, how many users are going to be happy with a slave server which is constantly 5 minutes behind?
> Uhm, well all the ones who are happy with our current warm standby setup for one? And all the ones who are looking for a standby reporting server rather than a high availability DR site. For what it's worth Oracle has an option to have your standby intentionally hold back n minutes behind and I've seen that set to 5 minutes.

yeah a lot of people are doing that intentionally...

Stefan
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
> Greg Stark wrote:
>> For what it's worth Oracle has an option to have your standby intentionally hold back n minutes behind and I've seen that set to 5 minutes.
> yeah a lot of people are doing that intentionally...

It's the old DBA screwup safety valve ... drop the main accounts table, you have five minutes to stop replication before it's dropped on the standby.

Speaking of which, does the current HS+SR code have a provision to force the standby to stop tracking WAL and come up live, even when there's more WAL available? Because that's what you'd need in order for such a thing to be helpful in that scenario.

regards, tom lane
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On 3/1/10 11:43 AM, Tom Lane wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: Greg Stark wrote: For what it's worth Oracle has an option to have your standby intentionally hold back n minutes behind and I've seen that set to 5 minutes. yeah a lot of people are doing that intentionally... It's the old DBA screwup safety valve ... drop the main accounts table, you have five minutes to stop replication before it's dropped on the standby. Speaking of which, does the current HS+SR code have a provision to force the standby to stop tracking WAL and come up live, even when there's more WAL available? Because that's what you'd need in order for such a thing to be helpful in that scenario. the fast recovery option should do this. You'd need some fast reaction times, though. However, this leaves aside Greg's point about snapshot age and successive queries; does anyone dispute his analysis? Simon? --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
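For reference, the "fast recovery option" mentioned here is the standby's trigger file; a sketch of how a recovery.conf for this setup might look (the path and connection string are hypothetical, and whether already-received WAL is still replayed before the server comes up is exactly the behavior Tom is asking about):

```ini
# recovery.conf on the standby -- a sketch, not a tested configuration
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'

# When this file appears, the standby stops waiting for new WAL and
# finishes recovery, coming up as a live, writable server:
#   touch /var/lib/postgresql/failover.trigger
trigger_file = '/var/lib/postgresql/failover.trigger'
```

In the dropped-table scenario, the admin would have to create the trigger file within the delay window, before the DROP is replayed on the standby.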
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On 2/28/10 7:12 PM, Robert Haas wrote: However, I'd still like to hear from someone with the requisite technical knowledge whether capturing and retrying the current query in a query cancel is even possible. I'm not sure who you want to hear from here, but I think that's a dead end. dead end as in too hard to implement? Or for some other reason? It's undeniable that auto-retry would be better from a user's perspective than a user-visible cancel. So if it's *reasonable* to implement, I think we should be working on it. I'm also very puzzled as to why nobody else wants to even discuss it; it's like some weird blackout. --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Josh Berkus j...@agliodbs.com wrote: It's undeniable that auto-retry would be better from a user's perspective than a user-visible cancel. So if it's *reasonable* to implement, I think we should be working on it. I'm also very puzzled as to why nobody else wants to even discuss it; it's like some weird blackout. Well, at least for serializable transactions past the first statement, you'd need to have the complete *logic* for the transaction in order to do a retry. Not that this is a bad idea -- our application framework does this automatically -- but unless you only support this for a transaction which is wrapped up as a function, I don't see how the database itself could handle it. It might be *possible* to do it outside of a single-function transaction in a read committed transaction, but you'd have to be careful about locks. I remember suggesting automatic query retry (rather than continuing in a mixed-snapshot mode) for update conflicts in read committed mode and Tom had objections; you might want to check the archives for that. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
josh, nobody is talking about it because it doesn't make sense. you could only retry if it was the first query in the transaction and only if you could prove there were no side-effects outside the database and then you would have no reason to think the retry would be any more likely to work. greg On 1 Mar 2010 22:32, Josh Berkus j...@agliodbs.com wrote: On 2/28/10 7:12 PM, Robert Haas wrote: However, I'd still like to hear from someone with the requ... dead end as in too hard to implement? Or for some other reason? It's undeniable that auto-retry would be better from a user's perspective than a user-visible cancel. So if it's *reasonable* to implement, I think we should be working on it. I'm also very puzzled as to why nobody else wants to even discuss it; it's like some weird blackout. --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subs...
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Greg Stark st...@mit.edu writes: josh, nobody is talking about it because it doesn't make sense. you could only retry if it was the first query in the transaction and only if you could prove there were no side-effects outside the database and then you would have no reason to think the retry would be any more likely to work. But it's hot standby, so there are no data-modifying transactions. Volatile functions could be a problem, though. A bigger problem is we might have already shipped partial query results to the client. I agree it ain't easy, but it might not be completely out of the question. Definitely won't be happening for 9.0 though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Josh Berkus wrote: However, this leaves aside Greg's point about snapshot age and successive queries; does anyone dispute his analysis? Simon? There's already a note on the Hot Standby TODO about unexpectedly bad max_standby_delay behavior being possible on an idle system, with no suggested resolution for it besides better SR integration. The issue Greg Stark has noted is another variation on that theme. It's already on my list of theorized pathological but as yet undemonstrated concerns that Simon and I identified, the one I'm working through creating test cases to prove/disprove. I'm past "it's possible..." talk at this point, though, so as not to spook anyone unnecessarily, and am only raising things I can show concrete examples of in action. White box testing at some point does require pausing one's investigation of what's in the box and getting on with the actual testing instead. The only real spot where my opinion diverges here is that I have yet to find any situation where 'max_standby_delay=-1' makes any sense to me. When I try running my test cases with that setting, the whole system just reacts far too strangely. My first patch here is probably going to be adding more visibility into the situation when queries are blocking replication forever, because I think the times I find myself at "why is the system hung right now?" are when that happens and it's not obvious as an admin what's going on. Also, the idea that a long-running query on the standby could cause an unbounded delay in replication is so foreign to my sensibilities that I don't ever include it in the list of useful solutions to the problems I'm worried about. The option is there, and I'm not disputing that it makes sense for some people because there seems to be some demand for it; I just can't see how it fits into any of the use-cases I'm concerned about. 
I haven't said anything about query retry mainly because I can't imagine any way it's possible to build it in time for this release, so whether it's eventually feasible or not doesn't enter into what I'm worried about right now. In any case, I would prioritize that behind work on preventing the most common situations that cause cancellations in the first place, until those are handled so well that retry is the most effective improvement left to consider. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Mon, Mar 1, 2010 at 5:32 PM, Josh Berkus j...@agliodbs.com wrote: On 2/28/10 7:12 PM, Robert Haas wrote: However, I'd still like to hear from someone with the requisite technical knowledge whether capturing and retrying the current query in a query cancel is even possible. I'm not sure who you want to hear from here, but I think that's a dead end. dead end as in too hard to implement? Or for some other reason? I think it's probably too hard to implement for the extremely limited set of circumstances in which it can work. See the other responses for some of the problems. There are others, too. Suppose that the plan for some particular query is to read a table with a hundred million records, sort it, and then do whatever with the results. After reading the first 99 million records, the transaction is cancelled and we have to start over. Maybe someone will say, "fine, no problem" - but it's certainly going to be user-visible. Especially if we retry more than once. I think we should focus our efforts initially on reducing the frequency of spurious cancels. What we're essentially trying to do here is refute the proposition "the WAL record I just replayed might change the result of this query." It's possibly equivalent to the halting problem (and certainly impossibly hard) to refute this proposition in every case where it is in fact false, but it sounds like what we have in place right now doesn't come close to doing as well as can be done. I just read through the current documentation and it doesn't really seem to explain very much about how HS decides which queries to kill. Can someone try to flesh that out a bit? It also uses the term "buffer cleanup lock", which doesn't seem to be used anywhere else in the documentation (though it does appear in the source tree, including README.HOT). ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
* Tom Lane t...@sss.pgh.pa.us [100301 20:04]: Greg Stark st...@mit.edu writes: josh, nobody is talking about it because it doesn't make sense. you could only retry if it was the first query in the transaction and only if you could prove there were no side-effects outside the database and then you would have no reason to think the retry would be any more likely to work. But it's hot standby, so there are no data-modifying transactions. Volatile functions could be a problem, though. A bigger problem is we might have already shipped partial query results to the client. But, since we know it's a slave and that the reason the query was cancelled was because it's got a backlog of updates to apply, it's very likely that the data the earlier parts of the transaction saw would be different... And then you have no idea if just blindly replaying all statements of the transaction successively is a good idea... a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Joachim Wieland wrote: 1) With the current implementation they will see better performance on the master and more aggressive vacuum (!), since they have less long-running queries now on the master and autovacuum can kick in and clean up with less delay than before. On the other hand their queries on the standby might fail and they will start thinking that this HS+SR feature is not as convincing as they thought it was... Next step for them is to take the documentation and study it for a few days to learn all about vacuum, different delays, transaction ids and age parameters and experiment a few weeks until no more queries fail - for a while... But they can never be sure... In the end they might also modify the parameters in the wrong direction or overshoot because of lack of time to experiment and lose another important property without noticing (like being as close as possible to the master). I assumed they would set max_standby_delay = -1 and be happy. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Josh Berkus wrote: HS+SR is still a tremendous improvement over the options available previously. We never thought it was going to work for everyone everywhere, and shouldn't let our project's OCD tendencies run away from us. OCD (Obsessive-Compulsive Disorder) --- good one. :-) -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Bruce Momjian wrote: Joachim Wieland wrote: 1) With the current implementation they will see better performance on the master and more aggressive vacuum (!), since they have less long-running queries now on the master and autovacuum can kick in and clean up with less delay than before. On the other hand their queries on the standby might fail and they will start thinking that this HS+SR feature is not as convincing as they thought it was... I assumed they would set max_standby_delay = -1 and be happy. The admin in this situation might be happy until the first time the primary fails and a failover is forced, at which point there is an unbounded amount of recovery data to apply that was stuck waiting behind whatever long-running queries were active. I don't know if you've ever watched what happens to a pre-8.2 cold standby when you start it up with hundreds or thousands of backed-up WAL files to process before the server can start, but it's not a fast process. I watched a production 8.1 standby get 4000 files behind once due to an archive_command bug, and it's not something I'd ever like to chew my nails through again. If your goal was HA and you're trying to bring up the standby, the server is down the whole time that's going on. This is why no admin who prioritizes HA would consider 'max_standby_delay = -1' a reasonable setting, and those are the sort of users Joachim's example was discussing. It only takes one rogue query that runs for a long time to make the standby so far behind it's useless for HA purposes. And you also have to ask yourself "if recovery is halted while waiting for this query to run, how stale is the data on the standby getting?" That's true for any large setting for this parameter, but using -1 for the unlimited setting also gives the maximum possible potential for such staleness. 
'max_standby_delay = -1' is really only a reasonable idea if you are absolutely certain all queries are going to be short, which we can't dismiss as an unfounded use case, so it has value. I would expect you have to also combine it with a matching reasonable statement_timeout to enforce that expectation to make that situation safer. In any of the "offload batch queries to the failover standby" situations, it's unlikely an unlimited value for this setting will be practical. Perhaps you set max_standby_delay to some number of hours, to match your expected worst-case query run time and reduce the chance of cancellation. Not putting a limit on it at all is a situation no DBA with healthy paranoia is going to be happy with the potential downside of in a HA environment, given that unbounded staleness and recovery time are then both possible. The potential of a failed long-running query is much less risky than either of those. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
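Greg's suggestion of pairing a bounded delay with a query timeout might look like this on the standby (a sketch using the parameter names discussed in this thread; the values are illustrative only, not recommendations):

```ini
# postgresql.conf on the standby -- illustrative values only

# Let recovery fall at most 30 minutes behind before cancelling
# conflicting queries; -1 (wait forever) is the setting argued
# against above for HA setups.
max_standby_delay = 30min

# Cap individual queries below that budget, so no single rogue
# query can pin recovery for longer than you planned for.
statement_timeout = '20min'
```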
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Robert Haas wrote: I just read through the current documentation and it doesn't really seem to explain very much about how HS decides which queries to kill. Can someone try to flesh that out a bit? I believe it just launches on a mass killing spree once things like max_standby_delay expire. This I want to confirm via testing (can simulate with a mix of long and short running pgbench queries) and then intend to update the docs to clarify. It also uses the term buffer cleanup lock, which doesn't seem to be used anywhere else in the documentation (though it does appear in the source tree, including README.HOT). This loose end was already noted in my last docs update. I wrote an initial description, but Bruce and I decided to leave out until something more thorough could be put together. This is also on my docs cleanup list, will get to it somewhere along the beta timeline. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Sun, Feb 28, 2010 at 6:07 AM, Greg Smith g...@2ndquadrant.com wrote: Not forced to--have the option of. There are obviously workloads where you wouldn't want this. At the same time, I think there are some pretty common ones people are going to expect HS+SR to work on transparently where this would obviously be the preferred trade-off to make, were it available as one of the options. The test case I put together shows an intentionally pathological but not completely unrealistic example of such a workload. Well if we're forced to eventually have both then it kind of takes the wind out of Tom's arguments. We had better get both features working so it becomes only a question of which is worth doing first and which can be held off. Since there aren't any actual bugs in evidence for the current setup and we already have it that's a pretty easy decision. What I am sure of is that a SR-based xmin passing approach is simpler, easier to explain, more robust for some common workloads, and less likely to give surprised "wow, I didn't think *that* would cancel my standby query" reports from the field. Really? I think we get lots of surprised wows from the field from the idea that a long-running read-only query can cause your database to bloat. I think the only reason that's obvious to us is that we've been grappling with that problem for so long. And since I never like to bet against Tom's gut feel, having it around as a plan B in case he's right about an overwhelming round of bug reports piling up against the max_standby_delay etc. logic doesn't hurt either. Agreed. Though I think it'll be bad in that case even if we have a plan B. It'll mean no file-based log shipping replicas and no guarantee that what you run on the standby can't affect the master -- which is a pretty nice guarantee. It'll also mean it'll be much more fragile against network interruptions. 
-- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Sun, Feb 28, 2010 at 2:54 PM, Greg Stark gsst...@mit.edu wrote: Really? I think we get lots of surprised wows from the field from the idea that a long-running read-only query can cause your database to bloat. I think the only reason that's obvious to us is that we've been grappling with that problem for so long. It seems to me that the scenario that you are looking at is one where people run different queries with and without HS, i.e. that they will run longer read-only queries than now once they have HS. I don't think that is the case. If it isn't, you cannot really speak of master bloat. Instead, I assume that most people who will grab 9.0 and use HS+SR do already have a database with a certain query profile. Now with HS+SR they will try to put the most costly and longest read-only queries to the standby but in the end will run the same number of queries with the same overall complexity. Now let's take a look at both scenarios from the administrators' point of view: 1) With the current implementation they will see better performance on the master and more aggressive vacuum (!), since they have less long-running queries now on the master and autovacuum can kick in and clean up with less delay than before. On the other hand their queries on the standby might fail and they will start thinking that this HS+SR feature is not as convincing as they thought it was... Next step for them is to take the documentation and study it for a few days to learn all about vacuum, different delays, transaction ids and age parameters and experiment a few weeks until no more queries fail - for a while... But they can never be sure... In the end they might also modify the parameters in the wrong direction or overshoot because of lack of time to experiment and lose another important property without noticing (like being as close as possible to the master). 
2) On the other hand if we could ship 9.0 with the xmin-propagation feature, people would still see a better performance and have a hot standby system but this time without query cancellations. Again: the read-only queries that will be processed by the HS in the future are being processed by the master today anyway, so why should it get worse? The first impression will be that it just works nicely out of the box, is easy to set up and has no negative effect (query cancellation) that has not already shown up before (vacuum lag). I guess that most people will just run fine with this setup and never get to know about the internals. Of course we should still offer an expert mode where you can turn all kinds of knobs and where you can avoid the vacuum dependency but it would be nice if this could be the expert mode only. Tuning this is highly installation specific and you need to have a deep understanding of how PostgreSQL and HS work internally and what you actually want to achieve... Agreed. Though I think it'll be bad in that case even if we have a plan B. It'll mean no file-based log shipping replicas and no guarantee that what you run on the standby can't affect the master -- which is a pretty nice guarantee. It'll also mean it'll be much more fragile against network interruptions. Regarding the network interruptions... in reality if you have network interruptions of several minutes between your primary and your standby, you have worse problems anyway... If the standby does not renew its xmin for n seconds, log a message and just go on... Joachim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
All, First, from the nature of the arguments, we need to eventually have both versions of SR: delay-based and xmin-pub. And it would be fantastic if Greg Smith and Tom Lane could work on xmin-pub to see if we can get it ready as well. I also think, based on the discussion and Greg's test case, that we could do two things which would make the shortcomings of delay-based SR a vastly better experience for users: 1) Automated retry of cancelled queries on the slave. I have no idea how hard this would be to implement, but it makes the difference between writing lots of exception-handling code for slave connections (unacceptable) and just slow response times on the slave (acceptable). 2) A more usable vacuum_defer_cleanup_age. If it was feasible for a user to configure the master to not vacuum records less than, say, 5 minutes dead, then that would again offer the choice to the user of slightly degraded performance on the master (acceptable) vs. lots of query cancel (unacceptable). I'm going to test Greg's case with vacuum_defer_cleanup_age used fairly liberally to see if this approach has merit. Why do I say that lots of query cancel is unacceptable? For the simple reason that one cannot run the same application code against an HS+SR cluster with lots of query cancel as one runs against a standalone database. And if that's true, then the main advantage of HS+SR over Slony and Londiste is gone. MySQL took great pains to make sure that you could run the same code against replicated MySQL as standalone, and that was based on having a fairly intimate relationship with their users (at the time, anyway). Another thing to keep in mind in these discussions is the inexpensiveness of servers today. This means that, if slaves have poor performance, that's OK; one can always spin up more slaves. But if each slave imposes a large burden on the master, then that limits your scalability. 
--Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Joachim Wieland wrote: Instead, I assume that most people who will grab 9.0 and use HS+SR do already have a database with a certain query profile. Now with HS+SR they will try to put the most costly and longest read-only queries to the standby but in the end will run the same number of queries with the same overall complexity. This is a nice summary of the primary use-case I am trying to optimize usability for, because I know for a fact there's a stack of pent-up requests for exactly this form of improvement from existing warm standby users. And your subsequent discussion of how administrators will react in each of the possible configurations here matches my own concerns. I would highly recommend anyone who feels this is not a critical feature to fix carefully read Joachim's message from an advocacy perspective, that's a better user-oriented prediction than mine of exactly how this is going to play out in the field post-release. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Josh Berkus wrote: First, from the nature of the arguments, we need to eventually have both versions of SR: delay-based and xmin-pub. And it would be fantastic if Greg Smith and Tom Lane could work on xmin-pub to see if we can get it ready as well. As I see it, the main technical obstacle here is that a subset of a feature already on the SR roadmap needs to get built earlier than expected to pull this off. I don't know about Tom, but I have no expectation it's possible for me to get up to speed on that code fast enough to contribute anything there. I expect the thing I'd be most productive at as far as moving the release forward is to continue testing this pair of features looking for rough edges, which is what I have planned for the next month. I'm not even close to finished with generating test cases specifically probing for bad behavior suspected after a look at the implementation details--this is just what I came up with in my first week of that. Count me in for more testing, but out for significant development here. It's not what I've got my time allocated for because it's not where I think I'll be most productive. 2) A more usable vacuum_defer_cleanup_age. If it was feasible for a user to configure the master to not vacuum records less than, say, 5 minutes dead, then that would again offer the choice to the user of slightly degraded performance on the master (acceptable) vs. lots of query cancel (unacceptable). I'm going to test Greg's case with vacuum_defer_cleanup_age used fairly liberally to see if this approach has merit. I've been down that road and it leads quickly to the following question: how can I tell how old, in time-based units, an xid is? If there were an easy answer to that question, vacuum_defer_cleanup_age would already be set in time units. It's the obvious UI to want, it's just not obvious how to build it internally. Maybe I missed something, but my guess is that vacuum_defer_cleanup_age is already as good as it's going to get. 
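The missing time-to-XID conversion can at least be approximated by hand: sample txid_current() on the master at two points in time to get an XID consumption rate, then multiply by the window you want. A rough arithmetic sketch (the rate and window below are made-up numbers, and since XID burn rate varies with load, the result is only an estimate, not a guarantee):

```python
def defer_cleanup_age(xids_per_second: float, window_seconds: float) -> int:
    """Rough vacuum_defer_cleanup_age for a desired time-based window.

    xids_per_second would be measured on the live system, e.g. by
    sampling txid_current() twice and dividing the difference by the
    elapsed seconds.
    """
    return int(xids_per_second * window_seconds)

# A master burning ~33 XIDs/s, targeting a 5-minute (300 s) window:
print(defer_cleanup_age(33, 300))  # 9900
```

A quieter or busier master would need a recomputed value, which is exactly the weakness Greg describes: the setting is static while the XID rate is not.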
-- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Josh Berkus j...@agliodbs.com writes: 2) A more usable vacuum_defer_cleanup_age. If it was feasible for a user to configure the master to not vacuum records less than, say, 5 minutes dead, then that would again offer the choice to the user of slightly degraded performance on the master (acceptable) vs. lots of query cancel (unacceptable). I'm going to test Greg's case with vacuum_defer_cleanup_age used fairly liberally to see if this approach has merit. I think that to associate any time-based interval notion with the XID flow, you need a ticker. We already took the txid and txid_snapshot types and functions from Skytools, which took them from Slony. Maybe we could consider borrowing pgqd, the C version of the ticker, for being able to specify in human time how long a dead transaction is allowed to remain in the heap? http://github.com/markokr/skytools-dev/tree/master/sql/ticker/ Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Sun, Feb 28, 2010 at 8:47 PM, Josh Berkus j...@agliodbs.com wrote: 1) Automated retry of cancelled queries on the slave. I have no idea how hard this would be to implement, but it makes the difference between writing lots of exception-handling code for slave connections (unacceptable) and just slow response times on the slave (acceptable). We're not only canceling queries, we are effectively canceling transactions. It seems quite impossible to repeat all queries from a transaction that has started in the past. One query might be based on or include the result of a previous query and, as the data we see now has changed since then, the client might now want to execute a different query when it gets a different result out of a previous query... And even if it was possible, how often would you retry? You still have no guarantee that your query succeeds the second time. I'd claim that if a query failed once, chances are even higher that it fails again than that it succeeds the second time. Moreover, if you continue to repeat the query and if queries come in at a certain rate, you need to process more and more queries on the slave, which will not really help other queries to finish in time nor will it be beneficial for the throughput of the system as a whole... I fully agree with what you say about user expectations: We need to assume that many programs are not prepared for failures of simple read-only queries because in the past they have always worked... Another thing to keep in mind in these discussions is the inexpensiveness of servers today. This means that, if slaves have poor performance, that's OK; one can always spin up more slaves. But if each slave imposes a large burden on the master, then that limits your scalability. The burden of the xmin-publication feature is not the number of slaves, it's just the longest running queries on whatever slave they are. So your argument applies to both cases... 
To minimize the burden on the master, get additional slaves so that you can run your most expensive queries in a shorter time :-) Joachim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
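For what it's worth, the client-side shape of the retry idea being debated here can be sketched. This is a minimal illustration only, not anything that exists in libpq or the server; the exception class and function names are invented, standing in for the "canceling statement due to conflict with recovery" error:

```python
class QueryCancelledError(Exception):
    """Stands in for a recovery-conflict query cancellation on the slave."""

def run_with_retry(execute, in_explicit_transaction=False, max_retries=2):
    """Retry a read-only query a bounded number of times after cancellation.

    Explicit transactions are never retried: earlier statements in the
    transaction may have influenced this one, as Joachim points out.
    """
    if in_explicit_transaction:
        return execute()
    attempts = 0
    while True:
        try:
            return execute()
        except QueryCancelledError:
            attempts += 1
            if attempts > max_retries:
                raise
```

Note that this sketch makes Joachim's objection concrete: every retry adds load to the slave exactly when it is already falling behind, so a small `max_retries` cap is essential.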
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Greg, Joachim,

As I see it, the main technical obstacle here is that a subset of a feature already on the SR roadmap needs to get built earlier than expected to pull this off. I don't know about Tom, but I have no expectation it's possible for me to get up to speed on that code fast enough to contribute anything there. I expect the thing I'd be most productive at, as far as moving the release forward, is to continue testing this pair of features looking for rough edges, which is what I have planned for the next month.

That's OK with me. I thought you were saying that xmin-pub was going to be easier than expected. Per my other e-mails, I think that we should be shooting for good enough, on time for 9.0, rather than perfect. We can't ever get to perfect if we don't release software.

Quite frankly, simply telling people that long-running queries on the slave tend not to be effective, and to wait for 9.1, is a possibility. If you consider the limitations and silent failures associated with MySQL replication, let alone the issues with other Postgres solutions or the replication of some of the nosql databases, "no long-running queries" is a positively straightforward restriction. HS+SR is still a tremendous improvement over the options available previously. We never thought it was going to work for everyone everywhere, and shouldn't let our project's OCD tendencies run away from us.

I've been down that road and it leads quickly to the following question: how can I tell how old, in time-based units, an xid is? If there were an easy answer to that question, vacuum_defer_cleanup_age would already be set in time units. It's the obvious UI to want; it's just not obvious how to build it internally. Maybe I missed something, but my guess is that vacuum_defer_cleanup_age is already as good as it's going to get.

Well, we could throw this on the user if we could get them some information on how to calculate that number.
For example, some way for them to calculate the number of XIDs per minute via a query, and then set vacuum_defer_cleanup_age appropriately on the master. Sure, it's clunky, but we've already warned people that 9.0 will be clunky and hard to administer. And it's no worse than setting max_fsm_pages used to be. However, first we need to test that setting vacuum_defer_cleanup_age actually benefits query cancel issues.

We're not only canceling queries, we are effectively canceling transactions. It seems quite impossible to repeat all queries from a transaction that started in the past. One query might depend on or include the result of a previous query, and since the data we see now has changed since then, the client might want to execute a different query once it gets a different result from an earlier one...

Sure, except that I don't expect people to be using explicit transactions as much on the slaves, since they are read-only anyway and can't even create temp tables. So having the retry logic skip explicit transactions would be an acceptable option.

And even if it were possible, how often would you retry? You still have no guarantee that your query succeeds the second time. I'd claim that if a query failed once, chances are higher that it fails again than that it succeeds the second time. Moreover, if you keep repeating queries that come in at a certain rate, you need to process more and more queries on the slave, which will neither help other queries finish in time nor benefit the throughput of the system as a whole...

Well, we'd need to have a limited number of retries, which means a GUC in recovery.conf: query_cancel_retry = &lt;n&gt;. This might default to, say, 2. However, I'd still like to hear from someone with the requisite technical knowledge whether capturing and retrying the current query in a query cancel is even possible.
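As a rough illustration of the calculation Josh describes: sample `SELECT txid_current();` on the master at two points a few minutes apart, derive a consumption rate, and scale it by how long a standby query you want to protect. The helper names below are invented, and this deliberately ignores complications like bursty workloads and XID wraparound:

```python
def xids_consumed(txid_before, txid_after):
    """XIDs burned between two samples of txid_current().
    txid_current() returns a 64-bit epoch-qualified value, so plain
    subtraction is safe here."""
    return txid_after - txid_before

def defer_cleanup_age(xid_rate_per_min, protect_minutes):
    """A vacuum_defer_cleanup_age that shields standby queries up to
    protect_minutes long, at the cost of deferring that much cleanup
    on the master all the time."""
    return int(xid_rate_per_min * protect_minutes)
```

For example, a master burning 20,000 XIDs/minute that should tolerate 5-minute reporting queries would get a setting of 100,000 -- which also shows why the bloat cost grows so quickly with query length.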
--Josh Berkus
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Josh Berkus wrote: Well, we could throw this on the user if we could get them some information on how to calculate that number. For example, some way for them to calculate the number of XIDs per minute via a query, and then set vacuum_defer_cleanup_age appropriately on the master. Sure, it's clunky, but we've already warned people that 9.0 will be clunky and hard to administer. And it's no worse than setting max_fsm_pages used to be. However, first we need to test that setting vacuum_defer_cleanup_age actually benefits query cancel issues.

Proving that the setting works as expected is already on my test case grid; it seems fine in my limited testing so far. I've started looking into ways to monitor XID churn as a way of setting it better. I'll take care of providing all that in my next test case update. My intent here is to take the ideas outlined in my Hot Standby Tradeoffs blog post and turn them into a new documentation section making it clearer where the problem steps are, regardless of what else happens here. And I need some concrete example of XID burn rate measurement to finish that job.

The main problem with setting vacuum_defer_cleanup_age high isn't showing it works; it's a pretty simple bit of code. It's when you recognize that it penalizes all cleanup all the time, whether or not the standby is actually executing a long-running query, that you note the second level of pain in increasing it. Returning to the idea of "how is this different from a site already in production?", it may very well be the case that a site that sets vacuum_defer_cleanup_age high enough to support off-peak batch reporting cannot tolerate how that will impact vacuums during their peak time of day. The XID export implementation sidesteps that issue by only making the vacuum delay increase when queries that require it are running, turning this back into a standard "what's the best time of day to run my big reports?" issue that people understand how to cope with already.
-- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Sun, Feb 28, 2010 at 5:38 PM, Josh Berkus j...@agliodbs.com wrote: Greg, Joachim, As I see it, the main technical obstacle here is that a subset of a feature already on the SR roadmap needs to get built earlier than expected to pull this off. I don't know about Tom, but I have no expectation it's possible for me to get up to speed on that code fast enough to contribute anything there. I expect the thing I'd be most productive at, as far as moving the release forward, is to continue testing this pair of features looking for rough edges, which is what I have planned for the next month. That's OK with me. I thought you were saying that xmin-pub was going to be easier than expected. Per my other e-mails, I think that we should be shooting for good enough, on time for 9.0, rather than perfect. We can't ever get to perfect if we don't release software.

I agree. It seems to me that the right long-term fix for the problem of query cancellations on the slave is going to be to give the slave the ability to save multiple versions of relation pages where necessary, so that older snapshots can continue to be used even after the conflicting WAL has been applied. However, I'm pretty sure that's going to be a very difficult project which is unlikely to be coded by anyone any time soon, let alone merged. Until that happens, we're going to force people to pick from a fairly unappealing menu of options: postpone WAL replay for long periods of time, cancel queries (perhaps even ones seemingly unrelated to what changed on the master), or bloat the master. All of those options are seriously unpleasant.

I think, though, that we have to think of this as being like the Windows port, or maybe even more significant than that, as an architectural change. I think it is going to take several releases for this feature to be well-understood and stable and have all the options we'd like it to have. It wouldn't surprise me if we get to 10.0 before we really have truly seamless replication.
I don't expect Slony or Londiste or any of the other solutions that are out there now to get kicked to the curb by PG 9.0. Still, a journey of a thousand miles begins with the first step. Simon and many others have put a great deal of time and energy into getting us to the point where we are now, and if we let the fact that we haven't reached our ultimate goal keep us from putting what we have out there in front of our customers, I think we're going to regret that. I think the thing to do is to reposition our PR around these features. We should maybe even go so far as to call them beta or experimental. We shouldn't tell people - this is going to be totally awesome. We should tell people - this is a big improvement, and it's still got some pretty significant limitations, but it's good stuff and it's going in a good direction. Overhyping what we have today is not going to be good for the project, and I'm frankly quite afraid that nothing we can possibly code between now and the release is going to measure up to what people are hoping for. We need to set our own expectations, and those of our customers, at a level at which they can be met. Quite frankly, simply telling people that long-running queries on the slave tend not to be effective, wait for 9.1 is a possibility. Yep. HS+SR is still a tremendous improvement over the options available previously. We never thought it was going to work for everyone everywhere, and shouldn't let our project's OCD tendencies run away from us. Yep. However, I'd still like to hear from someone with the requisite technical knowledge whether capturing and retrying the current query in a query cancel is even possible. I'm not sure who you want to hear from here, but I think that's a dead end. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
I think that what we are going to have to do before we can ship 9.0 is rip all of that stuff out and replace it with the sort of closed-loop synchronization Greg Smith is pushing. It will probably be several months before everyone is forced to accept that, which is why 9.0 is not going to ship this year.

Oh, 9.0 will not ship in 2010? Do you guys share the same opinion as Tom?

-- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Fri, Feb 26, 2010 at 1:53 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark gsst...@mit.edu writes: In the model you describe any long-lived queries on the slave cause tables in the master to bloat with dead records. Yup, same as they would do on the master. I think this model is on the roadmap but it's not appropriate for everyone and I think one of the benefits of having delayed it is that it forces us to get the independent model right before throwing in extra complications. It would be too easy to rely on the slave feedback as an answer for hard questions about usability if we had it and just ignore the question of what to do when it's not the right solution for the user. I'm going to make an unvarnished assertion here. I believe that the notion of synchronizing the WAL stream against slave queries is fundamentally wrong and we will never be able to make it work. The information needed isn't available in the log stream and can't be made available without very large additions (and consequent performance penalties). As we start getting actual beta testing we are going to uncover all sorts of missed cases that are not going to be fixable without piling additional ugly kluges on top of the ones Simon has already crammed into the system. Performance and reliability will both suffer. I think that what we are going to have to do before we can ship 9.0 is rip all of that stuff out and replace it with the sort of closed-loop synchronization Greg Smith is pushing. It will probably be several months before everyone is forced to accept that, which is why 9.0 is not going to ship this year. Somewhat unusually for me, I haven't been able to keep up with my email over the last few days, so I'm weighing in on this one a bit late. It seems to me that if we're forced to pass the xmin from the slave back to the master, that would be a huge step backward in terms of both scalability and performance, so I really hope it doesn't come to that. 
I wish I understood better exactly what you mean by "synchronizing the WAL stream against slave queries" and why you don't think it will work. Can you elaborate? ...Robert
[HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Sun, Feb 28, 2010 at 5:28 AM, Greg Smith g...@2ndquadrant.com wrote: The idea of the workaround is that if you have a single long-running query to execute, and you want to make sure it doesn't get canceled because of a vacuum cleanup, you just have it connect back to the master to keep an open snapshot the whole time. That's basically the same idea that vacuum_defer_cleanup_age implements, except you don't have to calculate a value--you just hold open the snapshot to do it. This sounds like it would require a separate connection for each client on the replica. That would be a pretty big burden for the master. Also, I'm not sure this actually works. When your client makes this additional connection to the master it's connecting at some transaction in the future from the slave's point of view. The master could have already vacuumed away some record which the snapshot the client gets on the slave will have in view. Even if you defer taking the snapshot on the slave until after connecting to the master it's still possibly in the past compared to the xmin on the master. I think to make this work you would have to connect to the master, establish a snapshot, then fetch pg_current_xlog_location(), then poll the slave and wait until it reaches that same position -- and only then perform your query taking care to establish a fresh snapshot for it such as by starting a new transaction on the slave. That's a lot of effort to go to. Still it's a handy practical trick even if it isn't 100% guaranteed to work. But I don't think it provides the basis for something we can bake in. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
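The dance Greg describes -- open a snapshot on the master, read `pg_current_xlog_location()`, then poll the slave's replay position until it passes that point -- needs a way to compare text log positions. A sketch of that comparison (function names are mine; the server reports positions as `hi/lo` hex pairs):

```python
def parse_lsn(lsn):
    """Convert a text log position like '0/3000158' to a comparable integer.
    The high word counts 4 GB segments of WAL; the low word is the offset."""
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) | int(lo, 16)

def caught_up(master_lsn, slave_replay_lsn):
    """True once the slave has replayed at least up to the position captured
    on the master just after the client's snapshot was established."""
    return parse_lsn(slave_replay_lsn) >= parse_lsn(master_lsn)
```

The client-side loop would then re-poll the slave's replayed position until `caught_up(...)` returns True, and only afterwards take a fresh snapshot on the slave for the actual query -- which is exactly the "lot of effort" Greg is pointing at.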
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Robert Haas wrote: It seems to me that if we're forced to pass the xmin from the slave back to the master, that would be a huge step backward in terms of both scalability and performance, so I really hope it doesn't come to that.

Not forced to--have the option of. There are obviously workloads where you wouldn't want this. At the same time, I think there are some pretty common ones people are going to expect HS+SR to work on transparently where this would obviously be the preferred trade-off to make, were it available as one of the options. The test case I put together shows an intentionally pathological but not completely unrealistic example of such a workload.

I wish I understood better exactly what you mean by the notion of synchronizing the WAL stream against slave queries and why you don't think it will work. Can you elaborate?

There's this constant WAL stream coming in from the master to the slave. Each time the slave is about to apply a change from that stream, it considers "will this disrupt one of the queries I'm already executing?". If so, it has to make a decision about what to do; that's where the synchronization problem comes from. The current two options are to delay applying the change, at which point the master and standby will drift out of sync until the query ends and it can catch back up, or to cancel the query. There are tunables for each of these, and they all seem to work fine (albeit without too much testing in the field yet). My concern is that the tunable that tries to implement the other thing you might want to optimize for--avoid letting the master generate WAL entries that are the most likely ones to conflict--just isn't very usable in its current form.

Tom and I don't see completely eye to eye on this, in that I'm not so sure the current behaviors are fundamentally wrong and we will never be able to make [them] work.
If that's really the case, you may not ever get the scalability/performance results you're hoping for from this release, and really we're all screwed if those are the only approaches available. What I am sure of is that a SR-based xmin passing approach is simpler, easier to explain, more robust for some common workloads, and less likely to give surprised wow, I didn't think *that* would cancel my standby query reports from the field than any way you can configure Hot Standby alone right now. And since I never like to bet against Tom's gut feel, having it around as a plan B in case he's right about an overwhelming round of bug reports piling up against the max_standby_delay etc. logic doesn't hurt either. I spent a little time today seeing if there was any interesting code I might steal from the early synchrep branch at http://git.postgresql.org/gitweb?p=users/fujii/postgres.git;a=summary , but sadly when I tried to rebase that against the master to separate out just the parts unique to it the merge conflicts were overwhelming. I hate getting beaten by merge bitrot even when Git is helping. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
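The delay-or-cancel decision Greg describes can be reduced to a small sketch. This is a simplification in invented names, not the server's actual C code path, which involves per-record conflict detection and signal handling:

```python
def resolve_conflict(standby_delay_ms, max_standby_delay_ms):
    """Decide what to do with a conflicting WAL record on the standby.

    Simplified reading of the max_standby_delay behavior discussed here:
    a negative setting means wait for the query forever (replay falls
    behind); zero means cancel conflicting queries immediately; otherwise
    wait only while the standby's total replay delay stays under the cap.
    """
    if max_standby_delay_ms < 0:
        return 'wait'
    if standby_delay_ms < max_standby_delay_ms:
        return 'wait'
    return 'cancel'
```

Seen this way, the "surprise cancellation" complaints come from the third branch: once replay has already been delayed past the cap by *any* earlier conflict, even a short, unrelated query gets cancelled.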
[HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Fri, Feb 26, 2010 at 8:33 AM, Greg Smith g...@2ndquadrant.com wrote: I'm not sure what you might be expecting from the above combination, but what actually happens is that many of the SELECT statements on the table *that isn't even being updated* are canceled. You see this in the logs: Well I proposed that the default should be to wait forever when applying WAL logs that conflict with a query. Precisely because I think the expectation is that things will just work and queries not fail unpredictably. Perhaps in your test a larger max_standby_delay would have prevented the cancellations but then as soon as you try a query which lasts longer you would have to raise it again. There's no safe value which will be right for everyone. If you're running a system that also is using Streaming Replication, there is a much better approach possible. So I think one of the main advantages of a log shipping system over the trigger-based systems is precisely that it doesn't require the master to do anything it wasn't doing already. There's nothing the slave can do which can interfere with the master's normal operation. This independence is really a huge feature. It means you can allow users on the slave that you would never let near the master. The master can continue running production query traffic while users run all kinds of crazy queries on the slave and drive it into the ground and the master will continue on blithely unaware that anything's changed. In the model you describe any long-lived queries on the slave cause tables in the master to bloat with dead records. I think this model is on the roadmap but it's not appropriate for everyone and I think one of the benefits of having delayed it is that it forces us to get the independent model right before throwing in extra complications. 
It would be too easy to rely on the slave feedback as an answer for hard questions about usability if we had it and just ignore the question of what to do when it's not the right solution for the user. -- greg
[HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Fri, Feb 26, 2010 at 4:43 PM, Richard Huxton d...@archonet.com wrote: Let's see if I've got the concepts clear here, and hopefully my thinking it through will help others reading the archives. There are two queues:

I don't see two queues. I only see the one queue of operations which have been executed on the master but not replayed yet on the slave. Every write operation on the master enqueues an operation to it, and every operation replayed on the slave dequeues from it. Only a subset of operations create conflicts with concurrent transactions on the slave, namely vacuums and a few similar operations (HOT pruning and btree index pruning).

There's no question we need to make sure users have good tools to monitor this queue and are aware of these tools. You can query each slave for its currently replayed log position, and hopefully you can find out how long it's been delayed (ie, if it's looking at a log record and waiting for a conflict to clear, how long ago that log record was generated). You can also find out what the log position is on the master. -- greg
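The queue monitoring Greg describes boils down to comparing the master's current position (`SELECT pg_current_xlog_location();`) with the slave's replayed position (`SELECT pg_last_xlog_replay_location();` on a 9.0 standby) to get the queue depth in bytes. A sketch of the arithmetic, with helper names of my own choosing:

```python
def parse_lsn(lsn):
    """Convert a text log position like '0/2000' into an integer byte position."""
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) | int(lo, 16)

def replay_lag_bytes(master_lsn, slave_replay_lsn):
    """Bytes of WAL generated on the master but not yet replayed on the slave,
    i.e. the depth of the single queue described above."""
    return parse_lsn(master_lsn) - parse_lsn(slave_replay_lsn)
```

A byte count is only a proxy for time delayed, which is Greg's harder question: knowing how long ago the record at the head of the queue was generated requires timestamp information that plain positions don't carry.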
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Greg Stark gsst...@mit.edu writes: In the model you describe any long-lived queries on the slave cause tables in the master to bloat with dead records. Yup, same as they would do on the master. I think this model is on the roadmap but it's not appropriate for everyone and I think one of the benefits of having delayed it is that it forces us to get the independent model right before throwing in extra complications. It would be too easy to rely on the slave feedback as an answer for hard questions about usability if we had it and just ignore the question of what to do when it's not the right solution for the user. I'm going to make an unvarnished assertion here. I believe that the notion of synchronizing the WAL stream against slave queries is fundamentally wrong and we will never be able to make it work. The information needed isn't available in the log stream and can't be made available without very large additions (and consequent performance penalties). As we start getting actual beta testing we are going to uncover all sorts of missed cases that are not going to be fixable without piling additional ugly kluges on top of the ones Simon has already crammed into the system. Performance and reliability will both suffer. I think that what we are going to have to do before we can ship 9.0 is rip all of that stuff out and replace it with the sort of closed-loop synchronization Greg Smith is pushing. It will probably be several months before everyone is forced to accept that, which is why 9.0 is not going to ship this year. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Tom Lane wrote: I'm going to make an unvarnished assertion here. I believe that the notion of synchronizing the WAL stream against slave queries is fundamentally wrong and we will never be able to make it work. The information needed isn't available in the log stream and can't be made available without very large additions (and consequent performance penalties). As we start getting actual beta testing we are going to uncover all sorts of missed cases that are not going to be fixable without piling additional ugly kluges on top of the ones Simon has already crammed into the system. Performance and reliability will both suffer. I think that what we are going to have to do before we can ship 9.0 is rip all of that stuff out and replace it with the sort of closed-loop synchronization Greg Smith is pushing. It will probably be several months before everyone is forced to accept that, which is why 9.0 is not going to ship this year. Wow, can I have some varnish with that. :-O You are right that we need to go down the road a bit before we know what we need for 9.0 or 9.1. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On 2/26/10 10:53 AM, Tom Lane wrote: I think that what we are going to have to do before we can ship 9.0 is rip all of that stuff out and replace it with the sort of closed-loop synchronization Greg Smith is pushing. It will probably be several months before everyone is forced to accept that, which is why 9.0 is not going to ship this year. I don't think that publishing visibility info back to the master ... and subsequently burdening the master substantially for each additional slave ... are what most users want. Certainly for use cases like NTT's, it is, but not for most of our users. In fact, I seem to remember specifically discussing the approach of trying to publish snapshots back to the master, and rejecting it on this list during the development of SR. Does anyone know how Oracle solves these issues? Does their structure (separate rollback log) make it easier for them? --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Josh Berkus j...@agliodbs.com writes: On 2/26/10 10:53 AM, Tom Lane wrote: I think that what we are going to have to do before we can ship 9.0 is rip all of that stuff out and replace it with the sort of closed-loop synchronization Greg Smith is pushing. It will probably be several months before everyone is forced to accept that, which is why 9.0 is not going to ship this year. I don't think that publishing visibility info back to the master ... and subsequently burdening the master substantially for each additional slave ... are what most users want. I don't see a substantial additional burden there. What I would imagine is needed is that the slave transmits a single number back --- its current oldest xmin --- and the walsender process publishes that number as its transaction xmin in its PGPROC entry on the master. I don't doubt that this approach will have its own gotchas that we find as we get into it. But it looks soluble. I have no faith in either the correctness or the usability of the approach currently being pursued. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
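The arithmetic behind Tom's proposal is simple in outline: if each walsender publishes its slave's oldest xmin in its PGPROC entry, vacuum's cleanup horizon just treats those like any other backend's xmin. A deliberately simplified sketch (invented names; real XIDs are 32-bit with wraparound comparison, which this ignores):

```python
def vacuum_cutoff(next_xid, local_xmins, slave_xmins, defer_cleanup_age=0):
    """XID horizon below which dead tuples may be removed.

    Without slave feedback, only vacuum_defer_cleanup_age protects standby
    queries; with feedback, each slave's advertised xmin holds the horizon
    back directly, but only while such queries are actually running.
    """
    horizon = min(list(local_xmins) + list(slave_xmins) + [next_xid])
    return max(horizon - defer_cleanup_age, 0)
```

This also makes the trade-off in the thread visible: `defer_cleanup_age` pushes the horizon back unconditionally, while a slave xmin in `slave_xmins` only holds it back while that slave's query exists.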
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
I don't see a substantial additional burden there. What I would imagine is needed is that the slave transmits a single number back --- its current oldest xmin --- and the walsender process publishes that number as its transaction xmin in its PGPROC entry on the master. If the main purpose of the slave is long-running queries, though, this could cause a lot of bloat on the master. That's a special case, but a reason why we would want to preserve the stop replication functionality. I don't doubt that this approach will have its own gotchas that we find as we get into it. But it looks soluble. I have no faith in either the correctness or the usability of the approach currently being pursued. So, why not start working on it now, instead of arguing about it? It'll be easy to prove the approach once we have some test code. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 2/26/10 10:53 AM, Tom Lane wrote: I think that what we are going to have to do before we can ship 9.0 is rip all of that stuff out and replace it with the sort of closed-loop synchronization Greg Smith is pushing. It will probably be several months before everyone is forced to accept that, which is why 9.0 is not going to ship this year. I don't think that publishing visibility info back to the master ... and subsequently burdening the master substantially for each additional slave ... are what most users want. I don't see a substantial additional burden there. What I would imagine is needed is that the slave transmits a single number back --- its current oldest xmin --- and the walsender process publishes that number as its transaction xmin in its PGPROC entry on the master.

The additional burden comes from the old snapshot effect. It makes it unusable for offloading reporting queries, for example. In general, it is a very good architectural property that the master is not affected by what happens in a standby, and a closed-loop synchronization would break that. I don't actually understand how tight synchronization on its own would solve the problem. What if the connection to the master is lost? Do you kill all queries in the standby before reconnecting?

One way to think about this is to first consider a simple stop-and-go system. Clearly the database must be consistent at any point in the WAL sequence, if recovery were stopped and the database started up. So it is always safe to pause recovery and run a read-only query against the database as it is at that point in time (this assumes that the index cleanup operations are not required for consistent query results, BTW). After the read-only transaction is finished, you can continue recovery. The next step up is to relax that so that you allow replay of those WAL records that are known to not cause trouble to the read-only queries.
For example, heap_insert records are very innocent: they only add rows with a yet-uncommitted xmin. Things get more complex when you allow the replay of commit records; all the known-assigned-xids tracking is related to that, so that transactions that were not committed when a snapshot was taken in the standby continue to be considered uncommitted by that snapshot even after their commit records are replayed.

If that feels too fragile, there might be other methods to achieve that. One I once pondered is to not track all in-progress transactions in shared memory like we do now, but only OldestXmin. When a backend wants to take a snapshot in the slave, it memcpy()s clog from OldestXmin to the latest committed XID, and includes it in the snapshot. The visibility checks use the copy instead of the actual clog, so they see the situation as it was when the snapshot was taken. To keep track of the OldestXmin in the slave, the master can emit that as a WAL record every now and then; it's ok if it lags behind.

Then there are the WAL record types that remove data that might still be required by the read-only transactions. This includes vacuum and index deletion records. If you really think the current approach is unworkable, I'd suggest that we fall back to a stop-and-go system, where you either let recovery progress or allow queries to run, but not both at the same time. But FWIW I don't think the situation is that grave.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
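The snapshot-with-a-clog-copy idea Heikki sketches implies a visibility test against the copied state rather than the live clog. A toy rendering of that check (data shapes and names are mine, not PostgreSQL's actual SnapshotData/clog layout):

```python
def xid_visible(xid, snapshot_oldest_xmin, snapshot_xmax, clog_copy):
    """Is a tuple-creating transaction visible to this snapshot?

    clog_copy maps xid -> True if that transaction was committed at the
    moment the snapshot (and its private clog copy) was taken. Because the
    copy is frozen, commit records replayed afterwards cannot change the
    answer -- which is the point of the scheme.
    """
    if xid < snapshot_oldest_xmin:
        return True            # older than any xact running at snapshot time
    if xid >= snapshot_xmax:
        return False           # started after the snapshot was taken
    return clog_copy.get(xid, False)
```

The appeal is that no known-assigned-xids tracking is needed; the cost is the memcpy of the clog range for every snapshot, which grows with the spread between OldestXmin and the latest committed XID.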
[HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I don't see a substantial additional burden there. What I would imagine is needed is that the slave transmits a single number back --- its current oldest xmin --- and the walsender process publishes that number as its transaction xmin in its PGPROC entry on the master.

And when we want to support cascading slaves? Or when you want to bring up a new slave and it suddenly starts advertising a new xmin that's older than the current oldest xmin?

But in any case, if I were running a reporting database I would want it to just stop replaying logs for a few hours while my big batch report runs, not cause the master to be unable to vacuum any dead records for hours. That defeats much of the purpose of running the queries on the slave.

--
greg
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
> I don't actually understand how tight synchronization on its own would solve the problem. What if the connection to the master is lost? Do you kill all queries in the standby before reconnecting?

Sure. So what? They'd have been killed if they individually lost connections to the master (or the slave), too.

> [ assorted analysis based on WAL contents ]

The problem is all the interactions that are not reflected (historically anyway) in WAL. We already know about btree page reclamation interlocks and relcache init files. How many others are there, and how messy and expensive is it going to be to deal with them?

> If you really think the current approach is unworkable, I'd suggest that we fall back to a stop-and-go system, where you either let recovery progress or allow queries to run, but not both at the same time. But FWIW I don't think the situation is that grave.

I might be wrong. I hope for the sake of the project schedule that I am wrong. But I'm afraid that we will spend several months beavering away trying to make the current approach solid and user-friendly, and eventually conclude that it's a dead end. It would be prudent to have a Plan B; and it looks to me like closed-loop synchronization is the best Plan B. Putting off all thought about it until the next release cycle seems like a recipe for a scheduling disaster.

regards, tom lane
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Fri, 2010-02-26 at 12:02 -0800, Josh Berkus wrote:
>> I don't see a substantial additional burden there. What I would imagine is needed is that the slave transmits a single number back --- its current oldest xmin --- and the walsender process publishes that number as its transaction xmin in its PGPROC entry on the master.
>
> If the main purpose of the slave is long-running queries, though, this could cause a lot of bloat on the master. That's a special case, but a reason why we would want to preserve the stop-replication functionality.

Do we really think that users using the slave to run long-running queries is a special case? One of the number-one things I can see this being used for is reporting.

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use of Mr. or Sir.
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
* Greg Stark <gsst...@mit.edu> [100226 15:10]:
> On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I don't see a substantial additional burden there. What I would imagine is needed is that the slave transmits a single number back --- its current oldest xmin --- and the walsender process publishes that number as its transaction xmin in its PGPROC entry on the master.
>
> And when we want to support cascading slaves? Or when you want to bring up a new slave and it suddenly starts advertising a new xmin that's older than the current oldest xmin?
>
> But in any case, if I were running a reporting database I would want it to just stop replaying logs for a few hours while my big batch report runs, not cause the master to be unable to vacuum any dead records for hours. That defeats much of the purpose of running the queries on the slave.

*I* would be quite happy having stop-and-go and the closed loop be the only two modes of operation, and I'd even be quite happy if they were each limited to a separate method:

1) Running SR - then you are forced to use a closed loop
2) Running HS from a backup/archive - forced to use stop-and-go

#1 still needs to deal with a slave disappearing and not advancing xmin for a period (TCP timeout)?

I'll note that until SR does synchronous streaming rep (which will likely require some closed-loop plan to allow the slave to be hot), I want situation #2, and hopefully the knob to control how long it allows a stop before going again can be a HUP'able knob, so I can change it occasionally without taking the server down...

--
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
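The stop-and-go mode with a reloadable pause knob that Aidan asks for can be sketched as a tiny replay controller. This is a hypothetical model in plain Python, not the Hot Standby implementation; the knob name and the record-counting budget are invented for illustration (a real server would budget in time, via a HUP-reloadable GUC).

```python
# Toy stop-and-go replay controller: hold off WAL replay while standby
# queries are running, but only up to a (reloadable) pause budget.
# All names are illustrative; this is not the actual HS implementation.

class ReplayController:
    def __init__(self, max_pause_records=3):
        # In a real server this would be a SIGHUP-reloadable setting.
        self.max_pause_records = max_pause_records
        self.paused_for = 0

    def reload_config(self, max_pause_records):
        """Stand-in for re-reading the knob on SIGHUP, without a restart."""
        self.max_pause_records = max_pause_records

    def should_apply(self, queries_running):
        """Decide: apply the next WAL record (go) or wait for queries (stop)?"""
        if queries_running and self.paused_for < self.max_pause_records:
            self.paused_for += 1
            return False   # stop: let the batch report keep running
        self.paused_for = 0
        return True        # go: resume replay (cancelling queries if needed)

ctl = ReplayController(max_pause_records=2)
decisions = [ctl.should_apply(queries_running=True) for _ in range(4)]
# Replay yields twice, then the budget forces it to proceed:
assert decisions == [False, False, True, False]
```

Once the budget is exhausted the controller lets replay proceed regardless, which is exactly the point at which the conflicting standby queries would have to be cancelled.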
[HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Fri, Feb 26, 2010 at 8:30 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> How's it going to do that, when it has no queries at the instant of startup?

Why shouldn't it have any queries at walreceiver startup? It has whatever xlog segments were copied from the master and any it can find in the archive; it could easily reach a consistent point long before it needs to connect to the master.

If you really want to protect your master from any additional overhead, you don't currently need to configure a streaming connection at all; you can just use the file-shipping interface.

--
greg
[HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> There's *definitely* not going to be enough information in the WAL stream coming from a master that doesn't think it has HS slaves. We can't afford to record all that extra stuff in installations for which it's just useless overhead. BTW, has anyone made any attempt to measure the performance hit that the patch in its current form is creating via added WAL entries?

What extra entries?

--
greg
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Tom Lane wrote:
> I don't see a substantial additional burden there. What I would imagine is needed is that the slave transmits a single number back --- its current oldest xmin --- and the walsender process publishes that number as its transaction xmin in its PGPROC entry on the master.

That is exactly the core idea I was trying to suggest in my rambling message. With just that small additional bit of information transmitted and published to the master via that route, it's possible to optimize this problem in a way not available now. And it's a way that I believe will feel more natural to some users who may not be well served by any of the existing tuning possibilities.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
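The single-number feedback loop being proposed can be modelled in a few lines. This is a toy sketch with invented names, not PostgreSQL code: the standby reports its oldest running xmin, the walsender publishes it as if it were a local backend's PGPROC xmin, and the master's vacuum cutoff becomes the minimum over all of them.

```python
# Toy model of the closed-loop xmin feedback: the walsender's published
# slave xmin participates in the master's vacuum-horizon computation
# exactly like a local backend's xmin. Illustrative names only.

def vacuum_cutoff(local_backend_xmins, slave_xmin=None):
    """Oldest xid for which vacuum must still preserve dead tuples."""
    xmins = list(local_backend_xmins)
    if slave_xmin is not None:
        xmins.append(slave_xmin)   # the walsender's published PGPROC entry
    return min(xmins) if xmins else None

# Without feedback, vacuum can reclaim up to the oldest *local* xmin ...
assert vacuum_cutoff([500, 520]) == 500
# ... but a long-running report on the slave drags the horizon back,
# which is precisely the master-side bloat Josh and Heikki worry about.
assert vacuum_cutoff([500, 520], slave_xmin=450) == 450
```

The sketch also makes the failure modes in the thread concrete: a disconnected slave means `slave_xmin` stops advancing, and a newly attached slave may suddenly supply a value older than the current cutoff.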
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
> That is exactly the core idea I was trying to suggest in my rambling message. Just that small additional bit of information transmitted and published to the master via that route, and it's possible to optimize this problem in a way not available now. And it's a way that I believe will feel more natural to some users who may not be well served by any of the existing tuning possibilities.

Well, if both you and Tom think it would be relatively easy (or at least easier than continuing to pursue query-cancel troubleshooting), then please start coding it. It was always a possible approach; we just collectively thought that query cancel would be easier.

--Josh Berkus
[HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
On Fri, Feb 26, 2010 at 9:44 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Greg Stark <gsst...@mit.edu> writes:
>> What extra entries?
>
> Locks, just for starters. I haven't read enough of the code yet to know what else Simon added. In the past it's not been necessary to record any transient information in WAL, but now we'll have to.

Haven't we been writing locks to the WAL since two-phase commit?

--
greg
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Greg Stark wrote:
> On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> There's *definitely* not going to be enough information in the WAL stream coming from a master that doesn't think it has HS slaves. We can't afford to record all that extra stuff in installations for which it's just useless overhead. BTW, has anyone made any attempt to measure the performance hit that the patch in its current form is creating via added WAL entries?
>
> What extra entries?

* An xact-assignment record is written for every PGPROC_MAX_CACHED_SUBXIDS (= 64) subtransaction ids assigned to a single top-level transaction.
* A running-xacts record is written at every online checkpoint.
* A btree-reuse-page record is written whenever a dead b-tree page is recycled.
* A vacuum cleanup-info record is written once per VACUUM of a table.
* A standby-lock record is written for each AccessExclusiveLock acquired.

Am I missing something? I doubt any of these are noticeable, though I don't think anyone has measured it.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Re: Hot Standby query cancellation and Streaming Replication integration
Josh Berkus wrote:
>> That is exactly the core idea I was trying to suggest in my rambling message. Just that small additional bit of information transmitted and published to the master via that route, and it's possible to optimize this problem in a way not available now. And it's a way that I believe will feel more natural to some users who may not be well served by any of the existing tuning possibilities.
>
> Well, if both you and Tom think it would be relatively easy (or at least easier than continuing to pursue query-cancel troubleshooting), then please start coding it. It was always a possible approach; we just collectively thought that query cancel would be easier.

You still need query cancels. A feedback loop just makes them happen less frequently.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com