[HACKERS] Reg: SQL Query for Postgres 8.4.3
Hi, Can anyone tell me what's the effect of the below query?

SELECT substring(B'0001' from 5 for -2);
SELECT substring(B'0001' from 4 for -3);

It's observed that there's an error: invalid memory alloc request size 4244635647. What will actually happen to the PostgreSQL database? I am using PostgreSQL 8.4.3 on Ubuntu 8.04.

Thanks in advance. Regards, Srinivas Naik
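For anyone wanting to reproduce the report, the two statements can be pasted into psql as-is; as later replies in this thread show, the behaviour differs by version, so no single result is asserted here:

```sql
-- The two queries from the report. Depending on the server version these
-- either return the remainder of the bit string, raise "negative substring
-- length not allowed", or (on unpatched builds) fail with an invalid
-- memory alloc request.
SELECT substring(B'0001' from 5 for -2);
SELECT substring(B'0001' from 4 for -3);

-- Omitting the length argument sidesteps the negative-length code path:
SELECT substring(B'0001' from 5);
```

As to the original question: an allocation-size error of this kind aborts only the current statement, before the oversized allocation is attempted, so it should not damage stored data.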
[HACKERS] Pause/Resume feature for Hot Standby
In the original patch I had a Pause/Resume feature for controlling recovery during Hot Standby. It was removed for lack of time. With all the discussion around the HS UI, it would be something that could be brought back very easily.

I would like to do it as a recovery control plugin. The plugin would be passed the following information and be called before each WAL record was applied:

* current WAL pointer
* xid - 0 if not a commit/abort
* timestamp - if available
* boolean flag indicating whether it's a record type that conflicts

*No user data would be passed to the plugin*, so no need to revisit the discussions around WAL plugins etc.

The plugin has the benefit of providing a whole range of possible control options, as well as having minimal performance overhead. This would initially allow

* Pause
* Resume

and would go into 9.0 as a contrib module, included with the plugin patch. Later we would be able to add such things as

* Pause for a delay
* Seek to a particular xid commit record
* Seek to a particular WAL pointer and stop

This would be particularly helpful in designing an automated test suite, since we can recheck the snapshot after each commit to verify it matches the primary.

-- Simon Riggs www.2ndQuadrant.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TOAST code ignores freespace (was Tweak TOAST code)
On Mon, 2010-05-03 at 23:36 -0400, Jan Wieck wrote:

  Suppose we had something similar to the NULL value bitmap, specifying plain or compressed values (not TOAST references), that are moved to a shadow tuple inside the toast table. Suppose further we had some statistics about how often attributes appear in a qualification (i.e. end up in a scan key or scan filter or other parts of the qual expression list). Not sure, maybe we even want to know how often or seldom an attribute is heap_getattr()'d at all. Those don't need to be accurate counts. Small random samples will probably do. ANALYZE could evaluate those statistics and adjust the shadow storage settings per attribute accordingly. I can imagine many applications, where this would shrink the main tuples to almost nothing at all. Automatic vertical partitioning.

Like it. TODO item for further detailed research.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] max_standby_delay considered harmful
On Mon, 2010-05-03 at 15:04 -0700, Josh Berkus wrote:

  I don't see the issue with Tom's approach from a wait perspective. The max wait becomes 1.001X max_standby_delay; there's no way I can think of that replay would wait longer than that. I've yet to see an explanation why it would be longer.

Yes, the max wait on any *one* blocker will be max_standby_delay. But if you wait for two blockers, then the total time by which the standby lags will now be 2*max_standby_delay. Add a third, fourth etc and the standby lag keeps rising.

We need to avoid confusing these two measurables:

* standby lag - the total delay from when a WAL record is written to when the WAL record is applied. This includes both transfer time and any delays imposed by Hot Standby.
* standby query delay - the time that recovery will wait for a query to complete before a cancellation takes place. (We could complicate this by asking what happens when recovery is blocked twice by the same query. Would it wait twice, or does it have to track how much it has waited for each query in total so far?)

Currently max_standby_delay seeks to constrain the standby lag to a particular value, as a way of providing a bounded time for failover, and also to constrain the amount of WAL that needs to be stored as the lag increases. Currently, there is no guaranteed minimum query delay given to each query. If every query is guaranteed its requested query delay then the standby lag will be unbounded. Fewer cancellations, higher lag. Some people do want this, though it is not currently available.
We can do this with two new GUCs:

* standby_query_delay - USERSET parameter that allows a user to specify a guaranteed query delay, anywhere from 0 to max_standby_query_delay
* max_standby_query_delay - SIGHUP parameter - exists to provide the DBA with a limit on the USERSET standby_query_delay, though I can see some would say this is optional

Current behaviour is the same as the global settings

  standby_query_delay = 0
  max_standby_query_delay = 0
  max_standby_delay = X

So if people want minimal cancellations they would specify

  standby_query_delay = Y (e.g. 30)
  max_standby_query_delay = Z (e.g. 300)
  max_standby_delay = -1

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] max_standby_delay considered harmful
On Mon, 2010-05-03 at 22:45 -0400, Bruce Momjian wrote:

  As I remember, 9.0 has two behaviors:
  o master delays vacuum cleanup
  o slave delays WAL application
  and in 9.1 we will be adding:
  o slave communicates snapshots to master
  How would this figure into what we ultimately want in 9.1?

We would still want all options, since "slave communicates snapshot to master" doesn't solve the problem, it just moves the problem elsewhere. It's a question of which factors the user wishes to emphasise for their specific use.

  I understand Simon's point that the two behaviors have different benefits. However, I believe few users will be able to understand when to use which.

If users can understand how to set NDISTINCT for a column, they can understand this. It's not about complexity of UI, it's about solving problems. When people hit an issue, I don't want to be telling people "we thought you wouldn't understand it, so we removed the parachute". They might not understand it *before* they hit a problem, so what? But users certainly will afterwards, and won't say thanks if you prevent an option for them, especially for the stated reason. (My point about ndistinct: 99% of users have no idea that exists or when to use it, but it still exists as an option because it solves a known issue, just like this.)

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Pause/Resume feature for Hot Standby
Simon Riggs si...@2ndquadrant.com writes:

  In the original patch I had a Pause/Resume feature for controlling recovery during Hot Standby. It was removed for lack of time. With all the discussion around the HS UI, it would be something that could be brought back very easily.

Please! Manual control over recovery is the best solution ever proposed for giving the user explicit control over the trade-off between HA and slave queries. It would allow us to say that by default, conflict favors WAL recovery no matter what. If you want to ensure your queries won't get canceled, pause the recovery, run your report, resume the recovery.

I understand that automated and flexible conflict resolution is still needed or wanted even with this UI, but it would allow a much cruder automated tool to be acceptable. Specifically, it could target only short queries on the standby; for long-running queries you don't want to get cancelled, pause the recovery.

Regards, -- dim
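To make the workflow concrete, here is what it might look like on the standby; the function names are purely hypothetical — nothing like this shipped in 9.0, they are assumed names for the Pause/Resume interface under discussion:

```sql
-- Hypothetical recovery-control calls (assumed names, not a shipped API):
SELECT pg_recovery_pause();    -- stop applying WAL; no new conflicts can arise

-- ... run the long report without fear of cancellation ...

SELECT pg_recovery_resume();   -- recovery resumes and catches up
```

The trade-off Dimitri describes is explicit here: while paused, the standby's data goes stale and failover lag grows, which is why this would be a manual control rather than a default.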
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
On 04/05/10 18:47, Srinivas Naik wrote:

  Hi, Can anyone tell me what's the effect of the below query? SELECT substring(B'0001' from 5 for -2); SELECT substring(B'0001' from 4 for -3); It's observed that there's an error: invalid memory alloc request size 4244635647. What will actually happen to the PostgreSQL database? I am using PostgreSQL 8.4.3 on Ubuntu 8.04

I can reproduce this with 8.4.*2* on Ubuntu 10.04 32-bit (postgres built from src) but *not* with 8.4.*3*. Can you double-check that it is 8.4.3 you are using? If so, tell us whether you are using 32- or 64-bit Ubuntu!

Cheers Mark
Re: [HACKERS] max_standby_delay considered harmful
On Tue, May 4, 2010 at 4:37 AM, Simon Riggs si...@2ndquadrant.com wrote:

  option for them, especially for the stated reason. (My point about ndistinct: 99% of users have no idea that exists or when to use it, but it still exists as an option because it solves a known issue, just like this.)

Slightly OT, but funnily enough, when I was up in New York a couple of weeks ago with Bruce and a couple of other folks, I started talking with a DBA up there about his frustrations with PostgreSQL, and - I'm not making this up - the first example he gave me of something he wished he could do in PG to improve query planning was manually override ndistinct estimates. He was pleased to hear that we'll have that in 9.0, and I was pleased to be able to tell him it was my patch. If you'd asked me the odds that someone picking a missing feature would come up with that one, I'd have said a billion-to-one against. But I'm not making this up.

To be honest, I am far from convinced that the existing behavior is a good one, and I'm in favor of modifying it or ripping it out altogether if we can think of something better. But it has to really be better, of course, not just trading one set of pain points for another.

...Robert
Re: [HACKERS] max_standby_delay considered harmful
On Tue, 2010-05-04 at 07:13 -0400, Robert Haas wrote:

  Slightly OT, but funnily enough, when I was up in New York a couple of weeks ago with Bruce and a couple of other folks, I started talking with a DBA up there about his frustrations with PostgreSQL, and - I'm not making this up - the first example he gave me of something he wished he could do in PG to improve query planning was manually override ndistinct estimates. He was pleased to hear that we'll have that in 9.0 and I was pleased to be able to tell him it was my patch. If you'd asked me the odds that someone picking a missing feature would come up with that one, I'd have said a billion-to-one against. But I'm not making this up.

It matches my experience. I think it's a testament to the expertise of our users, as well as to the hackers that have done so much, to make that the top of users' lists for change.

  To be honest, I am far from convinced that the existing behavior is a good one and I'm in favor of modifying it or ripping it out altogether if we can think of something better. But it has to really be better, of course, not just trading one set of pain points for another.

The only way I see as genuinely better, rather than just a different mix of trade-offs, is to come up with ways where there are no conflicts. Hannu came up with one, using filesystem snapshots, but we haven't had time to implement that yet.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
Hi Mark, Please find the below details: postgresql-8.3 and UBUNTU-8.10 with linux-image-2.6.27.18-standard_810_i386.deb, and it's a 32-bit Ubuntu.

On Tue, May 4, 2010 at 9:02 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:

  On 04/05/10 18:47, Srinivas Naik wrote:
    Hi, Can any one tell me whats the effect of the below Query SELECT substring(B'0001' from 5 for -2); SELECT substring(B'0001' from 4 for -3); its observed that there's an Error invalid memory alloc request size 4244635647 What will actually happen to the Postgresql database. I am using Postgresql 8.4.3 on Ubuntu 8.04

  I can reproduce this with 8.4.*2* on Ubuntu 10.04 32-bit (postgres built from src) but *not* with 8.4.*3*. Can you double check it is 8.4.3 you are using? and if so tell us whether you are using 32 or 64 bit Ubuntu!

  Cheers Mark
Re: [HACKERS] what is good solution for support NULL inside string_to_array function?
2010/5/4 Josh Berkus j...@agliodbs.com:

  quietly removing NULL is maybe good for compatibility but is wrong for functionality.

  I agree. I wasn't aware of this little misfeature. Default display for NULL should be a zero-length string.

I disagree - NULL is NULL, not an empty string (Oracle is different). If array_to_string is equivalent to x[1] || sep || x[2] || sep || x[3] || sep then the correct result is NULL, and then string_to_array and array_to_string are correct, because string_to_array cannot contain any NULL symbol.

Regards Pavel Stehule
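Pavel's equivalence argument is easy to check: the || operator is strict, so a single NULL element makes the whole concatenation NULL, whereas the two-argument array_to_string quietly drops NULL elements. A quick illustration (behaviour as in 8.4):

```sql
-- Concatenation is strict: one NULL poisons the whole result.
SELECT 'a' || ',' || NULL || ',' || 'c';             -- NULL

-- The two-argument form silently skips NULL elements instead:
SELECT array_to_string(ARRAY['a', NULL, 'c'], ',');  -- 'a,c'

-- Hence the round trip is lossy: the NULL cannot be recovered.
SELECT string_to_array(array_to_string(ARRAY['a', NULL, 'c'], ','), ',');
-- {a,c}
```

Neither behaviour matches the x[1] || sep || x[2] expansion Pavel gives, which is the crux of the disagreement.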
Re: [HACKERS] max_standby_delay considered harmful
* Simon Riggs (si...@2ndquadrant.com) wrote:

  If recovery waits for max_standby_delay every time something gets in its way, it should be clear that if many things get in its way it will progressively fall behind. There is no limit to this and it can always fall further behind. It does result in fewer cancelled queries and I do understand many may like that.

Guess I wasn't very clear in my previous description of what I *think* the change would be (Tom, please jump in if I've got this wrong..). Recovery wouldn't wait max_standby_delay every time; I agree, that would be a big change in behaviour and could make it very difficult for the slave to keep up. Rather, recovery would proceed as normal until it encounters a lock, at which point it would start counting down from max_standby_delay. If the lock is released before it hits that, then it will move on; if another lock is encountered, it would start counting down from where it left off last time. If it hits zero, it'll cancel the other query, and any other queries that get in the way, until it's caught up again completely. Once recovery is fully caught up, the counter would reset again to max_standby_delay.

That is *significantly* different from how it works now. (Plus: If there really was no difference, why not leave it as is?) Because it's much more complicated the way it is, it doesn't really work as one would expect in a number of situations, and it's trying to guarantee something that it probably can't.

  The bottom line is this is about conflict resolution. There is simply no way to resolve conflicts without favouring one or other of the protagonists. Whatever mechanism you come up with that favours one will disfavour the other. I'm happy to give choices, but I'm not happy to force just one kind of conflict resolution.
I don't think anyone is trying to get rid of the knob entirely; you're right, you can't please everyone all the time, so there has to be some kind of knob there which people can adjust based on their particular use case and system. This is about what exactly the knob is and how it's implemented and documented. Thanks, Stephen
Re: [HACKERS] Pause/Resume feature for Hot Standby
On Tue, May 4, 2010 at 4:02 AM, Simon Riggs si...@2ndquadrant.com wrote:

  In the original patch I had Pause/Resume feature for controlling recovery during Hot Standby. It was removed for lack of time.

Well, it's not like we have more time now than we did then. I think we need to postpone this discussion to 9.1. If we're going to start accepting patches for new features, then why should we accept only patches for HS/SR? I have two patches already in the queue that I'd like to see committed, and if I thought that there was a chance of getting anything further done for 9.0, there'd be several more. Many other people have patches waiting also, or are holding off development because we are in feature freeze right now. Hot Standby is a great feature, but I don't see any reason to say that we're going to allow new feature development just for HS but not for anything else.

I also think that worrying about fine-tuning HS at this point is a bit like complaining that the jump suits of the crew of the Space Shuttle Challenger were not made of 100% recyclable materials. Just yesterday we had a report of an HS server getting into a state where it failed to shut down properly; and I believe that we never fully resolved the issue of occasional extremely-long spikes in HS response time, either. Heikki just fixed a bug in our btree recovery code which is apparently new to 9.0, since he did not backpatch it. I think that getting into a discussion of pausing and resuming recovery, or even the parallel discussion on max_standby_delay, is fiddling with things that, granted, are probably not ideal, and yes, we should improve them in a future release, but they're not what we should be worrying about right now. What I think we SHOULD be worried about right now - VERY worried - is stabilizing the existing Hot Standby code to the point where it won't be an embarrassment to us when we ship it.
The rate at which we're finding new problems even with the small number of people who test alpha releases and nightly snapshots suggests to me that we're not there yet. ...Robert
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
On Tue, May 4, 2010 at 7:46 AM, Srinivas Naik naik.sr...@gmail.com wrote:

  Hi Mark, Please find the below details: postgresql-8.3 and UBUNTU-8.10 with linux-image-2.6.27.18-standard_810_i386.deb and its an 32bit Ubuntu.

Err, before you said 8.4.3. Now you're saying 8.3. Those are totally different. Can we get the exact identifier of the package you have installed, plus the output from SELECT version();

Thanks, ...Robert
Re: [HACKERS] max_standby_delay considered harmful
On Tue, 2010-05-04 at 09:12 -0400, Stephen Frost wrote:

  * Simon Riggs (si...@2ndquadrant.com) wrote:
    If recovery waits for max_standby_delay every time something gets in its way, it should be clear that if many things get in its way it will progressively fall behind. There is no limit to this and it can always fall further behind. It does result in fewer cancelled queries and I do understand many may like that.

  Guess I wasn't very clear in my previous description of what I *think* the change would be (Tom, please jump in if I've got this wrong..). Recovery wouldn't wait max_standby_delay every time; I agree, that would be a big change in behaviour and could make it very difficult for the slave to keep up. Rather, recovery would proceed as normal until it encounters a lock, at which point it would start counting down from max_standby_delay; if the lock is released before it hits that, then it will move on; if another lock is encountered, it would start counting down from where it left off last time. If it hits zero, it'll cancel the other query, and any other queries that get in the way, until it's caught up again completely. Once recovery is fully caught up, the counter would reset again to max_standby_delay.

This new clarification is almost exactly how it works already. Sounds like the existing docs need some improvement. The only difference is that max_standby_delay is measured from log timestamp. Perhaps it should work from WAL receipt timestamp rather than from log timestamp? That would make some of the problems go away without significantly changing the definition. I'll look at that. (Also, conflicts are caused by more situations than just locks, but that detail doesn't alter your point.)

  The bottom line is this is about conflict resolution. There is simply no way to resolve conflicts without favouring one or other of the protagonists. Whatever mechanism you come up with that favours one will disfavour the other.
  I'm happy to give choices, but I'm not happy to force just one kind of conflict resolution.

  I don't think anyone is trying to get rid of the knob entirely; you're right, you can't please everyone all the time, so there has to be some kind of knob there which people can adjust based on their particular use case and system. This is about what exactly the knob is and how it's implemented and documented.

I'm happy with more than one way. It'd be nice if a single parameter, giving one dimension of tuning, suited all the ways people have said they would like it to behave. I've not found a way of doing that. I have no problem at all with adding additional parameters or mechanisms to cater for the multiple dimensions of control people have asked for. So your original interpretation is also valid for some users.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
On Tue, May 4, 2010 15:40, Robert Haas wrote:

  On Tue, May 4, 2010 at 7:46 AM, Srinivas Naik naik.sr...@gmail.com wrote:
    Hi Mark, Please find the below details: postgresql-8.3 and UBUNTU-8.10 with linux-image-2.6.27.18-standard_810_i386.deb and its an 32bit Ubuntu.
  Err, before you said 8.4.3. Now you're saying 8.3. Those are totally different. Can we get the exact identifier of the package you have installed, plus the output from SELECT version();

fwiw, results for all current postgres versions:

-- to be executed:
SELECT substring(B'0001' from 5 for -2);
SELECT substring(B'0001' from 4 for -3);

-- postgres 9.0beta1
ERROR: negative substring length not allowed
ERROR: negative substring length not allowed

-- postgres 8.4.3
0001
10001

-- postgres 8.3.10
0001
10001

-- postgres 8.2.16
0001
10001

-- postgres 8.1.20
0001
10001

-- postgres 8.0.24
0001
10001

-- postgres 7.4.28
0001
10001
Re: [HACKERS] what is good solution for support NULL inside string_to_array function?
2010/5/4 Pavel Stehule pavel.steh...@gmail.com:

  2010/5/4 Tom Lane t...@sss.pgh.pa.us:
    Josh Berkus j...@agliodbs.com writes:
      quietly removing NULL is maybe good for compatibility but is wrong for functionality.
      I agree. I wasn't aware of this little misfeature. Default display for NULL should be a zero-length string.
    That's just as broken as Pavel's suggestion. Unless you have something that is guaranteed distinguishable from the output of any non-null value, you really can't make a significant improvement here.

I wouldn't modify the current two-param string_to_array and array_to_string functions, so there is no default string (maybe empty string) for NULL. My proposal is new three-param functions with an explicit null-string definition. This cannot break compatibility and enhances functionality - it is just a short cut for the code from my proposal - in C this functionality can be implemented much faster. I did some coding - the patch can be very simple:

postgres=# select array_to_string(array[1,2,3,4,5,null],',','*');
 array_to_string
-----------------
 1,2,3,4,5,*
(1 row)

Time: 0,501 ms
postgres=# select string_to_array(array_to_string(array[1,2,3,4,5,null],',','*'),',','*');
 string_to_array
------------------
 {1,2,3,4,5,NULL}
(1 row)

Time: 0,617 ms
postgres=# select string_to_array('1,2,3,4,5,*',',','*')::int[];
 string_to_array
------------------
 {1,2,3,4,5,NULL}
(1 row)

Time: 0,652 ms

and then string_to_array and array_to_string are orthogonal with NULL.

Pavel
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
On Tue, May 4, 2010 at 10:08 AM, Erik Rijkers e...@xs4all.nl wrote:

  fwiw, results for all current postgres versions: [ only 9.0beta1 is different ]

It looks like the relevant commits are:

commit 822f2ac5a2ec7c6f10634f62a0b2dc6cc9929759
Author: Tom Lane t...@sss.pgh.pa.us
Date: Mon Jan 25 20:55:32 2010 +0000

    Add get_bit/set_bit functions for bit strings, paralleling those for bytea, and implement OVERLAY() for bit strings and bytea. In passing also convert text OVERLAY() to a true built-in, instead of relying on a SQL function.

    Leonardo F, reviewed by Kevin Grittner

commit 75dea10196c31d98d98c0bafeeb576ae99c09b12
Author: Tom Lane t...@sss.pgh.pa.us
Date: Thu Jan 7 19:53:11 2010 +0000

    Make bit/varbit substring() treat any negative length as meaning "all the rest of the string". The previous coding treated only -1 that way, and would produce an invalid result value for other negative values. We ought to fix it so that 2-parameter bit substring() is a different C function and the 3-parameter form throws error for negative length, but that takes a pg_proc change which is impractical in the back branches; and in any case somebody might be relying on -1 working this way. So just do this as a back-patchable fix.

I think the OP is probably running a version that doesn't include the Jan 7 commit, which was effectively undone by the Jan 25 commit for CVS HEAD. It looks like this was intentional based on spec behavior of overlay(), but should we consider maintaining the historical behavior instead?

...Robert
Re: [HACKERS] what is good solution for support NULL inside string_to_array function?
On Tue, May 4, 2010 at 10:05 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

  and then string_to_array and array_to_string are orthogonal with NULL.

I like the behavior, but should it share the name with the 2 argument version given the incompatibility? Maybe make a new function to_string(anyarray, sep, nullsym='') and deprecate the old one?

merlin
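A sketch of how Merlin's suggestion might read in practice; to_string is a hypothetical name here, not an existing function:

```sql
-- Hypothetical three-argument form with an explicit NULL symbol:
SELECT to_string(ARRAY[1, 2, NULL, 4], ',', '*');  -- would yield '1,2,*,4'

-- With the proposed default nullsym of '':
SELECT to_string(ARRAY[1, 2, NULL, 4], ',');       -- would yield '1,2,,4'
```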
Re: [HACKERS] max_standby_delay considered harmful
Downthread, I said..

On Tue, 2010-05-04 at 14:49 +0100, Simon Riggs wrote:

  The only difference is that max_standby_delay is measured from log timestamp. Perhaps it should work from WAL receipt timestamp rather than from log timestamp? That would make some of the problems go away without significantly changing the definition. I'll look at that.

Patch to implement this idea attached: for discussion, not tested yet. No docs yet.

The attached patch redefines standby delay to be the amount of time elapsed from point of receipt to point of application. The point of receipt is reset every chunk of data when streaming, or every file when reading file by file. In all cases this new time is later than the latest log time we would have used previously. This addresses all of your points, as shown below.

On Mon, 2010-05-03 at 11:37 -0400, Tom Lane wrote:

  There are three really fundamental problems with it:

  1. The timestamps we are reading from the log might be historical, if we are replaying from archive rather than reading a live SR stream. In the current implementation that means zero grace period for standby queries. Now if your only interest is catching up as fast as possible, that could be a sane behavior, but this is clearly not the only possible interest --- in fact, if that's all you care about, why did you allow standby queries at all?

The delay used is from time of receipt of WAL, no longer from log date. So this would no longer apply.

  2. There could be clock skew between the master and slave servers. If the master's clock is a minute or so ahead of the slave's, again we get into a situation where standby queries have zero grace period, even though killing them won't do a darn thing to permit catchup. If the master is behind the slave then we have an artificially inflated grace period, which is going to slow down the slave.

The timestamp is from standby, not master, so this would no longer apply.

  3.
  There could be significant propagation delay from master to slave, if the WAL stream is being transmitted with pg_standby or some such. Again this results in cutting into the standby queries' grace period, for no defensible reason.

The timestamp is taken immediately at the point the WAL is ready for replay, so other timing overheads would not be included.

  In addition to these fundamental problems there's a fatal implementation problem: the actual comparison is not to the master's current clock reading, but to the latest commit, abort, or checkpoint timestamp read from the WAL. Thus, if the last commit was more than max_standby_delay seconds ago, zero grace time. Now if the master is really idle then there aren't going to be any conflicts anyway, but what if it's running only long-running queries? Or what happens when it was idle for awhile and then starts new queries? Zero grace period, that's what. We could possibly improve matters for the SR case by having walsender transmit the master's current clock reading every so often (probably once per activity cycle), outside the WAL stream proper. The receiver could subtract off its own clock reading in order to measure the skew, and then we could cancel queries if the de-skewed transmission time falls too far behind. However this doesn't do anything to fix the cases where we aren't reading (and caught up to) a live SR broadcast.

-- Simon Riggs www.2ndQuadrant.com

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 9281,9287 **** retry:
--- 9281,9294 ----
  						 sources);
  			switched_segment = true;
  			if (readFile != -1)
+ 			{
+ 				/*
+ 				 * Nudge forwards the WAL receive timestamp when
+ 				 * we are relying on WAL files.
+ 				 */
+ 				SetWalRcvTimestamp();
  				break;
+ 			}

  			/*
  			 * Nope, not found in archive and/or pg_xlog.
*** a/src/backend/replication/walreceiver.c
--- b/src/backend/replication/walreceiver.c
***************
*** 515,521 **** XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
  		}
  	}

! /* Flush the log to disk */
  static void
  XLogWalRcvFlush(void)
  {
--- 515,521 ----
  		}
  	}

! /* Flush the log to disk and update shared memory pointer and timestamp */
  static void
  XLogWalRcvFlush(void)
  {
***************
*** 524,537 **** XLogWalRcvFlush(void)
--- 524,541 ----
  	/* use volatile pointer to prevent code rearrangement */
  	volatile WalRcvData *walrcv = WalRcv;
  	char		activitymsg[50];
+ 	TimestampTz receivedTimestamp;

  	issue_xlog_fsync(recvFile, recvId, recvSeg);
  	LogstreamResult.Flush = LogstreamResult.Write;

+ 	receivedTimestamp = GetCurrentTimestamp();
+
  	/* Update shared-memory status */
  	SpinLockAcquire(&walrcv->mutex);
  	walrcv->receivedUpto = LogstreamResult.Flush;
+ 	walrcv->receivedTimestamp = receivedTimestamp;
  	SpinLockRelease(&walrcv->mutex);

  	/* Report XLOG streaming progress in PS display */
***
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
Robert Haas robertmh...@gmail.com wrote:

  I think the OP is probably running a version that doesn't include the Jan 7 commit, which was effectively undone by the Jan 25 commit for CVS HEAD.

It sure looks like it.

  It looks like this was intentional based on spec behavior of overlay(), but should we consider maintaining the historical behavior instead?

I know I read through the spec (several versions of it) related to this issue when I reviewed the patch, and if memory serves the 9.0 behavior is what the spec requires. Obviously that's a behavior change, so it can't be back-patched. I'm inclined to think the previous behavior was pretty marginal, and there is certainly a workaround -- omit the third parameter rather than specifying a negative number:

SELECT substring(B'0001' from 5);
 substring
-----------
 0001
(1 row)

SELECT substring(B'0001' from 4);
 substring
-----------
 10001
(1 row)

We have maintained nonstandard behavior in the past for compatibility reasons, so it's a fair question; however, I'm inclined toward the standard on this one.

-Kevin
Re: [HACKERS] what is good solution for support NULL inside string_to_array function?
2010/5/4 Merlin Moncure mmonc...@gmail.com: On Tue, May 4, 2010 at 10:05 AM, Pavel Stehule pavel.steh...@gmail.com wrote: and then string_to_array and array_to_string are orthogonal with NULL. I like the behavior, but should it share the name with the 2 argument version given the incompatibility? Maybe make a new function to_string(anyarray, sep, nullsym='') and deprecate the old one? maybe to_string vs. to_array ... Why not? It's shorter, and maybe it is cleaner. Regards Pavel merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
On Tue, 2010-05-04 at 14:49 +0100, Simon Riggs wrote: The only difference is that max_standby_delay is measured from log timestamp. Perhaps it should work from WAL receipt timestamp rather than from log timestamp? That would make some of the problems go away without significantly changing the definition. I'll look at that. Patch to implement this idea posted in response to OT, upthread, so I can respond to the original complaints directly. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
On Tue, May 4, 2010 at 10:29 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: We have maintained nonstandard behavior in the past for compatibility reasons, so it's a fair question; however, I'm inclined toward the standard on this one. In a case like this, it seems unlikely that someone would be counting on a negative value to throw an error, so I tend to regard doing something else as an extension of the standard rather than a deviation from it. But I don't have strong feelings about it. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
Robert Haas robertmh...@gmail.com writes: On Tue, May 4, 2010 at 10:29 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: We have maintained nonstandard behavior in the past for compatibility reasons, so it's a fair question; however, I'm inclined toward the standard on this one. In a case like this, it seems unlikely that someone would be counting on a negative value to throw an error, so I tend to regard doing something else as an extension of the standard rather than a deviation from it. But I don't have strong feelings about it. The reason we changed it is that our other versions of substring() already had the spec-required behavior of throwing error for negative length. Only the bit/varbit implementation was out of step. The OP did not state that this behavioral change broke his application, anyway. I suspect the actual subtext is that he's poking into the vulnerability report that was issued against the unpatched code. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pause/Resume feature for Hot Standby
On Tue, 2010-05-04 at 09:36 -0400, Robert Haas wrote: On Tue, May 4, 2010 at 4:02 AM, Simon Riggs si...@2ndquadrant.com wrote: In the original patch I had Pause/Resume feature for controlling recovery during Hot Standby. It was removed for lack of time. Well, it's not like we have more time now than we did then. I think we need to postpone this discussion to 9.1. If we're going to start accepting patches for new features, then why should we accept only patches for HS/SR? Robert, This is clearly a response to issues raised about HS and not a new feature. It's also proposed in the most minimal way possible with respect for the current state of release. Why is it you think I want to go to beta less quickly than anyone else? I have many other items to work on in the new release also, none of them have been even discussed, again out of respect for the timing and the process. I also think that worrying about fine-tuning HS at this point is a bit like complaining that the jump suits of the crew of the Space Shuttle Challenger were not made of 100% recyclable materials. Just yesterday we had a report of an HS server getting into a state where it failed to shut down properly; and I believe that we never fully resolved the issue of occasional extremely-long spikes in HS response time, either. Heikki just fixed a bug in our btree recovery code which is apparently new to 9.0, since he did not backpatch it. I think that getting into a discussion of pausing and resuming recovery, or even the parallel discussion on max_standby_delay, are fiddling with things that, granted, are probably not ideal, and yes, we should improve them in a future release, but they're not what we should be worrying about right now. What I think we SHOULD be worried about right now - VERY worried - is stabilizing the existing Hot Standby code to the point where it won't be an embarrassment to us when we ship it. 
The rate at which we're finding new problems even with the small number of people who test alpha releases and nightly snapshots suggests to me that we're not there yet. There hasn't been anything more than a minor bug in weeks, so not really sure how you arrive at the idea that the code needs stabilising. But even if you think we need stabilising, how do you propose I do that? What exact action? When people complain, I propose solutions. If you then object that the proposed solution is actually a new feature, that leaves us in a deadlock. There is no evidence that Erik's strange performance has anything to do with HS; it hasn't been seen elsewhere and he didn't respond to questions about the test setup to provide background. The profile didn't fit any software problem I can see. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pause/Resume feature for Hot Standby
Simon Riggs si...@2ndquadrant.com writes: In the original patch I had Pause/Resume feature for controlling recovery during Hot Standby. It was removed for lack of time. With all the discussion around the HS UI, it would be something that could be back very easily. Sure. In 9.1. You have enough bugs to fix that you have *no* business thinking about adding features for 9.0, even if that were permissible under the ground rules for beta. Pretending that it's a contrib module is just a transparent end-run around that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
Robert Haas robertmh...@gmail.com wrote: But I don't have strong feelings about it. Nor do I. Perhaps this question should be floated on -general? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pause/Resume feature for Hot Standby
On Tue, 2010-05-04 at 11:12 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: In the original patch I had Pause/Resume feature for controlling recovery during Hot Standby. It was removed for lack of time. With all the discussion around the HS UI, it would be something that could be back very easily. Sure. In 9.1. You have enough bugs to fix that you have *no* business thinking about adding features for 9.0, even if that were permissible under the ground rules for beta. Pretending that it's a contrib module is just a transparent end-run around that. As stated, this was proposed as a response to your gripes elsewhere. If people gripe, I propose a solution. I'm happy if you say No to the proposed solution, but let's not pretend I'm breaking rules all the time when I do. What bugs do I have to fix? I am not aware of any. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm building all live branches from git
Robert Haas wrote: On Mon, May 3, 2010 at 4:04 PM, Andrew Dunstan and...@dunslane.net wrote: For those who have been following along, today I reached a bit of a milestone. The buildfarm member quoll, which is running a git-ized buildfarm client due to be released in a week or so, today successfully built all the live branches, all the way back to 7.4, from git. Note, this is running from my test git repo, not the community's repo. Sadly, that means its change links will be broken - I'm not exactly sure what gets hashed to provide a commit ID in git, but the IDs don't match between these two repos. See http://www.pgbuildfarm.org/cgi-bin/show_status.pl?member=quoll. Thanks for working on this. You're welcome ;-) I have actually fixed the change link issue. I added an enhancement that allows the client to specify a URL where changesets can be found. The server code no longer presumptively assumes that the changeset can be found on the community repo. See the bottom of http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quoll&dt=2010-05-03%2021:21:11 for an example of this in action. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pause/Resume feature for Hot Standby
On Tue, May 4, 2010 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote: This is clearly a response to issues raised about HS and not a new feature. I don't find that clear at all. In fact, I find the exact opposite position to be clear. It's also proposed in the most minimal way possible with respect for the current state of release. Why is it you think I want to go to beta less quickly than anyone else? We're already in beta. I said nothing about when you want to go to beta or do anything else. There hasn't been anything more than a minor bug in weeks, so not really sure how you arrive at the idea that the code needs stabilising. I don't agree that there hasn't been anything more than a minor bug in weeks. I arrive at the idea that the code needs stabilizing on the basis of the fact that we keep finding new bugs. When people complain, I propose solutions. If you then object that the proposed solution is actually a new feature, that leaves us in a deadlock. Not really. You're entitled to say what you think we should do and I am entitled to say what I think we should do. I think we should wait for 9.1. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing HS/SR - 1 vs 2 performance
Hi Simon, In another thread you mentioned you were lacking information from me: On Tue, May 4, 2010 17:10, Simon Riggs wrote: There is no evidence that Erik's strange performance has anything to do with HS; it hasn't been seen elsewhere and he didn't respond to questions about the test setup to provide background. The profile didn't fit any software problem I can see. I'm sorry if I missed requests for things that were not already mentioned. Let me repeat:
OS: Centos 5.4
2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
Areca 1280ML
primary and standby db both on a 12 disk array (sata 7200rpm, Seagate Barracuda ES.2)
It goes without saying (I hope) that apart from the pgbench tests and a few ssh sessions (myself), the machine was idle. It would be interesting if anyone repeated these simple tests and produced evidence that these results are non-HS-related. (Unfortunately, I have at the moment not much time for more testing) thanks, Erik Rijkers On Sun, April 25, 2010 21:07, Simon Riggs wrote: On Sun, 2010-04-25 at 20:25 +0200, Erik Rijkers wrote: Sorry if it's too much data, but to me at least it was illuminating; I now understand the effects of the different parameters better. That's great, many thanks. A few observations * Standby performance is actually slightly above normal running. This is credible because of the way snapshots are now taken. We don't need to scan the procarray looking for write transactions, since we know everything is read only. So we scan just the knownassignedxids, which if no activity from primary will be zero-length, so snapshots will actually get taken much faster in this case on standby. The snapshot performance on standby is O(n) where n is the number of write transactions currently on primary (transfer delays blur the word currently). * The results for scale factor 100 are fine, and the results for 100 with few connections get thrown out by long transaction times. With larger numbers of connections the wait problems seem to go away. 
Looks like Erik (and possibly Hot Standby in general) has an I/O problem, though from what cause is not yet determined. It could be just hardware, or might be hardware plus other factors. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing HS/SR - 1 vs 2 performance
On Tue, 2010-05-04 at 18:10 +0200, Erik Rijkers wrote: It would be interesting if anyone repeated these simple tests and produced evidence that these results are non-HS-related. (Unfortunately, I have at the moment not much time for more testing) Would you be able to make those systems available for further testing? First, I'd perform the same test with the systems swapped, so we know more about the symmetry of the issue. After that, I'd like to look more into internals. Is it possible to setup SystemTap and dtrace on these systems? Thanks, either way. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing HS/SR - 1 vs 2 performance
On Tue, May 4, 2010 18:19, Simon Riggs wrote: On Tue, 2010-05-04 at 18:10 +0200, Erik Rijkers wrote: It would be interesting if anyone repeated these simple tests and produced evidence that these results are non-HS-related. (Unfortunately, I have at the moment not much time for more testing) Would you be able to make those systems available for further testing? No, sorry. First, I'd perform the same test with the systems swapped, so we know more about the symmetry of the issue. After that, would like to look more into internals. you mean systems swapped, primary and standby? primary and standby were on the same machine in these tests (even the same raid). I can eventually move the standby (the 'slow' side, as it stands) to another, quite similar machine. Not in the coming days though... Is it possible to setup SystemTap and dtrace on these systems? I did install systemtap last week. dtrace is not installed (I think. I've never used either.) Erik Rijkers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
Simon Riggs wrote: On Mon, 2010-05-03 at 22:45 -0400, Bruce Momjian wrote: As I remember, 9.0 has two behaviors:
o master delays vacuum cleanup
o slave delays WAL application
and in 9.1 we will be adding:
o slave communicates snapshots to master
How would this figure into what we ultimately want in 9.1? We would still want all options, since slave communicates snapshot to master doesn't solve the problem, it just moves the problem elsewhere. It's a question of which factors the user wishes to emphasise for their specific use. I understand Simon's point that the two behaviors have different benefits. However, I believe few users will be able to understand when to use which. If users can understand how to set NDISTINCT for a column, they can understand this. It's not about complexity of UI, it's about solving problems. When people hit an issue, I don't want to be telling people we thought you wouldn't understand it, so we removed the parachute. They might not understand it *before* they hit a problem, so what? But users certainly will afterwards and won't say thanks if you prevent an option for them, especially for the stated reason. (My point about ndistinct: 99% of users have no idea that exists or when to use it, but it still exists as an option because it solves a known issue, just like this.) Well, this is kind of my point --- that if few people are going to need a parameter and it is going to take us to tell them to use it, it isn't a good parameter because the other 99.9% are going to stare at the parameters and not know what it does or how it is different from other similar parameters. Adding another parameter might help 0.1% of our users, but it is going to confuse the other 99.9%. :-( -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pause/Resume feature for Hot Standby
Simon Riggs si...@2ndquadrant.com writes: There hasn't been anything more than a minor bug in weeks, so not really sure how you arrive at the idea that the code needs stabilising. Simon, if you don't think the code needs stabilizing, you need to think again.
* max_standby_delay logic is broken, as per other thread.
* handle_standby_sig_alarm is broken to the point of needing to be thrown away; you can NOT do that kind of thing in an interrupt handler.
* RecordKnownAssignedTransactionIds is changing ShmemVariableCache->nextXid without any kind of lock (in general, I suspect all the xlog replay code needs to be revisited to see if it's skipping locks on shared data structures that are now potentially going to be examined by backends)
* Use of StandbyTransactionIdIsPrepared seems awfully dubious: why are we trusting the standby's pg_twophase files more than data from the WAL log, *especially* before we have reached consistency? Not to mention that that's a horridly expensive operation (filesystem access) being invoked while holding ProcArrayLock.
* Why is ExtendCLOG/ExtendSUBTRANS done in RecordKnownAssignedTransactionIds? It's inappropriate from a modularity standpoint, and it also seems completely wrong that it won't get done if standbyState < STANDBY_SNAPSHOT_PENDING. nextXID manipulation there seems equally bogus not to mention unlocked.
* snapshotOldestActiveXid is bogus (I complained about this already, you have not fixed it)
* LogStandbySnapshot is merest fantasy: no guarantee that either the XIDs list or the locks list will be consistent with the point in WAL where it will get inserted. What's worse, locking things down enough to guarantee consistency would be horrid for performance, or maybe even deadlock-inducing. 
Could lose both ways: list might contain an XID whose commit/abort went to WAL before the snapshot did, or list might be missing an XID started just after snap was taken. The latter case could possibly be dealt with via nextXid filtering, but that doesn't fix the former case, and anyway we have both ends of the same problem for locks. That's just what I found in a day or so of code reading, and I haven't read anything like all of the HS patches. You need to stop thinking about adding features and start thinking about making what's in there bulletproof. If you happen to have an idle moment when you're not fixing known problems, re-read some code. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
On Tue, 2010-05-04 at 13:00 -0400, Bruce Momjian wrote: Well, this is kind of my point --- that if few people are going to need a parameter and it is going to take us to tell them to use it, it isn't a good parameter because the other 99.9% are going to stare at the parameters and not konw what it does or how it is different from other similar parameters. Adding another parameter might help 0.1% of our users, but it is going to confuse the other 99.9%. :-( You've missed my point. Most users of HS will need these parameters. There is no need to understand them immediately, nor do I expect them to do so. People won't understand why they exist until they've understood the actual behaviour, received some errors and *then* they will understand them, want them and need them. Just like deadlocks, ndistinct and loads of other features we provide and support. The current behaviour of max_standby_delay is designed to favour High Availability users, not query users. I doubt that users with HA concerns are only 0.1% of our users. I've accepted that some users may not put that consideration first and so adding some minor, easy to implement additional parameters will improve the behaviour for those people. Forcing just one behaviour will be bad for many people. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pause/Resume feature for Hot Standby
On Tue, 2010-05-04 at 13:23 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: There hasn't been anything more than a minor bug in weeks, so not really sure how you arrive at that the idea the code needs stabilising. Simon, if you don't think the code needs stabilizing, you need to think again. This list is entirely new to me. I can't fix problems you haven't even raised before, can I? Why have you been saving that list?? No way are these known problems. That's just what I found in a day or so of code reading, and I haven't read anything like all of the HS patches. You need to stop thinking about adding features and start thinking about making what's in there bulletproof. If you happen to have an idle moment when you're not fixing known problems, re-read some code. Nobody is adding new features. Stop barracking me for something that's not even happening, especially if you persuade yourself you should be angry about it. I care as much about beta as anyone else. Yes, I'll go read your list. Thank you for your review. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] including PID or backend ID in relpath of temp rels
Robert Haas escribió: [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary relations? I think the only backend that can have an smgr reference to a temprel other than the owning backend is bgwriter, and AFAICS bgwriter will only have such a reference if it's responding to a request by the owning backend to unlink the associated files, in which case (I think) the owning backend will have no reference. Hmm, wasn't there a proposal to have the owning backend delete the files instead of asking the bgwriter to? [dbsize.c] As with relcache.c, there's a problem if we're asked for the size of a temporary relation that is not our own: we can't call relpath() without knowing the ID of the owning backend, and there's no way to acquire that information for pg_class. I guess we could just refuse to answer the question in that case, but that doesn't seem real cool. Or we could physically scan the directory for files that match a suitably constructed wildcard, I suppose. I don't very much like the wildcard idea; but I don't think it's unreasonable to refuse to provide a file size. If the owning backend has still got part of the table in local buffers, you'll get a misleading answer, so perhaps it's best to not give an answer at all. Maybe this problem could be solved if we could somehow force that backend to write down its local buffers, in which case it'd be nice to have a solution to the dbsize problem. [syncscan.c] It seems we pursue this optimization even for temprels; I can't think of why that would be useful in practice. If it's useless overhead, should we skip it? This is really independent of this project; just a side thought. Maybe recently used buffers are more likely to be in the OS page cache, so perhaps it's not good to disable it. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. 
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] including PID or backend ID in relpath of temp rels
On Tue, May 4, 2010 at 2:06 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Robert Haas escribió: Hey, thanks for writing back! I just spent the last few hours thinking about this and beating my head against the wall. [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary relations? I think the only backend that can have an smgr reference to a temprel other than the owning backend is bgwriter, and AFAICS bgwriter will only have such a reference if it's responding to a request by the owning backend to unlink the associated files, in which case (I think) the owning backend will have no reference. Hmm, wasn't there a proposal to have the owning backend delete the files instead of asking the bgwriter to? I did propose that upthread; it may have been proposed previously also. This might be worth doing independently of the rest of the patch (which I'm starting to fear is doomed, cue ominous soundtrack) since it would reduce the chance of orphaning data files and possibly simplify the logic also. [dbsize.c] As with relcache.c, there's a problem if we're asked for the size of a temporary relation that is not our own: we can't call relpath() without knowing the ID of the owning backend, and there's no way to acquire that information for pg_class. I guess we could just refuse to answer the question in that case, but that doesn't seem real cool. Or we could physically scan the directory for files that match a suitably constructed wildcard, I suppose. I don't very much like the wildcard idea; but I don't think it's unreasonable to refuse to provide a file size. If the owning backend has still got part of the table in local buffers, you'll get a misleading answer, so perhaps it's best to not give an answer at all. Maybe this problem could be solved if we could somehow force that backend to write down its local buffers, in which case it'd be nice to have a solution to the dbsize problem. 
I'm sure we could add some kind of signaling mechanism that would tell all backends to flush their local buffers, but I'm not too sure it would help this case very much, because you likely wouldn't want to wait for all the backends to complete that process before reporting results. [syncscan.c] It seems we pursue this optimization even for temprels; I can't think of why that would be useful in practice. If it's useless overhead, should we skip it? This is really independent of this project; just a side thought. Maybe recently used buffers are more likely to be in the OS page cache, so perhaps it's not good to disable it. I don't get it. If the whole relation fits in the page cache, it doesn't much matter where you start a seqscan. If it doesn't, starting where the last one ended is anti-optimal. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
I am sorry for that, but I made two different installations and I was messing up with various inputs. Actually, the installed versions are below *postgresql-8.3* *Ubuntu 8.10 with 2.6.27 Kernel* *and it's a 32Bit O/S*
pgsql$ SELECT substring(B'0001' from 5 for -2);
ERROR: invalid memory alloc request size 4244635647
I just wanted to know how severe it is and how it can affect the database, resulting in memory corruption or DoS. Please help me in making the point clear. Thanks again. On Tue, May 4, 2010 at 8:45 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: But I don't have strong feelings about it. Nor do I. Perhaps this question should be floated on -general? -Kevin
Re: [HACKERS] testing HS/SR - 1 vs 2 performance
Erik Rijkers wrote:
OS: Centos 5.4
2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
Areca 1280ML
primary and standby db both on a 12 disk array (sata 7200rpm, Seagate Barracuda ES.2)
To fill in from data you already mentioned upthread:
32 GB RAM
CentOS release 5.4 (Final), x86_64 Linux, 2.6.18-164.el5
Thanks for all the reporting you've done here, really helpful. Questions to make sure I'm trying to duplicate the right thing here: Is your disk array all configured as one big RAID10 volume, so essentially a 6-disk stripe with redundancy, or something else? In particular I want to know whether the WAL/database/archives are split onto separate volumes or all on one big one when you were testing. Is this on ext3 with standard mount parameters? Also, can you confirm that every test you ran only had a single pgbench worker thread (-j 1 or not specified)? That looked to be the case from the ones I saw where you posted the whole command used. It would not surprise me to find that the CPU usage profile of a standby is just different enough from the primary that it results in the pgbench program not being scheduled enough time, due to the known Linux issues in that area. Not going to assume that, of course, just one thing I want to check when trying to replicate what you've run into. I didn't see any glaring HS performance issues like you've been reporting on last time I tried performance testing in this area, just a small percentage drop. But I didn't specifically go looking for it either. With your testing rig out of service, we're going to try and replicate that on a system here. My home server is like a scaled down version of yours (single quad-core, 8GB RAM, smaller Areca controller, 5 disks instead of 12) and it's running the same CentOS version. If the problem is really universal, I should see it here too. 
-- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
Simon, Yes, the max wait on any *one* blocker will be max_standby_delay. But if you wait for two blockers, then the total time by which the standby lags will now be 2*max_standby_delay. Add a third, fourth etc and the standby lag keeps rising. I still don't see how that works. If we're locking for applying log segments, then any query which came in after the recovery lock would, presumably, wait. So you'd have a lot of degraded query performance, but no more than max_standby_delay of waiting to apply logs. I'm more interested in your assertion that there's a lot in the replication stream which doesn't take a lock; if that's the case, then implementing any part of Tom's proposal is hopeless. * standby query delay - defined as the time that recovery will wait for a query to complete before a cancellation takes place. (We could complicate this by asking what happens when recovery is blocked twice by the same query? Would it wait twice, or does it have to track how much it has waited for each query in total so far?) Aha! Now I see the confusion. AFAIK, Tom was proposing that the pending recovery data would wait for max_standby_delay, total, then cancel *all* queries which conflicted with it. Now that we've talked this out, though, I can see that this can still result in mass cancel issues, just like the current max_standby_delay. The main advantage I can see to Tom's idea is that (presumably) it can be more discriminating about which queries it cancels. I agree that waiting on *each* query for up to # time would be a completely different behavior, and as such, should be a option for DBAs. We might make it the default option, but we wouldn't make it the only option. Speaking of which, was *your* more discriminating query cancel ever applied? Currently max_standby_delay seeks to constrain the standby lag to a particular value, as a way of providing a bounded time for failover, and also to constrain the amount of WAL that needs to be stored as the lag increases. 
Currently, there is no guaranteed minimum query delay given to each query. Yeah, I can just see a lot of combinational issues with this. For example, what if the user's network changes in some way to retard delivery of log segments to the point where the delivery time is longer than max_standby_delay? To say nothing about system clock synch, which isn't perfect even if you have it set up. I can see DBAs who are very focussed on HA wanting a standby-lag based control anyway, when HA is far more important than the ability to run queries on the slave. But I don't think that is the largest group; I think that far more people will want to balance the two considerations. Ultimately, as you say, we would like to have all three knobs: standby lag: max time measured from master timestamp to slave timestamp application lag: max time measured from local receipt of WAL records (via log copy or recovery connection) to their application query lag: max time any query which is blocking a recovery operation can run These three, in combination, would let us cover most potential use cases. So I think you've assessed that's where we're going in the 9.1-9.2 timeframe. However, I'd say for 9.0 that application lag is the least confusing option and the least dependent on the DBA's server room setup. So if we can only have one of these for 9.0 (and I think going out with more than one might be too complex, especially at this late date) I think that's the way to go. -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
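The three knobs can be made concrete: application lag is the only one computed purely from timestamps taken on the standby itself, which is why it sidesteps clock skew and delivery delay. An illustrative sketch (the names and structure are made up for this example, not proposed GUCs or actual recovery code):

```python
from dataclasses import dataclass

@dataclass
class WalRecord:
    master_ts: float    # commit timestamp written on the master
    received_ts: float  # when this standby received the record
    applied_ts: float   # when recovery finally applied it

def standby_lag(rec: WalRecord) -> float:
    """Master commit to standby apply: includes network/archive delivery
    delay and is distorted by any clock skew between the two servers."""
    return rec.applied_ts - rec.master_ts

def application_lag(rec: WalRecord) -> float:
    """Local receipt to application: both timestamps are taken on the
    standby, so delivery delay and clock skew drop out entirely."""
    return rec.applied_ts - rec.received_ts

# A record that took 5s to arrive and then waited 3s behind a conflict:
rec = WalRecord(master_ts=100.0, received_ts=105.0, applied_ts=108.0)
```

With these definitions a slow archive link inflates standby_lag but leaves application_lag untouched, which matches the argument for it being the least setup-dependent control.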
Re: [HACKERS] max_standby_delay considered harmful
On Mon, May 3, 2010 at 4:37 PM, Tom Lane t...@sss.pgh.pa.us wrote: 1. The timestamps we are reading from the log might be historical, 2. There could be clock skew between the master and slave servers. 3. There could be significant propagation delay from master to slave, So it sounds like what you're expecting is for max_standby_delay to represent not the maximum lag between server commit and standby commit but rather the maximum lag introduced by conflicts. Or perhaps maximum lag introduced relative to the lag present at startup. I think it's possible to implement either of these and it would solve all three problems above: The slave maintains a static measure of how far behind it is from the master. Every time it executes a recovery operation or waits on a conflict it adds the time it spent executing or waiting. Every time it executes a commit record it subtracts the *difference* between this commit record and the last. I assume we clip at 0 so it never goes negative which has odd effects but it seems to match what I would expect to happen. In the face of a standby recovering historical logs then it would start with an assumed delay of 0. As long as the conflicts don't slow down execution of the logs so that they run slower than the server then the measured delay would stay near 0. The only time queries would be canceled would be if the conflicts are causing problems replaying the logs. In the face of clock skew, nothing changes as long as the clocks run at the same speed. In the face of an environment where the master is idle I think this scheme has the same problems you described but I think this might be manageable. Perhaps we need more timestamps in the master's log stream aside from the commit timestamps. Or perhaps we don't care about standby delay except when reading a commit record since any other record isn't actually delayed unless its commit is delayed. 
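The accounting scheme described above can be sketched directly: add time spent applying or waiting, subtract the master-side interval between successive commit records, and clip at zero. This is an illustrative model of the proposal, not the actual recovery code:

```python
class StandbyLagMeter:
    """Conflict-induced lag per the scheme above: starts at 0, grows by
    time spent executing recovery operations or waiting on conflicts,
    shrinks by the master-side time between successive commit records,
    and is clipped at zero so it never goes negative."""

    def __init__(self):
        self.delay = 0.0
        self.last_commit_ts = None  # master timestamp of previous commit

    def add_work(self, seconds):
        """Time spent executing a recovery op or waiting on a conflict."""
        self.delay += seconds

    def saw_commit(self, master_ts):
        """Subtract the interval between this commit and the previous one."""
        if self.last_commit_ts is not None:
            self.delay = max(0.0, self.delay - (master_ts - self.last_commit_ts))
        self.last_commit_ts = master_ts

meter = StandbyLagMeter()
meter.saw_commit(100.0)
meter.add_work(2.0)      # waited 2s on a conflicting query
meter.saw_commit(103.0)  # 3s of master time elapsed: lag clips back to 0
```

Because only differences between master timestamps are used, a constant clock offset between master and standby cancels out, which is the point Greg makes about clock skew.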
-- greg
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
Srinivas Naik naik.sr...@gmail.com wrote: Actually, the installed versions are below *postgresql-8.3* I just wanted to know how severe it is and how it can affect the database to result in Memory Corruption/DoS. Well, you're clearly *not* on 8.3.10, or you would not get the error. Perhaps you should apply the latest bug fixes? http://www.postgresql.org/support/versioning http://www.postgresql.org/docs/8.3/static/release.html As far as I know it isn't any more conducive to DoS attacks than, say, your average syntax error; however, if you're trying to keep that risk low, you should be keeping up with the minor releases anyway. -Kevin
Re: [HACKERS] including PID or backend ID in relpath of temp rels
Robert Haas wrote: On Tue, May 4, 2010 at 2:06 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Robert Haas wrote: Hey, thanks for writing back! I just spent the last few hours thinking about this and beating my head against the wall. :-) [smgr.c,inval.c] Do we need to call CacheInvalidateSmgr for temporary relations? I think the only backend that can have an smgr reference to a temprel other than the owning backend is bgwriter, and AFAICS bgwriter will only have such a reference if it's responding to a request by the owning backend to unlink the associated files, in which case (I think) the owning backend will have no reference. Hmm, wasn't there a proposal to have the owning backend delete the files instead of asking the bgwriter to? I did propose that upthread; it may have been proposed previously also. This might be worth doing independently of the rest of the patch (which I'm starting to fear is doomed, cue ominous soundtrack) since it would reduce the chance of orphaning data files and possibly simplify the logic also. +1 for doing it separately, but hopefully that doesn't mean the rest of this patch is doomed ... [dbsize.c] As with relcache.c, there's a problem if we're asked for the size of a temporary relation that is not our own: we can't call relpath() without knowing the ID of the owning backend, and there's no way to acquire that information from pg_class. I guess we could just refuse to answer the question in that case, but that doesn't seem real cool. Or we could physically scan the directory for files that match a suitably constructed wildcard, I suppose. I don't very much like the wildcard idea; but I don't think it's unreasonable to refuse to provide a file size. If the owning backend has still got part of the table in local buffers, you'll get a misleading answer, so perhaps it's best to not give an answer at all. 
Maybe this problem could be solved if we could somehow force that backend to write down its local buffers, in which case it'd be nice to have a solution to the dbsize problem. I'm sure we could add some kind of signaling mechanism that would tell all backends to flush their local buffers, but I'm not too sure it would help this case very much, because you likely wouldn't want to wait for all the backends to complete that process before reporting results. Hmm, I was thinking of the pg_relation_size function -- given this new mechanism you could get an accurate size of temp tables for other backends. I wasn't thinking of the pg_database_size function, and perhaps it's better to *not* include temp tables in that report at all. [syncscan.c] It seems we pursue this optimization even for temprels; I can't think of why that would be useful in practice. If it's useless overhead, should we skip it? This is really independent of this project; just a side thought. Maybe recently used buffers are more likely to be in the OS page cache, so perhaps it's not good to disable it. I don't get it. If the whole relation fits in the page cache, it doesn't much matter where you start a seqscan. If it doesn't, starting where the last one ended is anti-optimal. Err, I was thinking that a syncscan started a bunch of pages earlier than the point where the previous scan ended, but yeah, that's a bit silly. Maybe we should just ignore syncscan in temp tables altogether, as you propose. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
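The wildcard idea Robert floats would amount to globbing the database directory for any backend's copy of the relation's files and summing their sizes. A rough sketch of the approach, where the directory layout and the `t<backendid>_<relfilenode>` naming pattern are assumptions for illustration, not the actual on-disk format:

```python
import glob
import os

def temp_rel_size(datadir, db_oid, relfilenode):
    """Sum on-disk sizes of a temp relation's files without knowing the
    owning backend's ID, by matching any backend ID in the filename.
    The pattern 't<backendid>_<relfilenode>' (plus '.1', '.2', ... for
    extra segments) is hypothetical."""
    pattern = os.path.join(datadir, "base", str(db_oid),
                           "t*_%d*" % relfilenode)
    return sum(os.path.getsize(p) for p in glob.glob(pattern))
```

Note this reads only file sizes, so it would still miss pages the owning backend has extended purely in local buffers - which is Alvaro's objection about misleading answers.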
Re: [HACKERS] testing HS/SR - 1 vs 2 performance
Erik Rijkers wrote: Hi Simon, In another thread you mentioned you were lacking information from me: On Tue, May 4, 2010 17:10, Simon Riggs wrote: There is no evidence that Erik's strange performance has anything to do with HS; it hasn't been seen elsewhere and he didn't respond to questions about the test setup to provide background. The profile didn't fit any software problem I can see. I'm sorry if I missed requests for things that were not already mentioned. Let me repeat: OS: Centos 5.4 2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz Areca 1280ML primary and standby db both on a 12 disk array (sata 7200rpm, Seagate Barracuda ES.2) It goes without saying (I hope) that apart from the pgbench tests and a few ssh sessions (myself), the machine was idle. It would be interesting if anyone repeated these simple tests and produced evidence that these are non-HS-related. (Unfortunately, I have at the moment not much time for more testing) FWIW - I'm seeing a behaviour here under pgbench -S workloads that looks kinda related. Using -j 16 -c 16 -T 120 I get either 100ktps and around 660k contextswitches per second or on some runs I end up with 150ktps and around 1M contextswitches/s sustained. I mostly get the 100k result but once in a while I get the 150k one. And one can even anticipate the final transaction rate from watching vmstat 1... I'm not sure yet on what is causing that behaviour but that is with 9.0B1 on a Dual Quadcore Nehalem box with 16 cpu threads (8+HT) on a pure in-memory workload (scale = 20 with 48GB RAM). Stefan
Re: [HACKERS] testing HS/SR - 1 vs 2 performance
On Tue, May 4, 2010 20:26, Greg Smith wrote: Erik Rijkers wrote: OS: Centos 5.4 2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz Areca 1280ML primary and standby db both on a 12 disk array (sata 7200rpm, Seagate Barracuda ES.2) To fill in from data you already mentioned upthread: 32 GB RAM CentOS release 5.4 (Final), x86_64 Linux, 2.6.18-164.el5 Thanks for all the reporting you've done here, really helpful. Questions to make sure I'm trying to duplicate the right thing here: Is your disk array all configured as one big RAID10 volume, so essentially a 6-disk stripe with redundancy, or something else? In particular I want to know whether the WAL/database/archives are split onto separate volumes or all on one big one when you were testing. Everything together: the raid is what Areca call 'raid10(1E)'. (to be honest I don't remember what that 1E exactly means - extra flexibility in the number of disks, I think). Btw, some of my emails contained the postgresql.conf of both instances. Is this on ext3 with standard mount parameters? ext3 noatime Also, can you confirm that every test you ran only had a single pgbench worker thread (-j 1 or not specified)? That looked to be the case from the ones I saw where you posted the whole command used. It would not surprise me to find that the CPU usage profile of a standby is just different enough from the primary that it results in the pgbench program not being scheduled enough time, due to the known Linux issues in that area. Not going to assume that, of course, just one thing I want to check when trying to replicate what you've run into. Yes; the literal cmd: time /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/bin/pgbench -h /tmp -p 6565 -U rijkers -n -S -c 20 -T 900 -j 1 replicas To avoid wrapping in the emails I just removed '-h /tmp', '-U rijkers', and 'replicas'. (I may have run the primary's pgbench binary also against the slave - don't think that should make any difference) 
I didn't see any glaring HS performance issues like you've been reporting on last time I tried performance testing in this area, just a small percentage drop. But I didn't specifically go looking for it either. Here, it seems repeatable, but does not occur with all scales. Hm, maybe I should just dump *all* of my results on the wiki for reference. (I'll look at that later). With your testing rig out of service, we're going to try and replicate that on a system here. My home server is like a scaled down version of yours (single quad-core, 8GB RAM, smaller Areca controller, 5 disks instead of 12) and it's running the same CentOS version. If the problem's really universal I should see it here too. Thanks, Erik Rijkers
Re: [HACKERS] including PID or backend ID in relpath of temp rels
On Tue, May 4, 2010 at 3:03 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Hmm, wasn't there a proposal to have the owning backend delete the files instead of asking the bgwriter to? I did propose that upthread; it may have been proposed previously also. This might be worth doing independently of the rest of the patch (which I'm starting to fear is doomed, cue ominous soundtrack) since it would reduce the chance of orphaning data files and possibly simplify the logic also. +1 for doing it separately, but hopefully that doesn't mean the rest of this patch is doomed ... I wonder if it would be possible to reject access to temporary relations at a higher level. Right now, if you create a temporary relation in one session, you can issue a SELECT statement against it in another session, and get back 0 rows. If you then insert data into it and select against it again, you'll get an error saying that you can't access temporary tables of other sessions. If you try to truncate somebody else's temporary relation, it fails; but if you try to drop it, it works. In fact, you can even run ALTER TABLE ... ADD COLUMN on somebody else's temp table, as long as you don't do anything that requires a rewrite. CLUSTER fails; VACUUM and VACUUM FULL both appear to work but apparently don't actually do anything under the hood, so that database-wide vacuums don't barf. The whole thing seems pretty leaky. It would be nice if we could find a small set of control points where we basically reject ALL access to somebody else's temp relations, period. One possible thing we might do (bearing in mind that we might need to wall off access at multiple levels) would be to forbid creating a relcache entry for a non-local temprel. That would, in turn, forbid doing pretty much anything to such a relation, although I'm not sure what else would get broken in the process. But it would eliminate, for example, all the checks for RELATION_IS_OTHER_TEMP, since that Just Couldn't Happen. 
It would also eliminate the need to install specific handling for this case in dbsize.c - we'd just automatically croak. And it's also probably necessary to do this anyhow if we want to ever eliminate those CacheInvalidateSmgr() calls for temp rels, because if I can drop your temprel, that implies I can smgropen() it. ...Robert
Re: [HACKERS] max_standby_delay considered harmful
On Tue, 2010-05-04 at 11:27 -0700, Josh Berkus wrote: I still don't see how that works. ... The good news is we agree by the time we get to the bottom... ;-) I'm more interested in your assertion that there's a lot in the replication stream which doesn't take a lock; if that's the case, then implementing any part of Tom's proposal is hopeless. (No, still valid, the idea is generic) * standby query delay - defined as the time that recovery will wait for a query to complete before a cancellation takes place. (We could complicate this by asking what happens when recovery is blocked twice by the same query? Would it wait twice, or does it have to track how much it has waited for each query in total so far?) Aha! Now I see the confusion. BTW, Tom's proposal was approx half a sentence long so that is the source of any confusion. AFAIK, Tom was proposing that the pending recovery data would wait for max_standby_delay, total, then cancel *all* queries which conflicted with it. Now that we've talked this out, though, I can see that this can still result in mass cancel issues, just like the current max_standby_delay. The main advantage I can see to Tom's idea is that (presumably) it can be more discriminating about which queries it cancels. As I said to Stephen, this is exactly how it works already and wasn't what was proposed. I agree that waiting on *each* query for up to # time would be a completely different behavior, and as such, should be a option for DBAs. We might make it the default option, but we wouldn't make it the only option. Glad to hear you say that. Speaking of which, was *your* more discriminating query cancel ever applied? Currently max_standby_delay seeks to constrain the standby lag to a particular value, as a way of providing a bounded time for failover, and also to constrain the amount of WAL that needs to be stored as the lag increases. Currently, there is no guaranteed minimum query delay given to each query. 
Yeah, I can just see a lot of combinational issues with this. For example, what if the user's network changes in some way to retard delivery of log segments to the point where the delivery time is longer than max_standby_delay? To say nothing about system clock synch, which isn't perfect even if you have it set up. I can see DBAs who are very focussed on HA wanting a standby-lag based control anyway, when HA is far more important than the ability to run queries on the slave. But I don't think that is the largest group; I think that far more people will want to balance the two considerations. Ultimately, as you say, we would like to have all three knobs: standby lag: max time measured from master timestamp to slave timestamp application lag: max time measured from local receipt of WAL records (via log copy or recovery connection) to their application query lag: max time any query which is blocking a recovery operation can run These three, in combination, would let us cover most potential use cases. So I think you've assessed that's where we're going in the 9.1-9.2 timeframe. However, I'd say for 9.0 that application lag is the least confusing option and the least dependent on the DBA's server room setup. So if we can only have one of these for 9.0 (and I think going out with more than one might be too complex, especially at this late date) I think that's the way to go. Before you posted, I submitted a patch on this thread to redefine max_standby_delay to depend upon the application lag, as you've newly defined it here - though obviously I didn't call it that. That solves Tom's 3 issues. max_apply_delay might be a technically more accurate term, though it isn't a sufficiently better parameter name as to be worth the change. That patch doesn't implement his proposal, but that can be done as well as (though IMHO not instead of). 
Given that two people have already misunderstood what Tom proposed, and various people are saying we need only one, I'm getting less inclined to have that at all. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pause/Resume feature for Hot Standby
On Tue, 2010-05-04 at 13:23 -0400, Tom Lane wrote: * max_standby_delay logic is broken, as per other thread. Proposed fix submitted. * handle_standby_sig_alarm is broken to the point of needing to be thrown away; you can NOT do that kind of thing in an interrupt handler. This was modelled very closely on handle_sig_alarm() and was reviewed by other hackers. I'm not great on that, as you know, so if you can explain what it is I can't do, and how that differs from handle_sig_alarm running the deadlock detector in the same way, then I'll work on it some more. * RecordKnownAssignedTransactionIds is changing ShmemVariableCache->nextXid without any kind of lock (in general, I suspect all the xlog replay code needs to be revisited to see if it's skipping locks on shared data structures that are now potentially going to be examined by backends) There is only one writer and this is a single integer value, so any reads are atomic. This is not being used as a memory barrier, so our earlier discussion about weak-memory ordering doesn't apply. The only other reader is bgwriter. I'm happy to add additional locking if you think it's really needed. * Use of StandbyTransactionIdIsPrepared seems awfully dubious: why are we trusting the standby's pg_twophase files more than data from the WAL log, *especially* before we have reached consistency? StandbyTransactionIdIsPrepared() is only called in two places, both of which relate to pruning the KnownAssignedXids array. Pruning only occurs when the WAL log specifically does not contain the information we need, which only occurs when those hypothetical FATAL errors come along. In that case we rely upon the pg_twophase files. Both of those call points happen in ProcArrayApplyRecoveryInfo() which does get called before we are consistent, though we can change that if you see a problem. At this point, I don't see an issue. Not to mention that that's a horridly expensive operation (filesystem access) being invoked while holding ProcArrayLock. 
I just optimised that in the recent patch you committed. It isn't a high cost item any longer now that we are able to prune KnownAssignedXids() from the left, since pruning will typically not test more than one xid. * Why is ExtendCLOG/ExtendSUBTRANS done in RecordKnownAssignedTransactionIds? Heikki placed them there, so I left that coding, since it does work. RecordKnown..() is supposed to be the logical equivalent of assigning an xid, so it seemed logical. Happy to move wherever you see fit. It's inappropriate from a modularity standpoint, and it also seems completely wrong that it won't get done if standbyState < STANDBY_SNAPSHOT_PENDING. Yes, that looks like a logic error and will be fixed. However, it's trapped later by clog code to zero new blocks, so in practice there is no bug. nextXID manipulation there seems equally bogus not to mention unlocked. Traced the code, looks fine to me. Yes, unlocked. * snapshotOldestActiveXid is bogus (I complained about this already, you have not fixed it) I understood you were fixing it, as raised during your recent review of the KAX patch. Will fix. * LogStandbySnapshot is merest fantasy: no guarantee that either the XIDs list or the locks list will be consistent with the point in WAL where it will get inserted. What's worse, locking things down enough to guarantee consistency would be horrid for performance, or maybe even deadlock-inducing. Could lose both ways: list might contain an XID whose commit/abort went to WAL before the snapshot did, or list might be missing an XID started just after snap was taken. The latter case could possibly be dealt with via nextXid filtering, but that doesn't fix the former case, and anyway we have both ends of the same problem for locks. That was recoded by Heikki and I left it as written, though I checked it, considered it correct and take responsibility for it. Will review further and report back. Thanks for the review. 
-- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] testing HS/SR - 1 vs 2 performance
On Tue, 2010-05-04 at 21:34 +0200, Stefan Kaltenbrunner wrote: FWIW - I'm seeing a behaviour here under pgbench -S workloads that looks kinda related. Using -j 16 -c 16 -T 120 I get either 100ktps and around 660k contextswitches per second or on some runs I end up with 150ktps and around 1M contextswitches/s sustained. I mostly get the 100k result but once in a while I get the 150k one. And one can even anticipate the final transaction rate from watching vmstat 1... I'm not sure yet on what is causing that behaviour but that is with 9.0B1 on a Dual Quadcore Nehalem box with 16 cpu threads (8+HT) on a pure in-memory workload (scale = 20 with 48GB RAM). Educated guess at a fix: please test this patch. It's good for performance testing, but doesn't work correctly at failover, which would obviously be addressed prior to any commit. -- Simon Riggs www.2ndQuadrant.com

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 158,163 ****
--- 158,164 ----
  static XLogRecPtr LastRec;
   * known, need to check the shared state.
   */
  static bool LocalRecoveryInProgress = true;
+ static bool KnownRecoveryState = false;
  /*
   * Local state for XLogInsertAllowed():
*************** CheckRecoveryConsistency(void)
*** 6524,6537 ****
  bool
  RecoveryInProgress(void)
  {
! 	/*
! 	 * We check shared state each time only until we leave recovery mode. We
! 	 * can't re-enter recovery, so there's no need to keep checking after the
! 	 * shared variable has once been seen false.
! 	 */
! 	if (!LocalRecoveryInProgress)
! 		return false;
! 	else
  	{
  		/* use volatile pointer to prevent code rearrangement */
  		volatile XLogCtlData *xlogctl = XLogCtl;
--- 6525,6533 ----
  bool
  RecoveryInProgress(void)
  {
! 	if (KnownRecoveryState)
! 		return LocalRecoveryInProgress;
! 
  	else
  	{
  		/* use volatile pointer to prevent code rearrangement */
  		volatile XLogCtlData *xlogctl = XLogCtl;
*************** RecoveryInProgress(void)
*** 6541,6546 ****
--- 6537,6544 ----
  		LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
  		SpinLockRelease(&xlogctl->info_lck);

+ 		KnownRecoveryState = true;
+
  		/*
  		 * Initialize TimeLineID and RedoRecPtr when we discover that recovery
  		 * is finished. InitPostgres() relies upon this behaviour to ensure
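The idea in the patch is to stop taking the spinlock once the shared flag has been read at least once. A minimal sketch of that caching pattern (hypothetical names, not the actual xlog.c code), including the staleness caveat Simon notes about failover:

```python
class RecoveryState:
    """Cache the shared recovery flag after the first (locked) read.

    Mirrors the patch's idea: once the shared state has been observed,
    answer from the local copy and never pay for the lock again.  The
    trade-off is the caveat above: a caller that cached 'in recovery'
    will not notice failover until the cache is invalidated somehow.
    """

    def __init__(self, read_shared_flag):
        # read_shared_flag stands in for the spinlocked read of the
        # shared in-recovery flag; it is an assumption of this sketch.
        self._read_shared_flag = read_shared_flag
        self._known = False
        self._local_in_progress = True

    def recovery_in_progress(self):
        if self._known:                      # fast path: no lock taken
            return self._local_in_progress
        self._local_in_progress = self._read_shared_flag()
        self._known = True
        return self._local_in_progress
```

This is why the patch helps a hot-path caller: every call after the first is a plain local read, at the cost of serving a stale answer once the shared state changes.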
[HACKERS] GUCs that need restart
There are quite a few GUC parameters that need restart. Is there a way we can avoid some of them needing restart? I am specifically looking at archive_mode and the new wal_level. From my limited understanding, these parameters need restart because in a running cluster we cannot safely change these GUCs and make sure that other/running backends will pick them up immediately so that they start behaving differently as required by the GUC. Are there other reasons to have them set to PGC_POSTMASTER? If the above is correct and the only reason, then can we have them assigned to a new PGC_ mode and have the SET commands somehow wait for all backends to pickup the value before returning? (specifically, wait for any running backends to exit the transaction). I know there are genuine reasons behind having them depend on restart, but am just trying to eliminate that, at least for some parameters which a DBA might want to change on the fly, being fully aware of the consequences. Regards, -- gurjeet.singh @ EnterpriseDB - The Enterprise Postgres Company http://www.enterprisedb.com singh.gurj...@{ gmail | yahoo }.com Twitter/Skype: singh_gurjeet Mail sent from my BlackLaptop device
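The "wait for all backends to pick up the value" idea amounts to a generation barrier: the setter publishes a new value with a generation number and blocks until every backend has acknowledged that generation at a safe point. A hypothetical sketch of the mechanism, nothing like the actual GUC machinery:

```python
import threading

class GucBarrier:
    """Setter publishes a value under a new generation number and waits
    until every registered backend has acknowledged seeing it."""

    def __init__(self, n_backends):
        self.lock = threading.Condition()
        self.generation = 0
        self.acks = [0] * n_backends  # last generation each backend saw
        self.value = None

    def set_value(self, value):
        with self.lock:
            self.generation += 1
            self.value = value
            gen = self.generation
            # Block, as the proposed SET would, until all backends
            # have reached this generation.
            self.lock.wait_for(lambda: all(a >= gen for a in self.acks))

    def backend_poll(self, backend_id):
        """Called by each backend at a safe point, e.g. between
        transactions, matching the 'wait for running backends to exit
        the transaction' idea."""
        with self.lock:
            self.acks[backend_id] = self.generation
            self.lock.notify_all()
            return self.value
```

The hard part this sketch glosses over is exactly what the email raises: a backend idle in a long transaction never reaches a safe point, so the SET would block indefinitely.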
Re: [HACKERS] max_standby_delay considered harmful
AFAIK, Tom was proposing that the pending recovery data would wait for max_standby_delay, total, then cancel *all* queries which conflicted with it. Now that we've talked this out, though, I can see that this can still result in mass cancel issues, just like the current max_standby_delay. The main advantage I can see to Tom's idea is that (presumably) it can be more discriminating about which queries it cancels. As I said to Stephen, this is exactly how it works already and wasn't what was proposed. Well, it's not exactly how it works, as I understand it ... doesn't the timer running out on the slave currently cancel *all* running queries with old snapshots, regardless of what relations they touch? Before you posted, I submitted a patch on this thread to redefine max_standby_delay to depend upon the application lag, as you've newly defined it here - though obviously I didn't call it that. That solves Tom's 3 issues. max_apply_delay might be technically more accurate term, though isn't sufficiently better parameter name as to be worth the change. Yeah, that looks less complicated for admins. Thanks. That patch doesn't implement his proposal, but that can be done as well as (though IMHO not instead of). Given that two people have already misunderstood what Tom proposed, and various people are saying we need only one, I'm getting less inclined to have that at all. Given your clarification on the whole set of behaviors, I'm highly dubious about the idea of implementing Tom's proposal when we're already Beta 1. It seems like a 9.1 thing. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GUCs that need restart
Gurjeet Singh singh.gurj...@gmail.com wrote: There are quite a few GUC parameters that need restart. Is there a way we can avoid some of them needing restart? I am specifically looking at archive_mode and the new wal_level. I'll second this on a don't know if it's practicable, but it would be nice if... basis. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] including PID or backend ID in relpath of temp rels
Robert Haas robertmh...@gmail.com writes: One possible thing we might do (bearing in mind that we might need to wall off access at multiple levels) would be to forbid creating a relcache entry for a non-local temprel. That would, in turn, forbid doing pretty much anything to such a relation, although I'm not sure what else would get broken in the process. Dropping temprels left behind by a crashed backend would get broken by that; which is a deal-breaker, because we have to be able to clean those up. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] including PID or backend ID in relpath of temp rels
Alvaro Herrera alvhe...@commandprompt.com writes: I don't very much like the wildcard idea; but I don't think it's unreasonable to refuse to provide a file size. If the owning backend has still got part of the table in local buffers, you'll get a misleading answer, so perhaps it's best to not give an answer at all. FWIW, that's not the case, any more than it is for blocks in shared buffer cache for regular rels. smgrextend() results in an observable extension of the file EOF immediately, whether or not you can see up-to-date data for those pages. Now people have often complained about the extra I/O involved in that, and it'd be nice to have a solution, but it's not clear to me that fixing it would be harder for temprels than regular rels. regards, tom lane
Re: [HACKERS] including PID or backend ID in relpath of temp rels
On Tue, May 4, 2010 at 5:12 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: One possible thing we might do (bearing in mind that we might need to wall off access at multiple levels) would be to forbid creating a relcache entry for a non-local temprel. That would, in turn, forbid doing pretty much anything to such a relation, although I'm not sure what else would get broken in the process. Dropping temprels left behind by a crashed backend would get broken by that; which is a deal-breaker, because we have to be able to clean those up. Phooey. It was such a good idea in my head. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
Josh Berkus j...@agliodbs.com writes: Given your clarification on the whole set of behaviors, I'm highly dubious about the idea of implementing Tom's proposal when we're already Beta 1. It seems like a 9.1 thing. I think you missed the point: do nothing is not a viable option. I was proposing something that seemed simple enough to be safe to drop into 9.0 at this point. I'm less convinced that what Simon is proposing is safe enough. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
On Tue, 2010-05-04 at 18:53 -0400, Tom Lane wrote: I think you missed the point: do nothing is not a viable option. I was proposing something that seemed simple enough to be safe to drop into 9.0 at this point. I've posted a patch that meets your stated objections. If you could review that, this could be done in an hour. There are other ways, but you'll need to explain a proposal in enough detail that we're clear what you actually mean. I'm less convinced that what Simon is proposing is safe enough. Which proposal? -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
Tom Lane wrote: 1. The timestamps we are reading from the log might be historical, if we are replaying from archive rather than reading a live SR stream. In the current implementation that means zero grace period for standby queries. Now if your only interest is catching up as fast as possible, that could be a sane behavior, but this is clearly not the only possible interest --- in fact, if that's all you care about, why did you allow standby queries at all? If the standby is not current, you may not want people to execute queries against it. In some situations, returning results against obsolete data is worse than not letting the query execute at all. As I see it, the current max_standby_delay implementation includes the expectation that the results you are getting are no more than max_standby_delay behind the master, presuming that new data is still coming in. If the standby has really fallen further behind than that, there are situations where you don't want it doing anything but catching up until that is no longer the case, and you especially don't want it returning stale query data. The fact that tuning in that direction could mean the standby never actually executes any queries is something you need to monitor for--it suggests the standby isn't powerful/well connected to the master enough to keep up--but that's not necessarily the wrong behavior. Saying I only want the standby to execute queries if it's not too far behind the master is the answer to why did you allow standby queries at all? when tuning for that use case. 2. There could be clock skew between the master and slave servers. Not the database's problem to worry about. Document that time should be carefully sync'd and move on. I'll add that. 3. There could be significant propagation delay from master to slave, if the WAL stream is being transmitted with pg_standby or some such. Again this results in cutting into the standby queries' grace period, for no defensible reason. 
Then people should adjust their max_standby_delay upwards to account for that. For high availability purposes, it's vital that the delay number be referenced to the commit records on the master. If lag is eating a portion of that, again it's something people should be monitoring for, but not something we can correct. The whole idea here is that max_standby_delay is an upper bound on how stale the data on the standby can be, and whether or not lag is a component to that doesn't impact how the database is being asked to act. In addition to these fundamental problems there's a fatal implementation problem: the actual comparison is not to the master's current clock reading, but to the latest commit, abort, or checkpoint timestamp read from the WAL. Right; this has been documented for months at http://wiki.postgresql.org/wiki/Hot_Standby_TODO and on the list before that, i.e. If there's little activity in the master, that can lead to surprising results. The suggested long-term fix has been adding keepalive timestamps into SR, which seems to get reinvented every time somebody plays with this for a bit. The HS documentation improvements I'm working on will suggest that you make sure this doesn't happen, that people have some sort of keepalive WAL-generating activity on the master regularly, if they expect max_standby_delay to work reasonably in the face of an idle master. It's not ideal, but it's straightforward to work around in user space. I'm inclined to think that we should throw away all this logic and just have the slave cancel competing queries if the replay process waits more than max_standby_delay seconds to acquire a lock. This is simple, understandable, and behaves the same whether we're reading live data or not. I don't consider something that allows queries to execute when not playing recent live data is necessarily a step forward, from the perspective of implementations preferring high-availability. 
It's reasonable for some people to request that the last thing a standby that's not current (max_standby_delay behind the master, based on the last thing received) should be doing is answering any queries, when it doesn't have current data and it should be working on catchup instead. Discussion here obviously has wandered past your fundamental objections here and onto implementation trivia, but I didn't think the difference between what you expected and what's actually committed already was properly addressed before doing that. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
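[Editor's note: for readers following the debate, a minimal sketch of the standby settings under discussion, using the 9.0-beta GUC names; the values and units here are purely illustrative, not recommendations from either poster:]

```
# postgresql.conf on the standby (9.0-beta era; names and defaults may differ)
hot_standby = on          # allow read-only queries during recovery
max_standby_delay = 30s   # cancel conflicting standby queries once replay has
                          # waited this long; -1 = wait forever (favor queries),
                          # 0 = cancel immediately (favor catch-up)
```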
Re: [HACKERS] max_standby_delay considered harmful
Greg Smith g...@2ndquadrant.com writes: If the standby is not current, you may not want people to execute queries against it. In some situations, returning results against obsolete data is worse than not letting the query execute at all. As I see it, the current max_standby_delay implementation includes the expectation that the results you are getting are no more than max_standby_delay behind the master, presuming that new data is still coming in. If the standby has really fallen further behind than that, there are situations where you don't want it doing anything but catching up until that is no longer the case, and you especially don't want it returning stale query data. That is very possibly a useful thing to be able to specify, but the current implementation has *nothing whatsoever* to do with making such a guarantee. It will only kill queries that are creating a lock conflict. I would even argue that it's a bad thing to have a parameter that looks like it might do that, when it doesn't. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
On 5/4/10 4:26 PM, Greg Smith wrote: Not the database's problem to worry about. Document that time should be carefully sync'd and move on. I'll add that. Releasing a hot standby which *only* works for users with an operational ntp implementation is highly unrealistic. Having built-in replication in PostgreSQL was supposed to give the *majority* of users a *simple* option for 2-server failover, not cater only to the high end. Every administrative requirement we add to HS/SR eliminates another set of potential users, as well as adding another set of potential failure conditions which need to be monitored. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Need to contact driver authors about change in index naming behavior ...
Hackers, Driver maintainers, The 9.0 don't rename index cols behavior has already broken JDBC. We need to get in touch with other driver authors to see if they are affected by this, and to let them know that they'll need a new driver release for 9.0, if so. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] LogStandbySnapshot (was another thread)
On Tue, 2010-05-04 at 13:23 -0400, Tom Lane wrote: * LogStandbySnapshot is merest fantasy: no guarantee that either the XIDs list or the locks list will be consistent with the point in WAL where it will get inserted. What's worse, locking things down enough to guarantee consistency would be horrid for performance, or maybe even deadlock-inducing. Could lose both ways: list might contain an XID whose commit/abort went to WAL before the snapshot did, or list might be missing an XID started just after snap was taken. The latter case could possibly be dealt with via nextXid filtering, but that doesn't fix the former case, and anyway we have both ends of the same problem for locks.

This was the only serious complaint on your list, so let's address it. Clearly we don't want to lock everything down, for all the reasons you say. That creates a gap between when data is derived and when data is logged to WAL. LogStandbySnapshot() occurs during online checkpoints, on or after the logical checkpoint location and before the physical checkpoint location. We start recovery from a checkpoint, so we have a starting point in WAL for our processing.

The time sequence of these related events on the primary is:

    Logical Checkpoint location
    newxids/commits/locks Before1
    AccessExclusiveLocks derived
    newxids/commits/locks Before2
    AccessExclusiveLocks WAL record inserted
    newxids/commits/locks After1
    RunningXact derived
    newxids/commits/locks After2
    RunningXact WAL record inserted

though when we read them back from WAL, they will be in this order, and we cannot tell the difference between events at Before1/Before2 or After1/After2:

    Logical Checkpoint location = STANDBY_INITIALIZED
    newxids/commits/locks Before1
    newxids/commits/locks Before2
    AccessExclusiveLocks WAL record
    newxids/commits/locks After1
    newxids/commits/locks After2
    RunningXact WAL record = STANDBY_SNAPSHOT_READY

We're looking for a consistent point.
We don't know what the exact time-synchronised point is on the master, so we have to use an exact point in WAL and work from there. We need to understand that the serialization of events in the log can be slightly different from how they occurred on the primary, but that doesn't change anything important. So, to get a set of xids + locks that are consistent at the moment the RunningXact WAL record is read, we need to:

1. Begin processing incoming changes from the time we are STANDBY_INITIALIZED, though forgive any errors for removals of missing items until we hit STANDBY_SNAPSHOT_READY
   a) locks - we ignore missing locks in StandbyReleaseLocks()
   b) xids - we ignore missing xids in KnownAssignedXidsRemove()

2. Save any transaction commits/aborts from the time we are STANDBY_INITIALIZED through to STANDBY_SNAPSHOT_READY, so that we can remove them again from the snapshot state. Otherwise, events might exist in the standby that will never be removed from the snapshot. We do this with a simple test of whether the related xid has already completed.
   a) locks - we ignore locks for already completed xids in StandbyAcquireAccessExclusiveLock()
   b) xids - we ignore already completed xids in ProcArrayApplyRecoveryInfo()

We currently do all of the above. So it looks correct to me. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
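[Editor's note: the two rules above can be sketched in a few lines of illustrative Python; the function and variable names here are hypothetical stand-ins, not the actual PostgreSQL routines named in the message:]

```python
# Hedged sketch of the snapshot-consistency rules described above.
# Between STANDBY_INITIALIZED and STANDBY_SNAPSHOT_READY:
#   rule 1: removals of items we never saw are forgiven, not errors
#   rule 2: snapshot entries whose xid has already completed are skipped
known_xids = set()   # standby's view of running xids
completed = set()    # commits/aborts replayed since STANDBY_INITIALIZED

def replay_commit(xid):
    completed.add(xid)
    known_xids.discard(xid)       # rule 1: a missing xid is silently ignored

def replay_running_xacts(running_xids):
    for xid in running_xids:
        if xid not in completed:  # rule 2: drop already-completed xids
            known_xids.add(xid)

# A commit whose WAL record lands before the snapshot record ("Before2"):
replay_commit(10)
replay_running_xacts([10, 11])    # snapshot was derived while xid 10 still ran
assert known_xids == {11}         # standby correctly excludes the finished xid
```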
Re: [HACKERS] max_standby_delay considered harmful
On Tue, 2010-05-04 at 16:34 -0700, Josh Berkus wrote: On 5/4/10 4:26 PM, Greg Smith wrote: Not the database's problem to worry about. Document that time should be carefully sync'd and move on. I'll add that. Releasing a hot standby which *only* works for users with an operational ntp implementation is highly unrealistic. Having built-in replication in PostgreSQL was supposed to give the *majority* of users a *simple* option for 2-server failover, not cater only to the high end. Every administrative requirement we add to HS/SR eliminates another set of potential users, as well as adding another set of potential failure conditions which need to be monitored. +1 Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
Josh Berkus wrote: Having built-in replication in PostgreSQL was supposed to give the *majority* of users a *simple* option for 2-server failover, not cater only to the high end. If that's your criteria, 9.0 has already failed that goal. Just the fact that you have to make your own base backup and manage that whole area alone excludes simple as a goal anyone can claim 9.0 meets with a straight face, long before you get to the mechanics of how HS handles query cancellation. The new replication oriented features are functional, but neither are close to simple yet. Based on the complication level of replication in other database products, I wouldn't put money on that even being possible. You can make a simpler path the default one, but the minute you want to support more than one use case the complexity involved in setting up replication explodes. Anyway, I have no idea where the idea that recommending time synchronization is a somehow a high end requirement, given that every OS I'm aware of makes that trivial nowadays. Slave servers that drift too far away from the master time are going to cause all sorts of problems for user apps too. Any app that gauges how long ago something happened by comparing a database timestamp with now() is going to give misleading results for example, and I know I see those all the time. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
Greg Smith g...@2ndquadrant.com writes: Anyway, I have no idea where the idea that recommending time synchronization is a somehow a high end requirement, Considering that clock skew was only one of several scenarios in which the max_standby_delay code misbehaves, it's not that important whether you consider it highly probable or not. The code still needs a redesign, and we may as well eliminate the assumption of tight synchronization while we are at it. There's no really good reason to have that requirement in there. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
Releasing a hot standby which *only* works for users with an operational ntp implementation is highly unrealistic. Having built-in replication in PostgreSQL was supposed to give the *majority* of users a *simple* option for 2-server failover, not cater only to the high end. Every administrative requirement we add to HS/SR eliminates another set of potential users, as well as adding another set of potential failure conditions which need to be monitored. To be completely practical, I'm saying that we should apply and test Simon's latest patch, moving the delay calculation to be application lag instead of standby lag. I'm also suggesting that we should have a standby lag option for 9.1 (as well as, probably, a wait forever option ala Tom's suggestion). -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
On 05/05/10 06:24, Srinivas Naik wrote: I am sorry for that, but I made two different installations and I was messing up with various inputs. Actually, the installed versions are below *postgresql-8.3* *Ubuntu 8.10 with 2.6.27 Kernel* *and its an 32Bit O/S* pgsql$ SELECT substring(B'0001' from 5 for -2); ERROR:invalid memory alloc request size 4244635647 Please log into postgres do: SELECT version(); (and Robert suggested) and show us the output - as we need to know the 3rd number e.g 8.3.x in the postgres version to help you any more. regards Mark
Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3
On 05/05/10 13:15, Mark Kirkwood wrote: Please log into postgres do: SELECT version(); (and Robert suggested) Should read *as* Robert suggested - sorry. Also you could do this from the OS: $ aptitude show postgresql-8.3 which will display more detail for the version. Cheers Mark
Re: [HACKERS] Need to contact driver authors about change in index naming behavior ...
On Tue, May 04, 2010 at 04:40:22PM -0700, Josh Berkus wrote: Hackers, Driver maintainers, The 9.0 don't rename index cols behavior has already broken JDBC. We need to get in touch with other driver authors to see if they are affected by this, and to let them know that they'll need a new driver release for 9.0, if so. Looks at first glance like it still works for PL/Perl. I created a table t with single PK text column id, renamed it to idiotic, and pointed the attached program at it. Git master (92b93d85a49eb8d55060961c0d53e085ed92) is running on port 2225, and I'm connecting as my shell user. Here's the SQL I ran before the script. CREATE table simple(id SERIAL PRIMARY KEY); ALTER TABLE simple RENAME COLUMN id TO idiotic; CREATE table complexer(a text, b text, primary key(a,b)); ALTER TABLE complexer RENAME a to alpha, b to bravo; ALTER TABLE complexer RENAME a to alpha; ALTER TABLE complexer RENAME b to bravo; Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! 
Consider donating to Postgres: http://www.postgresql.org/about/donate

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my @tables = qw(simple complexer);
my $dbh = DBI->connect("dbi:Pg:;port=2225", "", "") or die "Couldn't connect.";

foreach my $table (@tables) {
    my @key_column_names = $dbh->primary_key(undef, "public", $table);
    print "PK column(s): ", join(", ", @key_column_names), ".\n";

    my $sth = $dbh->primary_key_info(undef, "public", $table, {pg_onerow => 2});
    if (defined $sth) {
        my $pk = $sth->fetchall_arrayref()->[0];
        print "Table $pk->[2] has a primary key on these columns:\n";
        for (my $x = 0; defined $pk->[3][$x]; $x++) {
            print "Column: $pk->[3][$x] (data type: $pk->[6][$x])\n";
        }
    }
}
$dbh->disconnect();

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Need to contact driver authors about change in index naming behavior ...
David Fetter wrote: On Tue, May 04, 2010 at 04:40:22PM -0700, Josh Berkus wrote: Hackers, Driver maintainers, The 9.0 don't rename index cols behavior has already broken JDBC. We need to get in touch with other driver authors to see if they are affected by this, and to let them know that they'll need a new driver release for 9.0, if so. Looks at first glance like it still works for PL/Perl. Er, ITYM Perl/DBD::Pg. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Need to contact driver authors about change in index naming behavior ...
On Tue, May 04, 2010 at 10:00:52PM -0400, Andrew Dunstan wrote: David Fetter wrote: On Tue, May 04, 2010 at 04:40:22PM -0700, Josh Berkus wrote: Hackers, Driver maintainers, The 9.0 don't rename index cols behavior has already broken JDBC. We need to get in touch with other driver authors to see if they are affected by this, and to let them know that they'll need a new driver release for 9.0, if so. Looks at first glance like it still works for PL/Perl. Er, ITYM Perl/DBD::Pg. You're right, of course. Oops. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
Tom Lane wrote: Greg Smith g...@2ndquadrant.com writes: Anyway, I have no idea where the idea that recommending time synchronization is a somehow a high end requirement, Considering that clock skew was only one of several scenarios in which the max_standby_delay code misbehaves, it's not that important whether you consider it highly probable or not. The code still needs a redesign, and we may as well eliminate the assumption of tight synchronization while we are at it. There's no really good reason to have that requirement in there. Should I be concerned that we are redesigning HS features at this stage in the release? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
On Tue, May 4, 2010 at 11:06 PM, Bruce Momjian br...@momjian.us wrote: Should I be concerned that we are redesigning HS features at this stage in the release? Yep. You can decide whether you want to be concerned by the redesign itself, or by the concerns about the underlying code that are motivating the redesigns, but yes, you should definitely be concerned. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] including PID or backend ID in relpath of temp rels
On Tue, Apr 27, 2010 at 9:59 PM, Robert Haas robertmh...@gmail.com wrote: [storage.c,xact.c,twophase.c] smgrGetPendingDeletes returns via an out parameter (its second argument) a list of RelFileNodes pending delete, which we then write to WAL or to the two-phase state file. It appears that we are playing a little bit fast and loose with this. I think that the two-phase code path is solid because we prohibit PREPARE TRANSACTION if the transaction has referenced any temporary tables, so when we read the two-phase state file it's safe to assume that all the tables mentioned are non-temporary. But the ordinary one-phase commit writes permanent and temporary relfilenodes to WAL without distinction, and then, in xact_redo_commit() and xact_redo_abort(), does this:

    XLogDropRelation(xlrec->xnodes[i], fork);
    smgrdounlink(srel, fork, false, true);

The third argument to smgrdounlink() is isTemp, which we're here passing as false, but might really be true. I don't think it technically matters at present because the only effect of that parameter right now is that we pass it through to DropRelFileNodeBuffers(), which will drop shared buffers rather than local buffers as a result of the incorrect setting. But that won't matter because the WAL replay process shouldn't have any local buffers anyway, since temp relations are not otherwise WAL-logged. For the same reason, I don't think the call to XLogDropRelation() is an issue because its only purpose is to remove entries from invalid_page_tab, and there won't be any temporary pages in there anyway. Of course if we're going to do $SUBJECT this will need to be changed anyway, but assuming the above analysis is correct I think the existing coding at least deserves a comment... then again, maybe I'm all mixed up? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
Robert Haas wrote: On Tue, May 4, 2010 at 11:06 PM, Bruce Momjian br...@momjian.us wrote: Should I be concerned that we are redesigning HS features at this stage in the release? Yep. You can decide whether you want to be concerned by the redesign itself, or by the concerns about the underlying code that are motivating the redesigns, but yes, you should definitely be concerned. Our process is shot to pieces. But then, we knew that, didn't we ;-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_standby_delay considered harmful
On Wed, May 5, 2010 at 5:01 AM, Andrew Dunstan and...@dunslane.net wrote: Our process is shot to pieces. But then, we knew that, didn't we ;-) Frankly, I think these kinds of usability questions are things that we'll never have great success getting feedback on without users banging on the system. There are solutions but none of them are perfect. We either a) let releases go out with functionality that nobody's sure users will be happy with and promise that the next release will polish it, b) commit big functionality changes only at the beginning of the release cycle and live with a two-release cycle latency for such changes, or c) make last-minute changes during betas. So far we've seen a combination of (a) and (c). I think (a) has worked out well; sure, we have some non-ideal behaviour in new features, but people forgive and forget when the later releases polish it. I've always advocated for (b) though; I think we've been lucky with the cases where we've needed last-minute adjustments that they've worked better than we had a right to expect. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm building all live branches from git
On Mon, May 3, 2010 at 14:04, Andrew Dunstan and...@dunslane.net wrote: [ Awesome work getting buildfarm support for git ] Note, this is running from my test git repo, not the community's repo. BTW +1 for gitting (heh, git puns are fun) a good git repo published. I've given up trying to trust it for back branches and always either go to release tarballs or cvs. Sadly, that means its change links will be broken - I'm not exactly sure what gets hashed to provide a commit ID in git, but the IDs don't match between these two repos. Yeah, git basically hashes *everything* including the previous commits. So if one commit is different in the repo all the commits after that will have a different hash :-( -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
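[Editor's note: the hash chaining described above can be illustrated with a toy commit-ID function. This is a simplified sketch, not git's real object format, which also hashes tree, author, and committer data:]

```python
import hashlib

def commit_id(tree, message, parent):
    # Simplified git-style commit hash: because the parent's ID is part of
    # the hashed data, every descendant's ID depends on all prior history.
    data = f"tree {tree}\nparent {parent}\n\n{message}".encode()
    return hashlib.sha1(data).hexdigest()

c1 = commit_id("tree-a", "first", parent=None)
c2 = commit_id("tree-b", "second", parent=c1)

# Rewriting the first commit (even trivially) changes its ID...
c1_alt = commit_id("tree-a", "first, amended", parent=None)
# ...and therefore the second commit's ID too, despite identical content:
c2_alt = commit_id("tree-b", "second", parent=c1_alt)

assert c1 != c1_alt and c2 != c2_alt
```

This is why two repositories converted from the same CVS history end up with entirely different commit IDs: one differing commit near the root changes every hash after it.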