Re: [HACKERS] getting rid of freezing
On 05/24/2013 07:00 PM, Robert Haas wrote: On Fri, May 24, 2013 at 11:29 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, May 24, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote: [all-visible cannot restore hint bits without FPI because of torn pages] I haven't thought about this sufficiently yet. I think we might have a chance of working around this, let me ponder a bit. Yeah. I too feel like there might be a solution. But I don't have something specific in mind, yet anyway. One thought I had is that it might be beneficial to freeze when a page ceases to be all-visible, rather than when it becomes all-visible. That is what I aimed to describe in my mail earlier, but your description is much clearer :) Any operation that makes the page not-all-visible is going to emit an FPI anyway, so we don't have to worry about torn pages in that case. Under such a scheme, we'd have to enforce the rule that xmin and xmax are ignored for any page that is all-visible; Agreed. We already rely on all-visible pages enough that we can trust the flag to be correct. Making that a universal rule should not add any risks. The rule "page all-visible == assume all tuples frozen" would also enable VACUUM FREEZE to work only on the non-all-visible pages. and when a page ceases to be all-visible, we have to go back and really freeze the pre-existing tuples. We can do this unconditionally, or in the milder case use vacuum_freeze_min_age if we want to retain xids for forensic purposes. I think we might be able to use the existing all_visible_cleared/new_all_visible_cleared flags to trigger this behavior, without adding anything new to WAL at all. This seems to be the easiest approach. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
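The rule under discussion — treat every tuple on an all-visible page as frozen, so xmin/xmax on such pages never need to be consulted — can be sketched as below. These are simplified stand-in structs invented for illustration, not the real PostgreSQL page or tuple headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FrozenTransactionId ((TransactionId) 2)

/* Simplified stand-ins for illustration only. */
typedef struct { bool all_visible; } PageHdr;
typedef struct { TransactionId xmin; } TupleHdr;

/* Under the proposed scheme, the page-level flag overrides the
 * tuple-level xmin: an all-visible page implies every tuple on it is
 * frozen, so the per-tuple xids need not even be valid anymore. */
bool tuple_effectively_frozen(const PageHdr *page, const TupleHdr *tup)
{
    if (page->all_visible)
        return true;
    return tup->xmin == FrozenTransactionId;
}
```

The point of the sketch: once the page flag is trusted this way, VACUUM FREEZE can skip all-visible pages entirely, and the real freezing work only happens when the flag is cleared (which already emits an FPI).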
Re: [HACKERS] Move unused buffers to freelist
On Friday, May 24, 2013 8:22 PM Jim Nasby wrote: On 5/14/13 8:42 AM, Amit Kapila wrote: In the attached patch, bgwriter/checkpointer moves unused (usage_count = 0, refcount = 0) buffers to the end of the freelist. I have implemented a new API StrategyMoveBufferToFreeListEnd() to move buffers to the end of the freelist. Instead of a separate function, would it be better to add an argument to StrategyFreeBuffer? Yes, it could be done with a parameter which decides whether to put the buffer at the head or tail of the freelist. However, currently the main focus is to check in which cases this optimization can give a benefit. Robert had run tests for quite a number of cases where it doesn't show any significant gain. I am also trying various configurations to see if it gives any benefit. Robert has given some suggestions to change the way the new function is currently called; I will try it and update the results. I am not very sure that default pgbench is a good test scenario to test this optimization. If you have any suggestions for tests where it can show a benefit, that would be great input. ISTM this is similar to the other strategy stuff in the buffer manager, so perhaps it should mirror that... With Regards, Amit Kapila.
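Jim's suggestion — one entry point with a head/tail argument instead of a separate StrategyMoveBufferToFreeListEnd() — might look roughly like this. This is a toy singly linked freelist for illustration, not the actual bufmgr code:

```c
#include <assert.h>
#include <stddef.h>

typedef struct BufNode { int buf_id; struct BufNode *next; } BufNode;
typedef struct { BufNode *head; BufNode *tail; } Freelist;

/* at_tail = 0: an ordinary free goes to the head; at_tail = 1: the
 * bgwriter/checkpointer parks an unused (usage_count = 0, refcount = 0)
 * buffer at the end, so it is handed out last. */
void freelist_put(Freelist *fl, BufNode *node, int at_tail)
{
    node->next = NULL;
    if (fl->head == NULL) {
        fl->head = fl->tail = node;
    } else if (at_tail) {
        fl->tail->next = node;
        fl->tail = node;
    } else {
        node->next = fl->head;
        fl->head = node;
    }
}
```

Folding the behavior into one function with a flag mirrors how other strategy decisions are parameterized, which is what "it should mirror that" suggests.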
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On Thursday, May 23, 2013 4:40 PM Heikki Linnakangas wrote: Hi, I've been hacking on a tool to allow resynchronizing an old master server after failover. The need to do a full backup/restore has been a common complaint ever since we've had streaming replication. I saw on the wiki that this was discussed in the dev meeting; too bad I couldn't make it. In a nutshell, the idea is to copy everything that has changed between the clusters, like rsync does, but instead of reading through all files, use the WAL to determine what has changed. Here's a somewhat more detailed explanation, from the README: This is really a nice idea and an important requirement from many users. Does this tool handle all kinds of operations the user may have performed after the new cluster forked off, or does it mandate that the user must not have performed certain kinds of operations on the old cluster after the new cluster forked off? Theory of operation --- The basic idea is to copy everything from the new cluster to the old, except for the blocks that we know to be the same. 1. Scan the WAL log of the old cluster, starting from the point where the new cluster's timeline history forked off from the old cluster. For each WAL record, make a note of the data blocks that are touched. This yields a list of all the data blocks that were changed in the old cluster after the new cluster forked off. a. What if, after the new cluster forks off, a new relation gets created in the old cluster? The tool might not find its blocks in the new cluster; and if the new cluster also has a relation of the same name but with different data, it might get an error when it starts to replay WAL of the new master, as mentioned in point 4. b. What if, after the new cluster forks off, an update occurs such that a new row version has to go on a new heap page? The WAL will then mention both the old and new row's blocks (blk-1 and blk-2).
It might be the case that the new cluster does not have blk-2, so only blk-1 will be copied from the new cluster; in such a scenario, the old cluster will end up with two valid versions of the same row. With Regards, Amit Kapila.
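The WAL scan in step 1 boils down to accumulating (relation, block) pairs and de-duplicating them before the copy phase. A minimal sketch of that bookkeeping — the BlockRef type and function names are invented for illustration, not pg_rewind's actual data structures:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { unsigned rel; unsigned blkno; } BlockRef;

static int cmp_ref(const void *pa, const void *pb)
{
    const BlockRef *a = pa, *b = pb;
    if (a->rel != b->rel) return a->rel < b->rel ? -1 : 1;
    if (a->blkno != b->blkno) return a->blkno < b->blkno ? -1 : 1;
    return 0;
}

/* Sort the block references noted while scanning WAL and drop
 * duplicates in place; returns the number of unique blocks that the
 * copy phase then needs to fetch from the new cluster. */
size_t unique_blocks(BlockRef *refs, size_t n)
{
    if (n == 0) return 0;
    qsort(refs, n, sizeof *refs, cmp_ref);
    size_t out = 1;
    for (size_t i = 1; i < n; i++)
        if (cmp_ref(&refs[i], &refs[out - 1]) != 0)
            refs[out++] = refs[i];
    return out;
}
```

A block touched many times after the fork point still only needs to be copied once, which is why the dedupe step matters; relations dropped, truncated, or rewritten after the fork would need whole-file handling outside this per-block path.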
Re: [HACKERS] pg_export_snapshot on standby side
On 21 May 2013 19:16, Fujii Masao masao.fu...@gmail.com wrote: We cannot run parallel pg_dump on the standby server because pg_export_snapshot() always fails on the standby. Is this an oversight of parallel pg_dump or of pg_export_snapshot()? pg_export_snapshot() fails on the standby because it always assigns a new XID, which is not allowed on the standby. Do we really need to assign a new XID even on the standby for the exportable snapshot? Having looked at the code, I say No, we don't *need* to. There are various parts of the code that deal with takenDuringRecovery, so much of this was clearly intended to work in recovery. We use the topXid for the name of the snapshot file. That is clearly unnecessary and we should be using the virtualxid instead, like we do elsewhere. We also use the topXid to test whether it is still running, though again, we could equally use the virtualxid instead. There is no problem with virtualxids possibly not being active anymore, since if we didn't have an xid before and don't have one now, and the xmin is the same, the snapshot is still valid. I think we should treat this as a bug and fix it in 9.3 while we're still in beta. Why? Because once we begin using the topXid as the filename we can't then change later to using the vxid. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
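The switch Simon describes — naming the snapshot file after the virtual xid instead of the top-level xid — only needs the (backend id, local transaction counter) pair that every backend, including one on a standby, already has without touching the XID space. A hedged sketch of the idea; the struct and the exact file-name format here are invented for illustration, not the actual snapmgr code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A virtual transaction id: a backend slot number plus a per-backend
 * counter, assigned without ever allocating a real XID. */
typedef struct { int backend_id; unsigned local_xid; } VirtualXid;

/* Build a snapshot file name from the vxid rather than the topXid,
 * so exporting a snapshot works even where XID assignment is
 * forbidden (i.e. on a standby). */
void snapshot_filename(char *buf, size_t len, VirtualXid v)
{
    snprintf(buf, len, "%d-%u", v.backend_id, v.local_xid);
}
```

Since the vxid is unique among live sessions and the snapshot's validity is checked via xmin anyway, nothing in the naming scheme requires a real XID.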
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On Sat, May 25, 2013 at 10:05 AM, Amit kapila amit.kap...@huawei.com wrote: In a nutshell, the idea is to copy everything that has changed between the clusters, like rsync does, but instead of reading through all files, use the WAL to determine what has changed. Here's a somewhat more detailed explanation, from the README: This is really a nice idea and an important requirement from many users. +1 Does this tool handle all kinds of operations the user may have performed after the new cluster forked off, or does it mandate that the user must not have performed certain kinds of operations on the old cluster after the new cluster forked off? Truncate and all kinds of DROP come to mind, also table rewrites from ALTER. The tool should probably just flag those relation files to be copied wholesale. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
[HACKERS] Planning incompatibilities for Postgres 10.0
There are a number of changes we'd probably like to make to the way things work in Postgres. This thread is not about discussing what those are, just to say that requirements exist and have been discussed in various threads over time. The constraint on such changes is that we've decided that we must have an upgrade path from release to release. So I'd like to make a formal suggestion of a plan for how we cope with this: 1. Implement online upgrade in 9.4 via the various facilities we have in progress. That looks completely possible. 2. Name the next release after that 10.0 (it would have been 9.5). We declare now that a) 10.0 will support online upgrade from 9.4 (only) b) various major incompatibilities will be introduced in 10.0 - the change in release number will indicate to everybody that this is the case c) agree that there will be no pg_upgrade path from 9.4 to 10.0, so that we will not be constrained by that This plan doesn't presume any particular change. Each change would need to be discussed on a separate thread, with a separate case for each. All I'm suggesting is that we have a coherent plan for the timing of such changes, so we can bundle them together into one release. By doing this now we give ourselves lots of time to plan changes that will see us good for another decade. If we don't do this, then we simply risk losing the initiative by continuing to support legacy formats and approaches. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] getting rid of freezing
On 24 May 2013 17:00, Robert Haas robertmh...@gmail.com wrote: On Fri, May 24, 2013 at 11:29 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, May 24, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote: [all-visible cannot restore hint bits without FPI because of torn pages] I haven't thought about this sufficiently yet. I think we might have a chance of working around this, let me ponder a bit. Yeah. I too feel like there might be a solution. But I don't have something specific in mind, yet anyway. One thought I had is that it might be beneficial to freeze when a page ceases to be all-visible, rather than when it becomes all-visible. Any operation that makes the page not-all-visible is going to emit an FPI anyway, so we don't have to worry about torn pages in that case. Under such a scheme, we'd have to enforce the rule that xmin and xmax are ignored for any page that is all-visible; and when a page ceases to be all-visible, we have to go back and really freeze the pre-existing tuples. I think we might be able to use the existing all_visible_cleared/new_all_visible_cleared flags to trigger this behavior, without adding anything new to WAL at all. I like the idea but it would mean we'd have to freeze in the foreground path rather than in a background path. Have we given up on the double buffering idea to remove FPIs completely? If we did that, then this wouldn't work. Anyway, I take it the direction of this idea is that we don't need a separate freeze map, just use the vismap. That seems to be forcing ideas down a particular route we may regret. I'd rather just keep those things separate, even if we manage to merge the WAL actions most of the time. Some other related thoughts: ISTM that if we really care about keeping xids for debug purposes, it could be a parameter. For the mainline, we just freeze blocks at the same time we do page pruning.
I think the right way is actually to rethink and simplify all this complexity of Freezing/Pruning/Hinting/Visibility. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Processing long AND/OR lists
When Postgres encounters a long list of AND/OR chains, it errors out at check_stack_depth() after a limit of a few thousand. At around 10,000 elements, the recursion at assign_expr_collations() causes the error. But at a slightly higher element count, around 15,000, the recursion check errors out a little earlier, in the stack around transformAExprAnd(). The test queries were generated using the attached test.sh script. This is not a synthetic test. Recently I saw a report where Slony generated a huge list of 'log_actionseq <> nnn and log_actionseq <> nnn and ...' for a SYNC event, and that SYNC event could not complete due to this error. Granted, Slony can be fixed by making it generate the condition as 'log_actionseq not in (nnn, nnn)', which is processed non-recursively, but IMHO Postgres should be capable of handling this trivial construct, however long it may be (limited by memory, of course). To that end, I wrote the attached patch, and although I seem to be playing by the rules, `make check` fails. Some diffs look benign, but others not so much. AIUI, a BoolExpr is perfectly capable of holding multiple expressions as a list. And since the SQL standard does not say anything about short-circuit evaluation, the evaluation order of these expressions should not affect the results. So clearly, turning a tree of boolean expressions into a list is not so subtle a change. I suspect that this patch is messing up the join conditions. I see two ways to fix it: 1) Fix the order of expressions in BoolExpr such that the order of evaluation remains unchanged compared to before the patch. I expect this to take care of the benign diffs, but I doubt it will address the other diffs. 2) Construct a tree the same as before the patch, but do it non-recursively. This is, I guess, the right thing to do, but then we'll have to make similar tree-traversal changes in other places, e.g. in the BoolExpr walking in expression_tree_walker(), or maybe just the stack below assign_expr_collations().
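Option 2 — building and walking the tree without recursion — comes down to using an explicit heap-allocated stack, so depth is bounded by available memory rather than by check_stack_depth(). A standalone sketch of the idea, with a toy node type rather than the parser's A_Expr/BoolExpr:

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { NODE_AND, NODE_LEAF } NodeKind;
typedef struct Node { NodeKind kind; struct Node *l, *r; } Node;

/* Count the leaves of an AND tree iteratively.  A recursive version
 * would blow the process stack on a 100,000-deep chain; the explicit
 * stack below grows on the heap instead. */
size_t count_leaves(Node *root)
{
    size_t n = 0, top = 0, cap = 64;
    Node **stk = malloc(cap * sizeof *stk);
    stk[top++] = root;
    while (top > 0) {
        Node *cur = stk[--top];
        if (cur->kind == NODE_LEAF) { n++; continue; }
        if (top + 2 > cap)
            stk = realloc(stk, (cap *= 2) * sizeof *stk);
        stk[top++] = cur->l;   /* visit both children later */
        stk[top++] = cur->r;
    }
    free(stk);
    return n;
}
```

The same explicit-stack pattern applies to any of the recursive spots named above (transformAExprAnd, expression_tree_walker, assign_expr_collations), which is exactly why fixing only one of them just moves the failure elsewhere.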
Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc. Attachments: non_recursive_and_or_transformation.patch, test.sh
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install
If it seems to be on the right way, I'll keep fixing EXTENSION building with VPATH. I haven't tried the patch, but let me just say that Debian (and apt.postgresql.org) would very much like the VPATH situation to improve. At the moment we seem to have to invent a new build recipe for every extension around. I have been busy the last week. I just took time to inspect our contribs: USE_PGXS is not supported by all of them at the moment because of SHLIB_PREREQS (it used submake). I have a patch pending here to fix that. Once all our contribs can build with USE_PGXS, I'll fix the VPATH. The last step is interesting: installcheck/REGRESS. For this one, if I can know exactly what's required (for the Debian build, for example), then I can also fix this target. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Sat, May 25, 2013 at 4:39 AM, Simon Riggs si...@2ndquadrant.com wrote: There are a number of changes we'd probably like to make to the way things work in Postgres. This thread is not about discussing what those are, just to say that requirements exist and have been discussed in various threads over time. The constraint on such changes is that we've decided that we must have an upgrade path from release to release. So I'd like to make a formal suggestion of a plan for how we cope with this: 1. Implement online upgrade in 9.4 via the various facilities we have in progress. That looks completely possible. 2. Name the next release after that 10.0 (it would have been 9.5). We declare now that a) 10.0 will support online upgrade from 9.4 (only) b) various major incompatibilities will be introduced in 10.0 - the change in release number will indicate to everybody that this is the case c) agree that there will be no pg_upgrade path from 9.4 to 10.0, so that we will not be constrained by that This plan doesn't presume any particular change. Each change would need to be discussed on a separate thread, with a separate case for each. All I'm suggesting is that we have a coherent plan for the timing of such changes, so we can bundle them together into one release. By doing this now we give ourselves lots of time to plan changes that will see us good for another decade. If we don't do this, then we simply risk losing the initiative by continuing to support legacy formats and approaches. Huh. I don't think that bumping the version number to 10.0 vs 9.5 is justification for introducing breaking changes. In fact, I would rather see 10.0 be the version where we formally stop doing that. I understand that some stuff needs to be improved, but it often doesn't seem to be worth the cost in the long run. merlin
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Sat, 2013-05-25 at 10:39 +0100, Simon Riggs wrote: The constraint on such changes is that we've decided that we must have an upgrade path from release to release. Is this proposal only relaxing the binary upgrade requirement, or would it also relax other compatibility requirements, such as language and API compatibility? We need a couple of major drivers of the incompatibility that really show users some value for going through the upgrade pain. Preferably, at least one would be a serious performance boost, because the users that encounter the most logical upgrade pain are also the ones that need a performance boost the most. Before we set a specific schedule, I think it would be a good idea to start prototyping some performance improvements that involve breaking the data format. Then, depending on how achievable it is, we can plan for however many more 9.X releases we think we need. That being said, I agree with you that planning in advance is important here, so that everyone knows when they need to get format-breaking changes in by. Regards, Jeff Davis
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On 25 May 2013 18:13, Jeff Davis pg...@j-davis.com wrote: On Sat, 2013-05-25 at 10:39 +0100, Simon Riggs wrote: The constraint on such changes is that we've decided that we must have an upgrade path from release to release. Is this proposal only relaxing the binary upgrade requirement, or would it also relax other compatibility requirements, such as language and API compatibility? I'm suggesting that as many changes as possible of those we would like to make can happen in one release. This is for the benefit of users, so we don't make every release a source of incompatibilities. And that release should be the first one where online upgrade is possible, which is the one after next. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Thu, May 16, 2013 at 7:05 PM, Greg Smith g...@2ndquadrant.com wrote: On 5/16/13 9:16 AM, Jon Nelson wrote: Am I doing this the right way? Should I be posting the full patch each time, or incremental patches? There are guidelines for getting your patch in the right format at https://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git that would improve this one. You have some formatting issues with tab spacing at lines 120 through 133 in your v3 patch. And it looks like there was a formatting change on line 146 that is making the diff larger than it needs to be. I've corrected the formatting change (end-of-line whitespace was stripped) on line 146. The other whitespace changes are - I think - due to newly indented code in a new code block. Included please find a v4 patch which uses context diffs per the above URL. The biggest thing missing from this submission is information about what performance testing you did. Ideally performance patches are submitted with enough information for a reviewer to duplicate the same test the author did, as well as hard before/after performance numbers from your test system. It often turns out to be tricky to duplicate a performance gain, and being able to run the same test used for initial development eliminates a lot of the problems. This has been a bit of a struggle. While it's true that WAL file creation doesn't happen with great frequency, and while it's also true that - with strace and other tests - it can be proven that fallocate(16MB) is much quicker than writing zeroes by hand, proving that in the larger context of a running install has been challenging. Attached you'll find a small test script (t.sh) which creates a new cluster in 'foo', changes some config values, starts the cluster, and then times how long it takes pgbench to prepare a database. I've used wal_level = hot_standby in the hope that this generates the largest number of WAL files (and I set the number of such files to 1024).
The hardware is an AMD 9150e with a 2-disk software RAID1 (SATA disks) on kernel 3.9.2 and ext4 (x86_64, openSUSE 12.3). The test results are not that surprising. The longer the test (the larger the scale factor), the less of a difference using posix_fallocate makes. With a scale factor of 100, I see an average of a 10-11% reduction in the time taken to initialize the database. With 300, it's about 5.5%, and with 900, it's between 0 and 1.2%. I will be doing more testing, but this is what I started with. I'm very open to suggestions. Second bit of nitpicking. There are already some GUC values that appear or disappear based on compile-time options. They're all debugging-related things though. I would prefer not to see this one go away when its implementation isn't available. That's going to break any scripts that SHOW the setting to see if it's turned on or not, as a first problem. I think the right model to follow here is the IFDEF setup used for effective_io_concurrency. I wouldn't worry about this too much though. Having a wal_use_fallocate GUC is good for testing. But if it works out well, when it's ready for commit I don't see why anyone would want it turned off on platforms where it works. There are already too many performance-tweaking GUCs. Something has to be very likely to be changed from the default before it's worth adding one for it. Ack. I've revised the patch to always have the GUC (for now), defaulting to false, and if configure can't find posix_fallocate (or the user disables it by way of pg_config_manual.h) then it remains a GUC that simply can't be changed. I'll also be re-running the tests. -- Jon Attachments: plot.png, fallocate-v4.patch, t.sh
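For reference, the core of the approach — preallocating a 16MB segment with posix_fallocate() and falling back to the traditional zero-filling write loop when that is unavailable or fails — looks roughly like the sketch below. This is a standalone illustration under assumed names, not the actual xlog.c patch:

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define XLOG_SEG_SIZE (16 * 1024 * 1024)

/* Preallocate a WAL-sized file.  posix_fallocate() asks the filesystem
 * to reserve the blocks without pushing 16MB of zeroes through the page
 * cache; if it fails (e.g. an unsupported filesystem), fall back to the
 * classic zero-write loop. */
int preallocate_segment(int fd)
{
    if (posix_fallocate(fd, 0, XLOG_SEG_SIZE) == 0)
        return 0;

    char zbuf[8192];
    memset(zbuf, 0, sizeof zbuf);
    for (off_t written = 0; written < XLOG_SEG_SIZE; written += sizeof zbuf)
        if (write(fd, zbuf, sizeof zbuf) != (ssize_t) sizeof zbuf)
            return -1;
    return 0;
}
```

Either path leaves a fully allocated 16MB file, which is why the observable gain shows up mostly at WAL-file creation time and shrinks as the workload (larger scale factors) amortizes it.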
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Sat, May 25, 2013 at 10:39:30AM +0100, Simon Riggs wrote: There are a number of changes we'd probably like to make to the way things work in Postgres. This thread is not about discussing what those are, just to say that requirements exist and have been discussed in various threads over time. The constraint on such changes is that we've decided that we must have an upgrade path from release to release. So I'd like to make a formal suggestion of a plan for how we cope with this: 1. Implement online upgrade in 9.4 via the various facilities we have in progress. That looks completely possible. 2. Name the next release after that 10.0 (it would have been 9.5). We declare now that a) 10.0 will support online upgrade from 9.4 (only) b) various major incompatibilities will be introduced in 10.0 - the change in release number will indicate to everybody that this is the case c) agree that there will be no pg_upgrade path from 9.4 to 10.0, so that we will not be constrained by that Assuming online upgrade is going to require logical replication, you are also assuming 2x storage, as you need a second cluster to perform the upgrade. pg_upgrade would still be needed to upgrade a cluster in place. This sounds like: I created a new tool which does some of what the old tool does; let's break the old tool to allow some unspecified changes I might want to make. I consider this thread to be not thought through, obviously. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] pg_export_snapshot on standby side
On Sat, May 25, 2013 at 6:18 PM, Simon Riggs si...@2ndquadrant.com wrote: On 21 May 2013 19:16, Fujii Masao masao.fu...@gmail.com wrote: We cannot run parallel pg_dump on the standby server because pg_export_snapshot() always fails on the standby. Is this an oversight of parallel pg_dump or of pg_export_snapshot()? pg_export_snapshot() fails on the standby because it always assigns a new XID, which is not allowed on the standby. Do we really need to assign a new XID even on the standby for the exportable snapshot? Having looked at the code, I say No, we don't *need* to. Good to hear. There are various parts of the code that deal with takenDuringRecovery, so much of this was clearly intended to work in recovery. We use the topXid for the name of the snapshot file. That is clearly unnecessary and we should be using the virtualxid instead, like we do elsewhere. We also use the topXid to test whether it is still running, though again, we could equally use the virtualxid instead. There is no problem with virtualxids possibly not being active anymore, since if we didn't have an xid before and don't have one now, and the xmin is the same, the snapshot is still valid. I think we should treat this as a bug and fix it in 9.3 while we're still in beta. +1 Why? Because once we begin using the topXid as the filename we can't then change later to using the vxid. I'm afraid that pg_export_snapshot() in 9.2 has already been using topXid as the filename. Regards, -- Fujii Masao
Re: [HACKERS] pg_export_snapshot on standby side
Fujii Masao wrote: On Sat, May 25, 2013 at 6:18 PM, Simon Riggs si...@2ndquadrant.com wrote: I think we should treat this as a bug and fix it in 9.3 while we're still in beta. +1 Why? Because once we begin using the topXid as the filename we can't then change later to using the vxid. I'm afraid that pg_export_snapshot() in 9.2 has already been using topXid as the filename. I don't think this matters at all. The filename is an implementation detail which we don't even need to keep compatible across pg_upgrade. If we think this is a bug, we should backpatch the fix to 9.2. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] background worker and normal exit
Hi, I found that a normal exit (i.e., with exit code 0) of a bgworker always leads to the immediate restart of the bgworker, whatever bgw_restart_time is. Is this intentional? Shouldn't not only a crash but also a normal exit respect bgw_restart_time? I'm now writing a bgworker which is allowed to run only during recovery. After recovery ends, that bgworker is expected to exit with code 0. I was thinking to avoid the restart of the bgworker after a normal exit by using BGW_NEVER_RESTART. Regards, -- Fujii Masao
Re: [HACKERS] getting rid of freezing
Andres, all-visible is only set in vacuum, and it determines which parts of a table will be scanned in a non-full-table vacuum. So, since we won't regularly start vacuum in the insert-only case, there will still be a batch of work at once. But nearly all of that work is *already* performed. We would just move the details of that around for a bit. *But* since we now would only need to vacuum the non-all-visible part, that would get noticeably cheaper as well. Yeah, I can see that. Seems worthwhile, then. I think for that case we should run vacuum more regularly for insert-only tables, since we currently don't do it regularly enough, which a) increases the amount of work needed at once and b) prevents index-only scans from working there. Yes. I'm not sure how we would set this though; I think it's another example of how autovacuum's parameters for when to vacuum etc. are too simple-minded for the real world. Doing an all-visible scan on an insert-only table, for example, should be based on XID age and not on % inserted, no? Speaking of which, I need to get on revamping the math for autoanalyze. Mind you, in the real-world insert-only table case, this does create extra IO -- real insert-only tables often have a few rows (< 5%) which are updated/deleted. Vacuum would see these and want to clean the pages up, which would create much more substantial IO. It might still be a good tradeoff, but we should be aware of it. Unless we want a special VACUUM ALL VISIBLE mode. I vote no, unless we demonstrate some really convincing case for it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
[HACKERS] View Index and UNION
Hi, I've encountered a fundamental problem which - to me - can only be solved with a (future/possible) real index on views in PostgreSQL (like those that already exist in MS SQL Server and Oracle): Given the following schema: 1. TABLE a and TABLE b, each with an INDEX on attribute geom. 2. A VIEW with union: CREATE VIEW myview AS SELECT * FROM a UNION SELECT * FROM b; 3. And a simple query with KNN index and a coordinate mypos: SELECT * FROM myview ORDER BY ST_Geomfromtext(mypos) <-> myview.geom Now, the problem is that for the ORDER BY it is not enough that each of the two tables calculates the ordering separately: we want a total ordering over all involved tables! In fact, the planner realizes that and chooses a seq scan over all tuples of tables a and b - which is slow and suboptimal! To me, that's a use case where we would wish to have a distinct index on views. Any opinions on this? Yours, Stefan
Re: [HACKERS] View Index and UNION
Could this scenario not be handled by a step that orders the two tables independently, then for the view interleaves the presorted results? Merging two sorted sets into a single sorted set is usually a trivial task, and it could still take advantage of the existing indexes. William King Senior Engineer Quentus Technologies, INC 1037 NE 65th St Suite 273 Seattle, WA 98115 Main: (877) 211-9337 Office: (206) 388-4772 Cell: (253) 686-5518 william.k...@quentustech.com On 05/25/2013 05:35 PM, Stefan Keller wrote: [quoted message trimmed]
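The interleaving step William describes is a k-way merge of already-sorted streams. A minimal Python sketch of the idea (the row contents and distance keys are made up for illustration; in the planner this would correspond to merging two ordered index scans):

```python
import heapq

# Hypothetical: each table's KNN index scan yields rows already ordered
# by distance from the query point, nearest first.
scan_a = [(0.5, "a-row1"), (2.0, "a-row2"), (9.1, "a-row3")]
scan_b = [(1.2, "b-row1"), (3.3, "b-row2")]

# heapq.merge lazily interleaves the presorted streams into one total
# ordering without re-sorting everything.
merged = list(heapq.merge(scan_a, scan_b, key=lambda t: t[0]))
print([row for _, row in merged])
# → ['a-row1', 'b-row1', 'a-row2', 'b-row2', 'a-row3']
```

Because the merge only ever compares the heads of the streams, a LIMIT on top of the union could stop after pulling just a handful of rows from each index, which is exactly the behavior Stefan is missing.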
Re: [HACKERS] background worker and normal exit
On Sun, May 26, 2013 at 6:16 AM, Fujii Masao masao.fu...@gmail.com wrote: I found that a normal exit (i.e., with exit code 0) of a bgworker always leads to the immediate restart of the bgworker, whatever bgw_restart_time is. Is this intentional? Yes, per the docs: http://www.postgresql.org/docs/devel/static/bgworker.html Background workers are expected to be continuously running; if they exit cleanly, postgres will restart them immediately. Shouldn't a normal exit, and not only a crash, also honor bgw_restart_time? bgw_restart_time corresponds to the time after which a bgworker is restarted in case of a crash only. I'm now writing a bgworker which is allowed to be running only during recovery. After recovery ends, that bgworker is expected to exit with code 0. I was thinking to avoid the restart of the bgworker after a normal exit by using BGW_NEVER_RESTART. This flag prevents a worker from restarting only in case of a crash. To solve your problem, you could as well allow your process to restart and put it in an indefinite sleep if the server is not in recovery, such that it will do nothing in your case. Regards, -- Michael
Re: [HACKERS] Processing long AND/OR lists
On Sat, May 25, 2013 at 9:56 AM, Gurjeet Singh gurj...@singh.im wrote: When Postgres encounters a long list of AND/OR chains, it errors out at check_stack_depth() after a limit of a few thousand elements. At around 10,000 elements, the recursion at assign_expr_collations() causes the error. But at a slightly higher element count, around 15,000, the recursion check errors out a little earlier, in the stack around transformAExprAnd(). The test queries were generated using the attached test.sh script. This is not a synthetic test. Recently I saw a report where Slony generated a huge list of 'log_actionseq <> nnn and log_actionseq <> nnn and ...' for a SYNC event, and that SYNC event could not complete due to this error. Granted that Slony can be fixed by making it generate the condition as 'log_actionseq not in (nnn, nnn)', which is processed non-recursively, but IMHO Postgres should be capable of handling this trivial construct, however long it may be (limited only by memory, of course). To that end, I wrote the attached patch, and although I seem to be playing by the rules, `make check` fails. Some diffs look benign, but others not so much. AIUI, a BoolExpr is perfectly capable of holding multiple expressions as a list. And since the SQL standard does not say anything about short-circuit evaluation, the evaluation order of these expressions should not affect the results. Evidently, though, turning a tree of boolean expressions into a list is not so subtle a change. I suspect that this patch is messing up the join conditions. I see two ways to fix it: 1) Fix the order of expressions in the BoolExpr such that the order of evaluation remains unchanged compared to before the patch. I expect this to take care of the benign diffs, but I doubt it will address the other diffs. 2) Construct a tree the same as done before the patch, but do it non-recursively. This is, I guess, the right thing to do, but then we'll have to make similar tree-traversal changes in other places, e.g.
in the BoolExpr walking in expression_tree_walker(), or maybe just the stack below assign_expr_collations(). As suspected, the problem was that a JOIN's USING clause processing cooks up its own join conditions, and those were the ones that couldn't cope with the changed order of output. Fixing the order of arguments of the BoolExpr also fixed all the JOIN-related diffs. Now `make check` passes except for this benign diff: | WHERE (((rsl.sl_color = rsh.slcolor) AND (rsl.sl_len_cm >= rsh.slminlen_cm)) AND (rsl.sl_len_cm <= rsh.slmaxlen_cm)); | WHERE ((rsl.sl_color = rsh.slcolor) AND (rsl.sl_len_cm >= rsh.slminlen_cm) AND (rsl.sl_len_cm <= rsh.slmaxlen_cm)); That's expected, since for cases like these the patch converts what used to be a tree of 2-argument BoolExprs into a single BoolExpr with many arguments. -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc. non_recursive_and_or_transformation_v2.patch Description: Binary data
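The core of the transformation -- collapsing a deep chain of two-argument AND nodes into one N-ary node without recursing -- can be illustrated with a toy Python sketch. This is not the actual PostgreSQL C code; the `BoolExpr` class here is a stand-in for the parser node, and it only demonstrates the spirit of fix (2): an explicit stack replaces recursion, and children are pushed in reverse so the original left-to-right argument order (and thus evaluation order) is preserved.

```python
class BoolExpr:
    def __init__(self, op, args):
        self.op = op        # "AND" or "OR"
        self.args = args    # child expressions (nodes or leaf strings)

# Build a left-deep chain of 15,000 two-argument ANDs, like a parser
# would produce for "a AND a AND ... AND a" before flattening.
expr = "a"
for _ in range(15_000):
    expr = BoolExpr("AND", [expr, "a"])

def flatten(node):
    """Collapse nested same-op BoolExprs into one N-ary BoolExpr,
    iteratively, preserving left-to-right argument order."""
    out, stack = [], [node]
    while stack:
        n = stack.pop()
        if isinstance(n, BoolExpr) and n.op == node.op:
            # Push children in reverse so they pop in original order.
            stack.extend(reversed(n.args))
        else:
            out.append(n)
    return BoolExpr(node.op, out)

flat = flatten(expr)
print(len(flat.args))  # → 15001
```

A naive recursive flattener would blow Python's default recursion limit on this input, just as the deep parse tree blows check_stack_depth(); the explicit stack bounds only heap usage, which matches Gurjeet's "limited only by memory" goal.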
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install
Re: Cédric Villemain 2013-05-25 201305251641.28401.ced...@2ndquadrant.com I just took time to inspect our contribs; USE_PGXS is not supported by all of them atm because of SHLIB_PREREQS (it used submake). I have a patch pending here to fix that. Once all our contribs can build with USE_PGXS, I'll fix the VPATH. The evil part of the problem is that you'd need to fix it all the way back to 8.4 (or 9.0 once 8.4 is EOL) if we want it to build extensions on apt.pg.org. (Or find a way that is compatible with as many majors as possible.) The last step is interesting: installcheck/REGRESS. For this one, if I can know exactly what's required (for the Debian build, for example), then I can also fix this target. Debian builds don't have root access, so make installcheck would need to install into a temp dir (probably DESTDIR). Currently this fails because the server will still insist on reading the extension control files from PGSHAREDIR. (See the thread starting at http://www.postgresql.org/message-id/20120221101903.GA15647@gnash for details.) Other than that, and my pending pg_regress --host /tmp patch, things should probably be ok for the buildds. Christoph -- c...@df7cb.de | http://www.df7cb.de/