Re: [HACKERS] Pre-alloc ListCell's optimization
* Stephen Frost (sfr...@snowman.net) wrote: Finally, sorry it's kind of a fugly patch, it's just a proof of concept and I'd be happy to clean it up if others feel it's worthwhile and a reasonable approach, but I really need to get it out there and take a break from it (I've been a bit obsessive-compulsive about it since PGCon.. :D). Erm, sorry, just to clarify, while it's a P-O-C patch, it does compile cleanly and passes all the regression tests, so it's something that one can play with at least. Not sure if it'd be worth benchmarking it until we feel comfortable that this is a decent approach, but I wouldn't complain if someone decided to... Thanks, Stephen
Re: [HACKERS] Should partial dumps include extensions?
On Tue, May 24, 2011 at 4:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: There's a complaint here http://archives.postgresql.org/pgsql-general/2011-05/msg00714.php about the fact that 9.1 pg_dump always dumps CREATE EXTENSION commands for all loaded extensions. Should we change that? A reasonable compromise might be to suppress extensions in the same cases where we suppress procedural languages, ie if --schema or --table was used (see include_everything switch in pg_dump.c). Making it work like procedural languages seems sensible to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pre-alloc ListCell's optimization
Excerpts from Stephen Frost's message of Tue May 24 22:56:21 -0400 2011: A couple of notes regarding the patch: First, it uses ffs(), which might not be fully portable... We could certainly implement the same thing in userspace and use ffs() when it's available. Err, see RIGHTMOST_ONE in bitmapset.c. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
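For reference, the trick Álvaro points at isolates the lowest set bit with x & (-x) (that is what RIGHTMOST_ONE in bitmapset.c does); combined with a shift loop it yields a portable ffs() fallback. A minimal sketch — my_ffs is a hypothetical name, not the actual macro:

```c
/* Portable ffs() fallback: return the 1-based position of the least
 * significant set bit, or 0 if no bit is set.  The x & (~x + 1) step
 * is the RIGHTMOST_ONE idiom: two's-complement negation, which is
 * well defined for unsigned types, clears every bit except the
 * lowest one that is set. */
static int
my_ffs(unsigned int x)
{
    int pos = 0;

    if (x == 0)
        return 0;
    x &= (~x + 1U);             /* isolate the rightmost 1 bit */
    while (x != 0)
    {
        pos++;
        x >>= 1;
    }
    return pos;
}
```

The shift loop can of course be replaced by ffs() itself, or a De Bruijn multiply, on platforms where those are available or faster.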
Re: [HACKERS] tackling full page writes
Robert Haas wrote: 2. The other fairly obvious alternative is to adjust our existing WAL record types to be idempotent - i.e. to not rely on the existing page contents. For XLOG_HEAP_INSERT, we currently store the target tid and the tuple contents. I'm not sure if there's anything else, but we would obviously need the offset where the new tuple should be written, which we currently infer from reading the existing page contents. For XLOG_HEAP_DELETE, we store just the TID of the target tuple; we would certainly need to store its offset within the block, and maybe the infomask. For XLOG_HEAP_UPDATE, we'd need the old and new offsets and perhaps also the old and new infomasks. Assuming that's all we need and I'm not missing anything (which I won't bet on), that means we'd be adding, say, 4 bytes per insert or delete and 8 bytes per update. So, if checkpoints are spread out widely enough that there will be more than ~2K operations per page between checkpoints, then it makes more sense to just do a full page write and call it good. If not, this idea might have legs. I vote for wal_level = idempotent because so few people will know what idempotent means. ;-) Idempotent does seem like the most promising idea. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
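Robert's ~2K figure follows directly from the numbers in his paragraph: with the default 8192-byte block size, one full-page image costs as much WAL as roughly 8192/4 idempotent inserts' worth of extra offset bytes. A back-of-the-envelope sketch — the per-record overheads are the estimates from the text, not measured costs:

```c
/* Break-even estimate: how many operations touching a page between
 * checkpoints before the per-record overhead of idempotent WAL
 * records exceeds the one-time cost of a full-page write. */
enum
{
    BLOCK_SIZE = 8192,          /* default BLCKSZ */
    EXTRA_PER_INSERT = 4,       /* estimated added bytes per insert/delete */
    EXTRA_PER_UPDATE = 8        /* estimated added bytes per update */
};

static int
breakeven_ops(int extra_bytes_per_op)
{
    return BLOCK_SIZE / extra_bytes_per_op;
}
```

So inserts break even around 2048 operations per page and updates around 1024 — hence "more than ~2K operations per page between checkpoints" as the crossover point.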
[HACKERS] The way to know whether the standby has caught up with the master
Hi, For reliable high-availability, when the master crashes, the clusterware must know whether it can promote the standby safely without any data loss, before actually promoting it. IOW, it must know whether the standby has already caught up with the primary. Otherwise, failover might cause data loss. We can know that from pg_stat_replication on the master. But the problem is that pg_stat_replication is not available since the master is not running at that moment. So that info should be available also on the standby. To achieve that, I'm thinking to change walsender so that, when the standby has caught up with the master, it sends back the message indicating that to the standby. And I'm thinking to add new function (or view like pg_stat_replication) available on the standby, which shows that info. Thought? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] The way to know whether the standby has caught up with the master
On 25.05.2011 07:42, Fujii Masao wrote: For reliable high-availability, when the master crashes, the clusterware must know whether it can promote the standby safely without any data loss, before actually promoting it. IOW, it must know whether the standby has already caught up with the primary. Otherwise, failover might cause data loss. We can know that from pg_stat_replication on the master. But the problem is that pg_stat_replication is not available since the master is not running at that moment. So that info should be available also on the standby. To achieve that, I'm thinking to change walsender so that, when the standby has caught up with the master, it sends back the message indicating that to the standby. And I'm thinking to add new function (or view like pg_stat_replication) available on the standby, which shows that info. By the time the standby has received that message, it might not be caught-up anymore because new WAL might've been generated in the master already. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] The way to know whether the standby has caught up with the master
On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 25.05.2011 07:42, Fujii Masao wrote: For reliable high-availability, when the master crashes, the clusterware must know whether it can promote the standby safely without any data loss, before actually promoting it. IOW, it must know whether the standby has already caught up with the primary. Otherwise, failover might cause data loss. We can know that from pg_stat_replication on the master. But the problem is that pg_stat_replication is not available since the master is not running at that moment. So that info should be available also on the standby. To achieve that, I'm thinking to change walsender so that, when the standby has caught up with the master, it sends back the message indicating that to the standby. And I'm thinking to add new function (or view like pg_stat_replication) available on the standby, which shows that info. By the time the standby has received that message, it might not be caught-up anymore because new WAL might've been generated in the master already. Right. But, thanks to sync rep, until such new WAL has been replicated to the standby, the commit of the transaction is not visible to the client. So, even if there is some WAL not yet replicated to the standby, the clusterware can promote the standby safely without any data loss (from the client's point of view), I think. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
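Fujii's safety argument can be phrased as a comparison of LSNs: under synchronous replication a commit is acknowledged to the client only after its WAL reaches the standby, so promotion loses no client-visible data whenever the standby holds WAL at least up to the last acknowledged commit, even if the master generated more WAL afterwards. A toy sketch — the names and the simplified LSN type are illustrative, not a PostgreSQL API:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* simplified stand-in for an LSN */

/* With sync rep, a commit becomes visible to the client only after its
 * WAL is replicated.  Promotion is therefore lossless (from the
 * client's point of view) whenever the standby has received WAL at
 * least up to the last acknowledged commit, regardless of any WAL the
 * master generated after that point. */
static bool
safe_to_promote(XLogRecPtr standby_received_lsn,
                XLogRecPtr last_acked_commit_lsn)
{
    return standby_received_lsn >= last_acked_commit_lsn;
}
```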
Re: [HACKERS] Foreign memory context read
Indeed I was acting weird there. I had completely forgotten about the bool pointer. Moreover, I actually got confused about the palloc0's return type...whether it was a datum or a pointer to datum. Looked back at the expansion and got it clear. Thanks a lot Mr. Tom. Regards, Vaibhav On Mon, 2011-05-23 at 09:58 -0400, Tom Lane wrote: Vaibhav Kaushal vaibhavkaushal...@gmail.com writes: My mind started wandering after that error. Now, actually, i was trying to do something like this: *last_result = palloc0(sizeof(Datum)); bool *isnnuull = true; *last_result = slot_getattr(slot, num_atts, *isnnuull); This seems utterly confused about data types. The first line thinks that last_result is of type Datum ** (ie, pointer to pointer to Datum), since it's storing a pointer-to-Datum through it. The third line however is treating last_result as of type Datum *, since it's storing a Datum (not pointer to Datum) through it. And the second line is assigning true (a bool value) to a variable declared as pointer to bool, which you then proceed to incorrectly dereference while passing it as the last argument to slot_getattr. The code will certainly crash on that deref, independently of the multiple other bugs here. Recommendation: gcc is your friend. Pay attention to the warnings it gives you. regards, tom lane
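To make Tom's three points concrete, here is a corrected sketch of the fragment, assuming the caller wants a plain Datum plus a null flag. slot_getattr_stub and fetch_last_attr are hypothetical stand-ins, not the real executor functions:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Datum;        /* stand-in for PostgreSQL's Datum */

/* Stand-in for slot_getattr(): returns the attribute value as a Datum
 * and reports null-ness through the isnull out-parameter. */
static Datum
slot_getattr_stub(int attnum, bool *isnull)
{
    *isnull = false;
    return (Datum) (attnum * 10);
}

static Datum
fetch_last_attr(int num_atts, bool *isnull)
{
    /* Pass the address of a real bool -- do not assign true to a
     * bool * and then dereference it, as the original snippet did.
     * No palloc0 is needed either: the Datum is returned by value. */
    return slot_getattr_stub(num_atts, isnull);
}
```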
[HACKERS] Proposal: Another attempt at vacuum improvements
Hi All, Some of the ideas regarding vacuum improvements were discussed here: http://archives.postgresql.org/pgsql-hackers/2008-05/msg00863.php http://archives.postgresql.org/pgsql-patches/2008-06/msg00059.php A recent thread was started by Robert Haas, but I don't know if we logically concluded that either. http://archives.postgresql.org/pgsql-hackers/2011-03/msg00946.php This was once again brought up by Robert Haas in a discussion with Tom and me during PGCon, and we agreed there are a few things we can do to make vacuum more performant. One of the things that Tom mentioned is that vacuum today is not aware of the fact that it's a periodic operation, and there might be ways to utilize that in some way. The biggest gripe today is that vacuum needs two heap scans and each scan dirties the buffer. While the visibility map ensures that not all blocks are read and written during the scan, for a very large table, even a small percentage of blocks can be significant. Further, post-HOT, the second scan of the heap does not really reclaim any significant space, except for dead line pointers. So there is a good reason to avoid that. I wanted to start a discussion just about that. I am proposing one solution below, but I am not married to the idea. So the idea is to separate the index vacuum (removing index pointers to dead tuples) from the heap vacuum. When we do heap vacuum (either by HOT-pruning or using regular vacuum), we can spool the dead line pointers somewhere. To avoid any hot-spots during normal processing, the spooling can be done periodically like the stats collection. One obvious choice for spooling dead line pointers is to use a relation fork. The index vacuum will be kicked off periodically depending on the number of spooled dead line pointers. When that happens, the index vacuum will remove all index pointers pointing to those dead line pointers and forget the spooled line pointers.
The dead line pointers themselves will be removed whenever a heap page is later vacuumed, either as part of HOT pruning or the next heap vacuum. We would need some mechanism, though, to know that the index pointers to the existing dead line pointers have been vacuumed and it's safe to remove them now. Maybe we can track the last operation that generated a dead line pointer in the page using an LSN in the page header, and also keep track of the LSN of the last successful index vacuum. If the index vacuum LSN is greater than the page header vacuum LSN, we can safely remove the existing dead line pointers. I am deliberately not suggesting how to track the index vacuum LSN since my last proposal to do something similar through a pg_class column was shot down by Tom :-) In a nutshell, what I am suggesting is to do heap and index vacuuming independently. The heap will be vacuumed either by HOT pruning or a periodic heap vacuum, and the dead line pointers will be collected. An index vacuum will remove the index pointers to those dead line pointers. And at some later point, the dead line pointers will be removed, either as part of a retail or complete heap vacuum. It's not clear if it's useful, but a single index vacuum can follow multiple heap vacuums or vice versa. Another advantage of this technique would be that we can then support start/stop heap vacuum, or vacuuming a range of blocks at a time, or even vacuuming only those blocks which are already cached in the buffer cache. Just hand-waving at this point, but it seems possible. Suggestions/comments/criticism all welcome, but please don't shoot down the idea on implementation details since I have really not spent time on that, so it will be easy to find holes and corner cases. Those can be worked out if we believe something like this will be useful. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
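The LSN-based safety check Pavan proposes reduces to a single comparison. A sketch under the proposal's assumptions — all names here are illustrative, not an actual PostgreSQL API:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* simplified stand-in for an LSN */

/* Sketch of the proposed rule: dead line pointers on a heap page may
 * be reclaimed only if an index vacuum has completed more recently
 * than the last operation that generated a dead line pointer on that
 * page -- i.e. every index pointer to them is already gone. */
static bool
can_remove_dead_line_pointers(XLogRecPtr page_last_dead_lsn,
                              XLogRecPtr last_index_vacuum_lsn)
{
    return last_index_vacuum_lsn > page_last_dead_lsn;
}
```

The equal-LSN case is treated as unsafe here, since an index vacuum at the same LSN cannot be known to have seen that page's latest dead line pointers.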
Re: [HACKERS] Reducing overhead of frequent table locks
On Mon, May 23, 2011 at 09:15:27PM -0400, Robert Haas wrote: On Fri, May 13, 2011 at 4:16 PM, Noah Misch n...@leadboat.com wrote:

    if (level >= ShareUpdateExclusiveLock)
        ++strong_lock_counts[my_strong_lock_count_partition]
        sfence
        if (strong_lock_counts[my_strong_lock_count_partition] == 1)
            /* marker 1 */
            import_all_local_locks
        normal_LockAcquireEx
    else if (level <= RowExclusiveLock)
        lfence
        if (strong_lock_counts[my_strong_lock_count_partition] == 0)
            /* marker 2 */
            local_only
            /* marker 3 */
        else
            normal_LockAcquireEx
    else
        normal_LockAcquireEx

At marker 1, we need to block until no code is running between markers two and three. You could do that with a per-backend lock (LW_SHARED by the strong locker, LW_EXCLUSIVE by the backend). That would probably still be a win over the current situation, but it would be nice to have something even cheaper. Barring some brilliant idea, or anyway for a first cut, it seems to me that we can adjust the above pseudocode by assuming the use of a LWLock. In addition, two other adjustments: first, the first line should test level > ShareUpdateExclusiveLock, rather than >=, per previous discussion. Second, import_all_local_locks needn't really move everything; just those locks with a matching locktag.
Thus:

    if (level > ShareUpdateExclusiveLock)
        ++strong_lock_counts[my_strong_lock_count_partition]
        sfence
        for each backend
            take per-backend lwlock for target backend
            transfer fast-path entries with matching locktag
            release per-backend lwlock for target backend
        normal_LockAcquireEx
    else if (level <= RowExclusiveLock)
        lfence
        if (strong_lock_counts[my_strong_lock_count_partition] == 0)
            take per-backend lwlock for own backend
            fast-path lock acquisition
            release per-backend lwlock for own backend
        else
            normal_LockAcquireEx
    else
        normal_LockAcquireEx

This drops the part about only transferring fast-path entries once when a strong_lock_counts cell transitions from zero to one. Granted, that itself requires some yet-undiscussed locking. For that matter, we can't have multiple strong lockers completing transfers on the same cell in parallel. Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each strong locker holds for that entire if body and while decrementing the strong_lock_counts cell at lock release. As far as the level of detail of this pseudocode goes, there's no need to hold the per-backend LWLock while transferring the fast-path entries. You just need to hold it sometime between bumping strong_lock_counts and transferring the backend's locks. This ensures that, for example, the backend is not sleeping in the middle of a fast-path lock acquisition for the whole duration of this code. Now, a small fly in the ointment is that we haven't got, with PostgreSQL, a portable library of memory primitives. So there isn't an obvious way of doing that sfence/lfence business. I was thinking that, if the final implementation could benefit from memory barrier interfaces, we should create those interfaces now. Start with only a platform-independent dummy implementation that runs a lock/unlock cycle on a spinlock residing in backend-local memory. I'm 75% sure that would be sufficient on all architectures for which we support spinlocks.
It may turn out that we can't benefit from such interfaces at this time ... Now, it seems to me that in the strong lock case, the sfence isn't really needed anyway, because we're about to start acquiring and releasing an lwlock for every backend, and that had better act as a full memory barrier anyhow, or we're doomed. The weak lock case is more interesting, because we need the fence before we've taken any LWLock. Agreed. But perhaps it'd be sufficient to just acquire the per-backend lwlock before checking strong_lock_counts[]. If, as we hope, we get back a zero, then we do the fast-path lock acquisition, release the lwlock, and away we go. If we get back any other value, then we've wasted an lwlock acquisition cycle. Or actually maybe not: it seems to me that in that case we'd better transfer all of our fast-path entries into the main hash table before trying to acquire any lock the slow way, at least if we don't want the deadlock detector to have to know about the fast-path. So then we get this:

    if (level > ShareUpdateExclusiveLock)
        ++strong_lock_counts[my_strong_lock_count_partition]
        for each backend
            take per-backend lwlock for target backend
            transfer fastpath entries with matching locktag
            release per-backend lwlock for target backend
    else if (level <= RowExclusiveLock)
        take per-backend lwlock for own backend
        if (strong_lock_counts[my_strong_lock_count_partition] == 0)
            fast-path lock acquisition
            done = true
        else
            transfer all fastpath entries
        release per-backend lwlock for own backend
    if (!done)
        normal_LockAcquireEx

Could you elaborate on the last part (the need for else transfer all fastpath entries) and, specifically, how it aids deadlock avoidance? I didn't think this change would have any impact on deadlocks, because all relevant locks will be in the global lock table before any call to normal_LockAcquireEx.
Re: [HACKERS] SSI predicate locking on heap -- tuple or row?
Kevin Grittner wrote: Dan Ports wrote: Does that make sense to you? Makes sense to me. Like the proof I offered, you have shown that there is no cycle which can develop with the locks copied which isn't there anyway if we don't copy the locks. I woke up with the nagging thought that while the above is completely accurate, it deserves some slight elaboration. These proofs show that there is no legitimate cycle which could cause an anomaly which the move from row-based to tuple-based logic will miss. They don't prove that the change will generate all the same serialization failures; and in fact, some false positives are eliminated by the change. That's a good thing. In addition to the benefits mentioned in prior posts, there will be a reduction in the rate of rollbacks (in particular corner cases) from what people see in beta1 without a loss of correctness. -Kevin
[HACKERS] Operator families vs. casts
PostgreSQL 9.1 will implement ALTER TABLE ALTER TYPE operations that use a binary coercion cast without rewriting the table or unrelated indexes. It will always rewrite any indexes and recheck any foreign key constraints that depend on a changing column. This is unnecessary for 100% of core binary coercion casts. In my original design[1], I planned to detect this by comparing the operator families of the old and would-be-new indexes. (This still yields some unnecessary rewrites; oid_ops and int4_ops are actually compatible, for example.) When I implemented[2] it, I found that the contracts[3] for operator families are not strong enough to prove that the existing indexes and constraints remain valid. Specifically, I wished to assume val0 = val1 iff val0::a = val1::b for any val0, val1, a, b such that we resolve both equality operators in the same operator family. The operator family contracts say nothing about consistency with casts. Is there a credible use case for violating that assumption? If not, I'd like to document it as a requirement for operator family implementors. The above covers B-tree and hash operator families. GIN and GiST have no operator family contracts. Here was the comment in my first patch intended to sweep that under the table:

    ! * We do not document a contract for GIN or GiST operator families.  Only the
    ! * GIN operator family array_ops has more than one constituent operator class,
    ! * and only typmod-only changes to arrays can avoid a rewrite.  Preserving a GIN
    ! * index across such a change is safe.  We therefore support GiST and GIN here
    ! * using the same rules as for B-tree and hash indexes, but that is mostly
    ! * academic.  Any forthcoming contract for GiST or GIN operator families should,
    ! * all other things being equal, bolster the validity of this assumption.
    ! *
    ! * Exclusion constraints raise the question: can we trust that the operator has
    ! * the same semantics with the new type?  The operator will fall in the index's
    ! * operator family.  For B-tree or hash, the operator will be = or <>,
    ! * yielding an affirmative answer from contractual requirements.  For GiST and
    ! * GIN, we assume that a similar requirement would fall out of any contract for
    ! * their operator families, should one arise.  We therefore support exclusion
    ! * constraints without any special treatment, but this is again mostly academic.

Any thoughts on what to do here? We could just add basic operator family contracts requiring what we need. Perhaps, instead, the ALTER TABLE code should require an operator family match for B-tree and hash but an operator class match for other access methods. For now, I plan to always rewrite indexes on expressions or having predicates. With effort, we could detect compatible changes there, too. I also had a more mundane design question in the second paragraph of [2]. It can probably wait for the review of the next version of the patch. However, given that it affects a large percentage of the patch, I'd appreciate any early feedback on it. Thanks, nm

[1] http://archives.postgresql.org/message-id/20101229125625.ga27...@tornado.gateway.2wire.net
[2] http://archives.postgresql.org/message-id/20110113230124.ga18...@tornado.gateway.2wire.net
[3] http://www.postgresql.org/docs/9.0/interactive/xindex.html#XINDEX-OPFAMILY
Re: [HACKERS] sepgsql: fix relkind handling on foreign tables
2011/5/23 Robert Haas robertmh...@gmail.com: On Sun, May 22, 2011 at 5:52 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: The attached patch fixes up case handling for foreign tables. Previously, it did not assign a security label to a foreign table at creation time, and did not check access rights in the DML hook. This patch fixes these problems; it allows foreign tables default labeling and access checks as the db_table object class. A foreign table is really more like a view, or a function call. Are you sure you want to handle it like a table? It might be a tentative solution, so I'll want to cancel this patch. Its nature is indeed more similar to a function call than to a table, but it is not a function itself. So, it might be a better idea to define its own object class, such as db_foreign_table, instead of reusing existing object classes. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] sepgsql: fix relkind handling on foreign tables
On Tue, May 24, 2011 at 6:57 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2011/5/23 Robert Haas robertmh...@gmail.com: On Sun, May 22, 2011 at 5:52 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: The attached patch fixes up case handling for foreign tables. Previously, it did not assign a security label to a foreign table at creation time, and did not check access rights in the DML hook. This patch fixes these problems; it allows foreign tables default labeling and access checks as the db_table object class. A foreign table is really more like a view, or a function call. Are you sure you want to handle it like a table? It might be a tentative solution, so I'll want to cancel this patch. Its nature is indeed more similar to a function call than to a table, but it is not a function itself. So, it might be a better idea to define its own object class, such as db_foreign_table, instead of reusing existing object classes. Perhaps. Or else use db_view. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Small patch for GiST: move childoffnum to child
While preparing the patch for my GSoC project I found it reasonable to move childoffnum (in the GISTInsertStack structure) from the parent to the child. This means that the child now holds the childoffnum of the parent's link to that child. It allows maintaining entire parts of the tree in GISTInsertStack structures. It also simplifies the existing code a bit. Heikki advised me that since this change simplifies existing code, it can be considered as a separate patch. -- With best regards, Alexander Korotkov. Attachment: gist_childoffnum.path
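As a rough illustration of the layout change (struct and field names simplified here; the real structure is GISTInsertStack in PostgreSQL's GiST code), each stack entry now records the offset, within its parent's page, of the downlink that points to it — so an entry plus its parent pointer describes a self-contained slice of the tree:

```c
#include <stddef.h>

typedef unsigned short OffsetNumber;
typedef unsigned int BlockNumber;

/* Simplified sketch of the proposed stack entry: the child carries
 * the offset of the parent's downlink to it, rather than the parent
 * carrying the offset of its link to the child. */
typedef struct StackSketch
{
    BlockNumber blkno;            /* block this entry describes */
    OffsetNumber downlinkoffnum;  /* offset of parent's link to this block */
    struct StackSketch *parent;   /* NULL for the root */
} StackSketch;
```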
Re: [HACKERS] 9.1 support for hashing arrays
Robert Haas wrote: On Sun, May 22, 2011 at 11:49 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I believe, however, that applying this will invalidate the contents of any hash indexes on array types that anyone has built using 9.1beta1. Do we need to do something about that? Like bumping catversion? Sure. Although note that the system catalogs are not actually changing, which goes to someone else's recent point about catversion getting bumped for things other than changes in the things for which the cat in catversion is an abbreviation. I would probably complain about that, except you already did it post-beta1: http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=9bb6d9795253bb521f81c626fea49a704a369ca9 Unfortunately, I was unable to make that omelet without breaking some eggs. :-( Possibly Bruce will feel like adding a check to pg_upgrade for the case. I wouldn't bother myself though. It seems quite unlikely that anyone's depending on the feature yet. I'll leave that to you, Bruce, and whoever else wants to weigh in to hammer that one out. Oh, you are worried someone might have stored hash indexes with the old catalog format? Seems like something we might mention in the next beta release announcement, but nothing more. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] [BUGS] BUG #6034: pg_upgrade fails when it should not.
Robert Haas wrote: On Mon, May 23, 2011 at 8:26 AM, Bruce Momjian br...@momjian.us wrote: Sorry, I was unclear. The question is whether the case of the _name_ of the locale is significant, meaning: can you have two locale names that differ only by case and behave differently? That would seem surprising to me, but I really have no idea. There's the other direction, too: two locales that vary by something more than case, but still have identical behavior. Maybe we just decide not to worry about that, but then why worry about this? Well, if we remove the check then people could easily get broken upgrades by upgrading to a server with a different locale. A Google search seems to indicate the locale names are case-sensitive, so I am thinking the problem is that the user didn't have exact locales, and needs that to use pg_upgrade. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Reducing overhead of frequent table locks
On Tue, May 24, 2011 at 5:07 AM, Noah Misch n...@leadboat.com wrote: This drops the part about only transferring fast-path entries once when a strong_lock_counts cell transitions from zero to one. Right: that's because I don't think that's what we want to do. I don't think we want to transfer all per-backend locks to the shared hash table as soon as anyone attempts to acquire a strong lock; instead, I think we want to transfer only those fast-path locks which have the same locktag as the strong lock someone is attempting to acquire. If we do that, then it doesn't matter whether the strong_lock_counts[] cell is transitioning from 0 to 1 or from 6 to 7: we still have to check for strong locks with that particular locktag. Granted, that itself requires some yet-undiscussed locking. For that matter, we can't have multiple strong lockers completing transfers on the same cell in parallel. Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each strong locker holds for that entire if body and while decrementing the strong_lock_counts cell at lock release. I was imagining that the per-backend LWLock would protect the list of fast-path locks. So to transfer locks, you would acquire the per-backend LWLock for the backend which has the lock, and then the lock manager partition LWLock, and then perform the transfer. As far as the level of detail of this pseudocode goes, there's no need to hold the per-backend LWLock while transferring the fast-path entries. You just need to hold it sometime between bumping strong_lock_counts and transferring the backend's locks. This ensures that, for example, the backend is not sleeping in the middle of a fast-path lock acquisition for the whole duration of this code. See above; I'm lost. Now, a small fly in the ointment is that we haven't got, with PostgreSQL, a portable library of memory primitives. So there isn't an obvious way of doing that sfence/lfence business.
I was thinking that, if the final implementation could benefit from memory barrier interfaces, we should create those interfaces now. Start with only a platform-independent dummy implementation that runs a lock/unlock cycle on a spinlock residing in backend-local memory. I'm 75% sure that would be sufficient on all architectures for which we support spinlocks. It may turn out that we can't benefit from such interfaces at this time ... OK. Now, it seems to me that in the strong lock case, the sfence isn't really needed anyway, because we're about to start acquiring and releasing an lwlock for every backend, and that had better act as a full memory barrier anyhow, or we're doomed. The weak lock case is more interesting, because we need the fence before we've taken any LWLock. Agreed. But perhaps it'd be sufficient to just acquire the per-backend lwlock before checking strong_lock_counts[]. If, as we hope, we get back a zero, then we do the fast-path lock acquisition, release the lwlock, and away we go. If we get back any other value, then we've wasted an lwlock acquisition cycle. Or actually maybe not: it seems to me that in that case we'd better transfer all of our fast-path entries into the main hash table before trying to acquire any lock the slow way, at least if we don't want the deadlock detector to have to know about the fast-path. So then we get this: ! if (level ShareUpdateExclusiveLock) ! ++strong_lock_counts[my_strong_lock_count_partition] ! for each backend ! take per-backend lwlock for target backend ! transfer fastpath entries with matching locktag ! release per-backend lwlock for target backend ! else if (level = RowExclusiveLock) ! take per-backend lwlock for own backend ! if (strong_lock_counts[my_strong_lock_count_partition] == 0) ! fast-path lock acquisition ! done = true ! else ! transfer all fastpath entries ! release per-backend lwlock for own backend ! if (!done) ! 
!     normal_LockAcquireEx

Could you elaborate on the last part (the need for "else transfer all fastpath entries") and, specifically, how it aids deadlock avoidance? I didn't think this change would have any impact on deadlocks, because all relevant locks will be in the global lock table before any call to normal_LockAcquireEx. Oh, hmm, maybe you're right. I was concerned about the possibility of a backend which already holds locks going to sleep on a lock wait, and maybe running the deadlock detector, and failing to notice a deadlock. But I guess that can't happen: if any of the locks it holds are relevant to the deadlock detector, the backend attempting to acquire those locks will transfer them before attempting to acquire the lock itself, so it should be OK. To
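The branch structure sketched in the pseudocode above can be modeled as a single-process C program. Everything here is an illustrative assumption — the partition count, slot count, and the stand-in "shared table" — and the real implementation would hold the per-backend LWLock and lock manager partition locks around these steps:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal single-process sketch of the fast-path pseudocode.  Sizes are
 * arbitrary; "transfer" just moves entries into a stand-in for the
 * shared lock table. */
#define NPARTITIONS 16
#define FP_SLOTS    16

static int strong_lock_counts[NPARTITIONS];

typedef struct { int locktag; bool used; } FastPathSlot;
static FastPathSlot fp[FP_SLOTS];   /* this backend's fast-path queue */
static int shared_table[64];        /* stand-in for the global lock table */
static int shared_count;

static int partition(int locktag) { return locktag % NPARTITIONS; }

static void transfer_matching(int locktag)
{
    for (int i = 0; i < FP_SLOTS; i++)
        if (fp[i].used && fp[i].locktag == locktag)
        {
            shared_table[shared_count++] = fp[i].locktag;
            fp[i].used = false;
        }
}

static void transfer_all(void)
{
    for (int i = 0; i < FP_SLOTS; i++)
        if (fp[i].used)
        {
            shared_table[shared_count++] = fp[i].locktag;
            fp[i].used = false;
        }
}

/* Strong locker: bump the partition count, then pull any matching
 * fast-path entries into the shared table before the slow-path acquire. */
void acquire_strong(int locktag)
{
    ++strong_lock_counts[partition(locktag)];
    transfer_matching(locktag);              /* per-backend lwlock held here */
    shared_table[shared_count++] = locktag;  /* normal_LockAcquireEx */
}

/* Weak locker: fast path only while the partition count is zero;
 * otherwise transfer everything and fall back to the slow path. */
bool acquire_weak(int locktag)
{
    if (strong_lock_counts[partition(locktag)] == 0)
    {
        for (int i = 0; i < FP_SLOTS; i++)
            if (!fp[i].used)
            {
                fp[i].used = true;
                fp[i].locktag = locktag;
                return true;                 /* fast-path acquisition */
            }
    }
    transfer_all();
    shared_table[shared_count++] = locktag;  /* normal_LockAcquireEx */
    return false;
}
```

Note how the strong locker only transfers entries with a matching locktag, while a weak locker that loses the race transfers all of its own entries before going the slow way.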
Re: [HACKERS] Reducing overhead of frequent table locks
On Tue, May 24, 2011 at 08:53:11AM -0400, Robert Haas wrote: On Tue, May 24, 2011 at 5:07 AM, Noah Misch n...@leadboat.com wrote: This drops the part about only transferring fast-path entries once when a strong_lock_counts cell transitions from zero to one. Right: that's because I don't think that's what we want to do. I don't think we want to transfer all per-backend locks to the shared hash table as soon as anyone attempts to acquire a strong lock; instead, I think we want to transfer only those fast-path locks which have the same locktag as the strong lock someone is attempting to acquire. If we do that, then it doesn't matter whether the strong_lock_counts[] cell is transitioning from 0 to 1 or from 6 to 7: we still have to check for strong locks with that particular locktag. Oh, I see. I was envisioning that you'd transfer all locks associated with the strong_lock_counts cell; that is, all the locks that would now go directly to the global lock table when requested going forward. Transferring only exact matches seems fine too, and then I agree with your other conclusions. Granted, that itself requires some yet-undiscussed locking. For that matter, we can't have multiple strong lockers completing transfers on the same cell in parallel. Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each strong locker holds for that entire if body and while decrementing the strong_lock_counts cell at lock release. I was imagining that the per-backend LWLock would protect the list of fast-path locks. So to transfer locks, you would acquire the per-backend LWLock for the backend which has the lock, and then the lock manager partition LWLock, and then perform the transfer. I see later in your description that the transferer will delete each fast-path lock after transferring it. Given that, this does sound adequate. As far as the level of detail of this pseudocode goes, there's no need to hold the per-backend LWLock while transferring the fast-path entries. 
You just need to hold it sometime between bumping strong_lock_counts and transferring the backend's locks. This ensures that, for example, the backend is not sleeping in the middle of a fast-path lock acquisition for the whole duration of this code. See above; I'm lost. It wasn't a particularly useful point. To validate the locking at this level of detail, I think we need to sketch the unlock protocol, too. On each strong lock release, we'll decrement the strong_lock_counts cell. No particular interlock with fast-path lockers should be needed; a stray AccessShareLock needlessly making it into the global lock table is no problem. As mentioned above, we _will_ need an interlock with lock transfer operations. How will transferred fast-path locks get removed from the global lock table? Presumably, the original fast-path locker should do so at transaction end; anything else would contort the life cycle. Then add a way for the backend to know which locks had been transferred as well as an interlock against concurrent transfer operations. Maybe that's all. I'm thinking that the backend can note, in its local-lock table, whether it originally acquired a lock via the fast-path or not. Any lock not originally acquired via the fast-path will be released just as now. For any lock that WAS originally acquired via the fast-path, we'll take our own per-backend lwlock, which protects the fast-path queue, and scan the fast-path queue for a matching entry. If none is found, then we know the lock was transferred, so release the per-backend lwlock and do it the regular way (take lock manager partition lock, etc.). Sounds good. To put it another way: the current system is fair; the chance of hitting lock exhaustion is independent of lock level. The new system would be unfair; lock exhaustion is much more likely to appear for a ShareUpdateExclusiveLock acquisition, through no fault of that transaction. 
I agree this isn't ideal, but it doesn't look to me like an unacceptable weakness. Making lock slots first-come, first-served is inherently unfair; we're not at all set up to justly arbitrate between mutually-hostile lockers competing for slots. The overall situation will get better, not worse, for the admin who wishes to defend against hostile unprivileged users attempting a lock table DOS. Well, it's certainly true that the proposed system is far less likely to bomb out trying to acquire an AccessShareLock than what we have today, since in the common case the AccessShareLock doesn't use up any shared resources. And that should make a lot of people happy. But as to the bad scenario, one needn't presume that the lockers are hostile - it may just be that the system is running on the edge of a full lock table. In the worst case, someone wanting a strong lock on a table may end up transferring a hundred or
Re: [HACKERS] Operator families vs. casts
Noah Misch n...@leadboat.com writes: PostgreSQL 9.1 will implement ALTER TABLE ALTER TYPE operations that use a binary coercion cast without rewriting the table or unrelated indexes. It will always rewrite any indexes and recheck any foreign key constraints that depend on a changing column. This is unnecessary for 100% of core binary coercion casts. In my original design[1], I planned to detect this by comparing the operator families of the old and would-be-new indexes. (This still yields some unnecessary rewrites; oid_ops and int4_ops are actually compatible, for example.) No, they aren't: signed and unsigned comparisons do not yield the same sort order. I think that example may destroy the rest of your argument. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
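Tom's point that signed and unsigned comparisons yield different sort orders is easy to demonstrate outside PostgreSQL. This standalone sketch (not PostgreSQL code) compares the same 32-bit pattern under int4-style (signed) and oid-style (unsigned) semantics:

```c
#include <stdint.h>

/* Compare two 32-bit values as int4 would: reinterpret as signed.
 * (The uint32 -> int32 conversion is two's-complement wraparound on
 * all mainstream platforms.) */
static int cmp_signed(uint32_t a, uint32_t b)
{
    int32_t sa = (int32_t) a, sb = (int32_t) b;
    return (sa < sb) ? -1 : (sa > sb) ? 1 : 0;
}

/* Compare the same values as oid would: plain unsigned comparison. */
static int cmp_unsigned(uint32_t a, uint32_t b)
{
    return (a < b) ? -1 : (a > b) ? 1 : 0;
}
```

Any OID with the high bit set (>= 0x80000000) sorts before small values under signed comparison but after them under unsigned comparison, so a btree built with one ordering is broken under the other.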
Re: [HACKERS] Operator families vs. casts
On Tue, May 24, 2011 at 10:10:34AM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: PostgreSQL 9.1 will implement ALTER TABLE ALTER TYPE operations that use a binary coercion cast without rewriting the table or unrelated indexes. It will always rewrite any indexes and recheck any foreign key constraints that depend on a changing column. This is unnecessary for 100% of core binary coercion casts. In my original design[1], I planned to detect this by comparing the operator families of the old and would-be-new indexes. (This still yields some unnecessary rewrites; oid_ops and int4_ops are actually compatible, for example.) No, they aren't: signed and unsigned comparisons do not yield the same sort order. True; scratch the parenthetical comment. I think that example may destroy the rest of your argument. Not that I'm aware of. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing overhead of frequent table locks
On Tue, May 24, 2011 at 10:03 AM, Noah Misch n...@leadboat.com wrote: Let's see if I understand the risk better now: the new system will handle lock load better, but when it does hit a limit, understanding why that happened will be more difficult. Good point. No silver-bullet ideas come to mind for avoiding that. The only idea I can think of is to try to impose some bounds. For example, suppose we track the total number of locks that the system can handle in the shared hash table. We try to maintain the system in a state where the number of locks that actually exist is less than that number, even though some of them may be stored elsewhere. You can imagine a system where backends grab a global mutex to reserve a certain number of slots, and store that in shared memory together with their fast-path list, but another backend which is desperate for space can go through and trim back reservations to actual usage. Will the pg_locks view scan fast-path lock tables? If not, we probably need another view that does. We can then encourage administrators to monitor for fast-path lock counts that get high relative to shared memory capacity. I think pg_locks should probably scan the fast-path tables. Another random idea for optimization: we could have a lock-free array with one entry per backend, indicating whether any fast-path locks are present. Before acquiring its first fast-path lock, a backend writes a 1 into that array and inserts a store fence. After releasing its last fast-path lock, it performs a store fence and writes a 0 into the array. Anyone who needs to grovel through all the per-backend fast-path arrays for whatever reason can perform a load fence and then scan the array. If I understand how this stuff works (and it's very possible that I don't), when the scanning backend sees a 0, it can be assured that the target backend has no fast-path locks and therefore doesn't need to acquire and release that LWLock or scan that fast-path array for entries. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
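Robert's lock-free presence array might look roughly like the following in C11 atomics. This is a sketch, not PostgreSQL code — the array, its size, and the function names are assumptions, and on its own it does not close the race Noah raises in the follow-up message:

```c
#include <stdatomic.h>

/* One "any fast-path locks here?" flag per backend.  Static atomics
 * are zero-initialized. */
#define MAX_BACKENDS 8

static atomic_int fp_present[MAX_BACKENDS];

/* Called before a backend acquires its first fast-path lock. */
void note_first_fastpath_lock(int backend)
{
    atomic_store(&fp_present[backend], 1);
    atomic_thread_fence(memory_order_release);   /* the "store fence" */
}

/* Called after a backend releases its last fast-path lock. */
void note_last_fastpath_release(int backend)
{
    atomic_thread_fence(memory_order_release);
    atomic_store(&fp_present[backend], 0);
}

/* Scanner: load fence, then count only the backends whose flag is set;
 * the others need no LWLock acquisition and no queue scan. */
int count_backends_to_visit(void)
{
    atomic_thread_fence(memory_order_acquire);   /* the "load fence" */
    int n = 0;
    for (int i = 0; i < MAX_BACKENDS; i++)
        if (atomic_load(&fp_present[i]) != 0)
            n++;
    return n;
}
```

The payoff is on the common path: a strong locker scanning for transfers skips every backend whose flag reads 0 instead of taking that backend's LWLock and walking its fast-path array.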
Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf
Magnus Hagander wrote: On Thu, May 19, 2011 at 11:09, Dave Page dp...@pgadmin.org wrote: On Thu, May 19, 2011 at 2:44 PM, Selena Deckelmann sel...@chesnok.com wrote: On Wed, May 18, 2011 at 8:20 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Greg Smith's message of mié may 18 23:07:13 -0400 2011: Two things that could be changed from this example to make it more useful: -The default database is based on your user name, which is postgres in most packaged builds but not if you compile your own. I don't know whether it's practical to consider substituting that into this file, or if it's just enough to mention that as an additional doc comment. You mean the default username, not the default database, but yeah; so do we need a @default_username@ token to be replaced by initdb with whatever it has as effective_user? (In this case the patch is no longer 2 lines, but still should be trivial enough). That would be nice. So, we just add that token to initdb? Seems simple. I added some explanation of the all vs replication bit in the header comments. Revision attached. Looks good to me. As I mentioned offlist, I'd like it in teal please. Applied with some further minor bikeshedding (remove trailing spaces, rewrap so columns aren't wider than 80 chars, etc) Let me just point out that people who have already run initdb during beta will not see this in their pg_hba.conf, nor in their share/pg_hba.conf.sample, even after they have upgraded to a later beta, unless they run initdb. However, we have bumped the catalog version for something else so they should then get this change. My point is if we change configuration files and then don't bump the catalog version, the share/*.sample files get out of sync with the files in /data, which can be kind of confusing. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. 
+ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] moving toast table to its own tablespace
Robert Haas wrote: On Thu, May 19, 2011 at 3:17 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote: Is there a reason we don't allow moving the toast table to a separate tablespace, other than unimplemented feature? If not, I propose such a syntax as ALTER TABLE foo SET TOAST TABLESPACE bar; Off the top of my head, I don't see any reason not to allow that. Added to TODO: Allow toast tables to be moved to a different tablespace * http://archives.postgresql.org/pgsql-hackers/2011-05/msg00980.php -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf
On Tue, May 24, 2011 at 10:53, Bruce Momjian br...@momjian.us wrote: Magnus Hagander wrote: On Thu, May 19, 2011 at 11:09, Dave Page dp...@pgadmin.org wrote: On Thu, May 19, 2011 at 2:44 PM, Selena Deckelmann sel...@chesnok.com wrote: On Wed, May 18, 2011 at 8:20 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Greg Smith's message of mié may 18 23:07:13 -0400 2011: Two things that could be changed from this example to make it more useful: -The default database is based on your user name, which is postgres in most packaged builds but not if you compile your own. I don't know whether it's practical to consider substituting that into this file, or if it's just enough to mention that as an additional doc comment. You mean the default username, not the default database, but yeah; so do we need a @default_username@ token to be replaced by initdb with whatever it has as effective_user? (In this case the patch is no longer 2 lines, but still should be trivial enough). That would be nice. So, we just add that token to initdb? Seems simple. I added some explanation of the all vs replication bit in the header comments. Revision attached. Looks good to me. As I mentioned offlist, I'd like it in teal please. Applied with some further minor bikeshedding (remove trailing spaces, rewrap so columns aren't wider than 80 chars, etc) Let me just point out that people who have already run initdb during beta will not see this in their pg_hba.conf, nor in their share/pg_hba.conf.sample, even after they have upgraded to a later beta, unless they run initdb. However, we have bumped the catalog version for something else so they should then get this change. Why would they not see it in their share/pg_hba.conf.sample? It will not affect the existing one in $PGDATA, but why wouldn't the installed .sample change? 
My point is if we change configuration files and then don't bump the catalog version, the share/*.sample files get out of sync with the files in /data, which can be kind of confusing. They would - but what you are saying above is that they would not get out of sync, because the share/*.sample also don't update. Just a mistake in what you said above, or am I missing something? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pull up aggregate subquery
Robert Haas robertmh...@gmail.com writes: On Mon, May 23, 2011 at 4:02 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah. For simple scan/join queries it seems likely that we only care about parameterizing indexscans, since the main opportunity for a win is to not scan all of a large table. Restricting things that way would help reduce the number of extra Paths to carry around. But I'm not sure whether the same argument can be made for arbitrary subqueries. I must be misunderstanding you, because index scans are the thing we already *do* parameterize; and what else would make any sense? The point I was trying to make is that the ultimate reason for having a parameterized portion-of-a-plan will be that there's a parameterized indexscan somewhere down at the bottom. I had originally imagined that we might parameterize any old scan; for example consider replacing

Nestloop
  Join Filter: a.x = b.y
  -> Seqscan on a
  -> Seqscan on b

with

Nestloop
  -> Seqscan on a
  -> Seqscan on b
       Filter: b.y = a.x

Although this isn't nearly as useful as if we could be using an index on b.y, there would still be some marginal gain to be had, because we'd be able to reject rows of b without first passing them up to the join node. But I'm afraid that going all-out like that would slow the planner down far too much (too many Paths to consider) to be justified by a marginal runtime gain. So the idea I have at the moment is that we'll still only parameterize indexscans, but then allow those to be joined to unrelated relations while still remaining parameterized. That should reduce the number of parameterized Paths hanging around, because only joinclauses that match indexes will give rise to such Paths. But I think this is all fairly unrelated to the case that Hitoshi is on about. As you said earlier, it seems like we'd have to derive both parameterized and unparameterized plans for the subquery, which seems mighty expensive. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
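Tom's two plan shapes produce the same join result and differ only in where rows of b get rejected. A toy C model of the two evaluation orders (purely illustrative, not planner code) makes the "marginal gain" concrete: pushing the qual into the inner scan means fewer rows are handed up to the join node:

```c
/* Toy model of the two nestloop plans: joining a and b on a.x = b.y.
 * rows_passed_up counts how many b rows reach the join node. */
static int rows_passed_up;

/* Plan 1: every inner row reaches the join node, which applies the
 * join filter itself. */
static int join_filter_at_join(const int *a, int na, const int *b, int nb)
{
    int matches = 0;
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
        {
            rows_passed_up++;            /* b row reaches the join node */
            if (a[i] == b[j])
                matches++;
        }
    return matches;
}

/* Plan 2: the qual is pushed into the (parameterized) inner scan, so
 * non-matching rows are rejected before being passed up. */
static int filter_at_inner_scan(const int *a, int na, const int *b, int nb)
{
    int matches = 0;
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
        {
            if (b[j] != a[i])
                continue;                /* rejected inside the scan */
            rows_passed_up++;
            matches++;
        }
    return matches;
}
```

Both functions do the same total comparisons, which is Tom's point: without an index on b.y the saving is only in per-row overhead between nodes, not in work avoided.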
Re: [HACKERS] 9.2 schedule
On Mon, May 23, 2011 at 10:44:20PM -0400, Greg Smith wrote: At the developer meeting last week: http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting there was an initial schedule for 9.2 hammered out and dutifully transcribed at http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan , and the one part I wasn't sure I had written down correctly I see Robert already fixed. There was a suggestion to add some publicity around the schedule for this release. Already started. :) http://www.postgresql.org/community/weeklynews/pwn20110522 There's useful PR value to making it more obvious to people that the main development plan is regular and time-based, even if the release date itself isn't fixed. The right time to make an initial announcement like that is soon, particularly if a goal here is to get more submitted into the first 9.2 CF coming in only a few weeks. Anyone have changes to suggest before this starts working its way toward an announcement? I thought we'd agreed on the timing for the first CF, and that I was to announce it in the PostgreSQL Weekly News, so I did just that. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing overhead of frequent table locks
On Tue, May 24, 2011 at 10:35:23AM -0400, Robert Haas wrote: On Tue, May 24, 2011 at 10:03 AM, Noah Misch n...@leadboat.com wrote: Let's see if I understand the risk better now: the new system will handle lock load better, but when it does hit a limit, understanding why that happened will be more difficult. Good point. No silver-bullet ideas come to mind for avoiding that. The only idea I can think of is to try to impose some bounds. For example, suppose we track the total number of locks that the system can handle in the shared hash table. We try to maintain the system in a state where the number of locks that actually exist is less than that number, even though some of them may be stored elsewhere. You can imagine a system where backends grab a global mutex to reserve a certain number of slots, and store that in shared memory together with their fast-path list, but another backend which is desperate for space can go through and trim back reservations to actual usage. Forcing artificial resource exhaustion is a high price to pay. I suppose it's quite like disabling Linux memory overcommit, and some folks would like it. Another random idea for optimization: we could have a lock-free array with one entry per backend, indicating whether any fast-path locks are present. Before acquiring its first fast-path lock, a backend writes a 1 into that array and inserts a store fence. After releasing its last fast-path lock, it performs a store fence and writes a 0 into the array. Anyone who needs to grovel through all the per-backend fast-path arrays for whatever reason can perform a load fence and then scan the array. If I understand how this stuff works (and it's very possible that I don't), when the scanning backend sees a 0, it can be assured that the target backend has no fast-path locks and therefore doesn't need to acquire and release that LWLock or scan that fast-path array for entries. 
I'm probably just missing something, but can't that conclusion become obsolete arbitrarily quickly? What if the scanning backend sees a 0, and the subject backend is currently sleeping just before it would have bumped that value? We need to take the LWLock if there's any chance that the subject backend has not yet seen the scanning backend's strong_lock_counts[] update. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pull up aggregate subquery
On Tue, May 24, 2011 at 11:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: I must be misunderstanding you, because index scans are the thing we already *do* parameterize; and what else would make any sense? The point I was trying to make is that the ultimate reason for having a parameterized portion-of-a-plan will be that there's a parameterized indexscan somewhere down at the bottom. I had originally imagined that we might parameterize any old scan; for example consider replacing

Nestloop
  Join Filter: a.x = b.y
  -> Seqscan on a
  -> Seqscan on b

with

Nestloop
  -> Seqscan on a
  -> Seqscan on b
       Filter: b.y = a.x

Oh, I see. I have a general gripe with nested loop plans: we already consider too many of them. IIRC, when I last fooled around with this, the number of nested loop paths that we generate far exceeded the number of merge or hash join paths, and most of those paths suck and are a complete waste of time. It strikes me that we ought to be trying to find ways to get rid of some of the paths we're already considering, rather than adding any more. In this particular case, if the second plan is actually faster, it probably won't be by much; we could think about trying to make some kind of ex-post-facto transformation instead of throwing everything into the path machinery. Although this isn't nearly as useful as if we could be using an index on b.y, there would still be some marginal gain to be had, because we'd be able to reject rows of b without first passing them up to the join node. But I'm afraid that going all-out like that would slow the planner down far too much (too many Paths to consider) to be justified by a marginal runtime gain. Agreed. So the idea I have at the moment is that we'll still only parameterize indexscans, but then allow those to be joined to unrelated relations while still remaining parameterized. That should reduce the number of parameterized Paths hanging around, because only joinclauses that match indexes will give rise to such Paths. 
That seems fine, yeah. If anything, we might want to limit it even more, but certainly that's a good place to start, and see how it shakes out. But I think this is all fairly unrelated to the case that Hitoshi is on about. As you said earlier, it seems like we'd have to derive both parameterized and unparameterized plans for the subquery, which seems mighty expensive. That was my first thought, too, but then I wondered if I was getting cheap. Most of the time, the subquery will be something simple, and replanning it twice won't really matter much. If it happens to be something complicated, then it will take longer, but on the other hand that's exactly the sort of byzantine query where you probably want the planner to pull out all the stops. Aggregates tend to feel slow almost invariably, because the amount of data being processed under the hood is much larger than what actually gets emitted, so I think we should at least consider the possibility that users really won't care about a bit of extra work. The case I'm concerned about is where you have several levels of nested aggregates, and the effect starts to multiply. But even if that turns out to be a problem, we could handle it by limiting consideration of the alternate path to the top 1 or 2 levels and handle the rest as we do now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cannot build docs of 9.1 on Windows
On 05/19/2011 06:29 PM, MauMau wrote: From: Andrew Dunstan and...@dunslane.net On Thu, May 19, 2011 10:32 am, Robert Haas wrote: 2011/5/16 MauMau maumau...@gmail.com: Can't open perl script make-errcodes-table.pl: No such file or directory I think this is the root of the problem. We have no script called make-errcodes-table.pl. Can you try changing it to generate-errcodes-table.pl and see if that works? Building docs under Windows in the buildfarm is on my TODO list. We already support it (as of a few weeks ago) for non-Windows build systems. That will help us make sure we don't have this kind of drift. Thank you. I could remove the error Can't open perl script make-errcodes-table.pl: N... by changing make-errcodes-table.pl to generate-errcodes-table.pl, but all other results seem to be the same as before. Andrew, could you announce the commit when you have successfully built docs on Windows? Can I know that fact by watching pgsql-hackers and pgsql-docs? I'll git-fetch the patch. builddoc.bat failed on my system and reading it made my head hurt. So I did what I've done with other bat files and rewrote it in Perl. The result is attached. It works for me, and should be a dropin replacement. Just put it in the src/tools/msvc directory and run perl builddoc.pl. Please test it and if it works for you we'll use it and make builddoc.bat a thin wrapper like build.bat and vcregress.bat. cheers andrew builddoc.pl Description: Perl program -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing overhead of frequent table locks
On Tue, May 24, 2011 at 11:38 AM, Noah Misch n...@leadboat.com wrote: Another random idea for optimization: we could have a lock-free array with one entry per backend, indicating whether any fast-path locks are present. Before acquiring its first fast-path lock, a backend writes a 1 into that array and inserts a store fence. After releasing its last fast-path lock, it performs a store fence and writes a 0 into the array. Anyone who needs to grovel through all the per-backend fast-path arrays for whatever reason can perform a load fence and then scan the array. If I understand how this stuff works (and it's very possible that I don't), when the scanning backend sees a 0, it can be assured that the target backend has no fast-path locks and therefore doesn't need to acquire and release that LWLock or scan that fast-path array for entries. I'm probably just missing something, but can't that conclusion become obsolete arbitrarily quickly? What if the scanning backend sees a 0, and the subject backend is currently sleeping just before it would have bumped that value? We need to take the LWLock is there's any chance that the subject backend has not yet seen the scanning backend's strong_lock_counts[] update. Can't we bump strong_lock_counts[] *first*, make sure that change is globally visible, and only then start scanning the array? Once we've bumped strong_lock_counts[] and made sure everyone can see that change, it's still possible for backends to take a fast-path lock in some *other* fast-path partition, but nobody should be able to add any more fast-path locks in the partition we care about after that point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
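Robert's bump-first ordering can be sketched with C11 atomics. The single-cell simplification and the names here are assumptions for illustration; the property being modeled is that when both sides are fenced this way, at least one side must observe the other's write, so the race Noah describes cannot leave both sides unaware of each other:

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int strong_count;   /* one strong_lock_counts[] cell */
static atomic_int present_flag;   /* one backend's fast-path flag  */

/* Weak locker: publish the presence flag, then check for strong
 * lockers.  Returns true if the fast path may be used. */
bool weak_side(void)
{
    atomic_store(&present_flag, 1);          /* publish first */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load(&strong_count) == 0;
}

/* Strong locker: bump the count and make it globally visible, then
 * scan.  Returns true if this backend's queue must be visited. */
bool strong_side(void)
{
    atomic_fetch_add(&strong_count, 1);      /* bump first */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load(&present_flag) != 0;
}
```

Whichever interleaving occurs, either the weak locker sees a nonzero count and falls back to the slow path, or the strong locker sees the flag and visits that backend's queue; the bad case — flag read as 0 while the weak locker also read the count as 0 — is excluded by the ordering.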
Re: [HACKERS] 9.2 schedule
On Tue, May 24, 2011 at 11:33 AM, David Fetter da...@fetter.org wrote: On Mon, May 23, 2011 at 10:44:20PM -0400, Greg Smith wrote: At the developer meeting last week: http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting there was an initial schedule for 9.2 hammered out and dutifully transcribed at http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan , and the one part I wasn't sure I had written down correctly I see Robert already fixed. There was a suggestion to add some publicity around the schedule for this release. Already started. :) http://www.postgresql.org/community/weeklynews/pwn20110522 There's useful PR value to making it more obvious to people that the main development plan is regular and time-based, even if the release date itself isn't fixed. The right time to make an initial announcement like that is soon, particularly if a goal here is to get more submitted into the first 9.2 CF coming in only a few weeks. Anyone have changes to suggest before this starts working its way toward an announcement? I thought we'd agreed on the timing for the first CF, and that I was to announce it in the PostgreSQL Weekly News, so I did just that. We talked about doing a separate -announce post just for this item, and there seemed to be some support for that. I'm OK with either way, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Domains versus polymorphic functions, redux
In http://archives.postgresql.org/pgsql-bugs/2011-05/msg00171.php Regina Obe complains that this fails in 9.1, though it worked before:

regression=# CREATE DOMAIN topoelementarray AS integer[];
CREATE DOMAIN
regression=# SELECT array_upper(ARRAY[[1,2], [3,4]]::topoelementarray, 1);
ERROR: function array_upper(topoelementarray, integer) does not exist

This is a consequence of the changes I made to fix bug #5717, particularly the issues around ANYARRAY matching discussed here: http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php Regina is the second or third beta tester to complain about domains over arrays no longer matching ANYARRAY, so I think we'd better do something about it. I haven't tried to code anything up yet, but the ideas I'm considering trying to implement go like this: 1. If a domain type is passed to an ANYARRAY argument, automatically downcast it to its base type (which of course had better then be an array). This would include inserting an implicit cast into the expression tree, so that if the function uses get_fn_expr_argtype or similar, it would see the base type. Also, if the function returns ANYARRAY, its result is considered to be of the base type not the domain. 2. If a domain type is passed to an ANYELEMENT argument, automatically downcast it to its base type if there is any ANYARRAY argument, or if the function result type is ANYARRAY, or if any other ANYELEMENT argument is not of the same domain type. The first two cases are necessary since we don't have arrays of domains: the match is guaranteed to fail if we don't do this, since there can be no matching array type for the domain. The third case is meant to handle cases like function(domain-over-int, 42) where the function has two ANYELEMENT arguments: we now fail, but reducing the domain to int would allow success. An alternative rule we could use in place of #2 is just smash domains to base types always, when they're matched to ANYELEMENT. 
That would be simpler and more in keeping with #1, but it might change the behavior in cases where the historical behavior is reasonable (unlike the cases discussed in my message referenced above...) I find this simpler rule tempting from an implementor's standpoint, but am unsure if there'll be complaints. Comments, better ideas? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2 schedule
On Tue, May 24, 2011 at 11:54:19AM -0400, Robert Haas wrote: On Tue, May 24, 2011 at 11:33 AM, David Fetter da...@fetter.org wrote: On Mon, May 23, 2011 at 10:44:20PM -0400, Greg Smith wrote: At the developer meeting last week: http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting there was an initial schedule for 9.2 hammered out and dutifully transcribed at http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan , and the one part I wasn't sure I had written down correctly I see Robert already fixed. There was a suggestion to add some publicity around the schedule for this release. Already started. :) http://www.postgresql.org/community/weeklynews/pwn20110522 There's useful PR value to making it more obvious to people that the main development plan is regular and time-based, even if the release date itself isn't fixed. The right time to make an initial announcement like that is soon, particularly if a goal here is to get more submitted into the first 9.2 CF coming in only a few weeks. Anyone have changes to suggest before this starts working its way toward an announcement? I thought we'd agreed on the timing for the first CF, and that I was to announce it in the PostgreSQL Weekly News, so I did just that. We talked about doing a separate -announce post just for this item, and there seemed to be some support for that. I'm OK with either way, though. For what it's worth, I think there should also be a separate -announce (and -general, and -hackers) post for the item. This is about getting the message out early and broadly so people have the best chance of getting it in time to act on it. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! 
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Pull up aggregate subquery
Robert Haas robertmh...@gmail.com writes: On Tue, May 24, 2011 at 11:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: The point I was trying to make is that the ultimate reason for having a parameterized portion-of-a-plan will be that there's a parameterized indexscan somewhere down at the bottom. Oh, I see. I have a general gripe with nested loop plans: we already consider too many of them. IIRC, when I last fooled around with this, the number of nested loop paths that we generate far exceeded the number of merge or hash join paths, and most of those paths suck and are a complete waste of time. Hm, really? My experience is that it's the mergejoin paths that breed like rabbits, because there are so many potential sort orders. But I think this is all fairly unrelated to the case that Hitoshi is on about. As you said earlier, it seems like we'd have to derive both parameterized and unparameterized plans for the subquery, which seems mighty expensive. That was my first thought, too, but then I wondered if I was getting cheap. Yeah, it's certainly possible that we're worrying too much. Usually I only get concerned about added planner logic if it will impact the planning time for simple queries. Simple tends to be in the eye of the beholder, but something with a complicated aggregate subquery is probably not simple by anyone's definition. In this case the sticky point is that there could be multiple possible sets of clauses available to be pushed down, depending on what you assume is the outer relation for the eventual upper-level nestloop. So worst case, you could have not just one parameterized plan to generate in addition to the regular kind, but 2^N of them ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing overhead of frequent table locks
On Tue, May 24, 2011 at 11:52:54AM -0400, Robert Haas wrote:
> On Tue, May 24, 2011 at 11:38 AM, Noah Misch n...@leadboat.com wrote:
> > Another random idea for optimization: we could have a lock-free array with one entry per backend, indicating whether any fast-path locks are present. Before acquiring its first fast-path lock, a backend writes a 1 into that array and inserts a store fence. After releasing its last fast-path lock, it performs a store fence and writes a 0 into the array. Anyone who needs to grovel through all the per-backend fast-path arrays for whatever reason can perform a load fence and then scan the array. If I understand how this stuff works (and it's very possible that I don't), when the scanning backend sees a 0, it can be assured that the target backend has no fast-path locks and therefore doesn't need to acquire and release that LWLock or scan that fast-path array for entries.
> >
> > I'm probably just missing something, but can't that conclusion become obsolete arbitrarily quickly? What if the scanning backend sees a 0, and the subject backend is currently sleeping just before it would have bumped that value? We need to take the LWLock if there's any chance that the subject backend has not yet seen the scanning backend's strong_lock_counts[] update.
>
> Can't we bump strong_lock_counts[] *first*, make sure that change is globally visible, and only then start scanning the array? Once we've bumped strong_lock_counts[] and made sure everyone can see that change, it's still possible for backends to take a fast-path lock in some *other* fast-path partition, but nobody should be able to add any more fast-path locks in the partition we care about after that point.

There's a potentially-unbounded delay between when the subject backend reads strong_lock_counts[] and when it sets its fast-path-used flag. (I didn't mean "not yet seen" in the sense that some memory load would not show the latest value. I just meant that the subject backend may still be taking relevant actions based on its previous load of the value.) We could have the subject set its fast-path-used flag before even checking strong_lock_counts[], then clear the flag when strong_lock_counts[] dissuaded it from proceeding. Maybe that's what you had in mind?

That being said, it's a slight extra cost for all fast-path lockers to benefit the strong lockers, so I'm not prepared to guess whether it will pay off.
Re: [HACKERS] Domains versus polymorphic functions, redux
On Tue, May 24, 2011 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: 1. If a domain type is passed to an ANYARRAY argument, automatically downcast it to its base type (which of course had better then be an array). This would include inserting an implicit cast into the expression tree, so that if the function uses get_fn_expr_argtype or similar, it would see the base type. Also, if the function returns ANYARRAY, its result is considered to be of the base type not the domain. Does that mean that plpgsql %type variable declarations will see the base type (and miss any constraint checks?). I think it's fine either way, but that's worth noting. An alternative rule we could use in place of #2 is just smash domains to base types always, when they're matched to ANYELEMENT. That would be simpler and more in keeping with #1, but it might change the behavior in cases where the historical behavior is reasonable (unlike the cases discussed in my message referenced above...) I find this simpler rule tempting from an implementor's standpoint, but am unsure if there'll be complaints. #2a seems cleaner to me (superficially). Got an example of a behavior you think is changed? In particular, is there a way the new function would fail where it used to not fail? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Domains versus polymorphic functions, redux
On May 24, 2011, at 9:12 AM, Tom Lane wrote:
> An alternative rule we could use in place of #2 is just smash domains to base types always, when they're matched to ANYELEMENT. That would be simpler and more in keeping with #1, but it might change the behavior in cases where the historical behavior is reasonable (unlike the cases discussed in my message referenced above...) I find this simpler rule tempting from an implementor's standpoint, but am unsure if there'll be complaints.

I'm not sure where the historical behavior manifests, but this certainly seems like it might be the most consistent implementation, as well. Which option is least likely to violate the principle of surprise?

Best,

David
Re: [HACKERS] 9.2 schedule
David Fetter wrote:
> I thought we'd agreed on the timing for the first CF, and that I was to announce it in the PostgreSQL Weekly News, so I did just that.

Yes, and excellent. The other ideas were:

- Publish information about the full schedule to some of the more popular mailing lists.
- Link to this page more obviously from postgresql.org (a fixed redirect URL is probably the right approach) to bless it, and potentially improve its search rank too.

The specific new problem being highlighted to work on here is that the schedule and development process is actually quite good as open-source projects go, but that fact isn't visible at all unless you're already on the inside of the project.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
[HACKERS] Cascade replication (WIP)
Hi,

I'd like to propose a cascade replication feature (i.e., allow the standby to accept a replication connection from another standby) for 9.2. This feature is useful to reduce the overhead of the master, since by using it we can decrease the number of standbys directly connecting to the master.

I attached the WIP patch, which changes walsender so that it starts replication even during recovery. The walsender then attempts to send all WAL that's already been fsync'd to the standby's disk (i.e., it sends WAL up to the bigger of the receive location and the replay location). When the standby is promoted, all walsenders in that standby end, because they cannot continue replication any more in that case because of the timeline mismatch.

The standby must not accept a replication connection from that standby itself. Otherwise, since no new WAL data would ever appear in that standby, replication could not advance. As a safeguard against this, I introduced a new ID to identify each instance. The walsender sends that ID as the fourth field of the reply to IDENTIFY_SYSTEM, and walreceiver then checks whether the IDs are the same on both servers. If they are the same, the standby is connecting to itself, so walreceiver emits an ERROR.

One remaining problem which I'll have to tackle is this: even while walreceiver is not in progress (i.e., the startup process is retrieving WAL files from the archive), the cascading walsender should continuously send new WAL data. This means that the walsender should send the WAL file restored from the archive. The problem is that the name of such a restored WAL file is always RECOVERYXLOG, and walsender cannot currently handle a WAL file with that name. To address this, I'm thinking of making the startup process restore the WAL file under its real name instead of RECOVERYXLOG. Then, as on the master, the walsender can read and send the restored WAL file.
The required WAL file can be recycled before being sent. So we might need to enable wal_keep_segments setting even in the standby. Comments? Objections? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/doc/src/sgml/protocol.sgml --- b/doc/src/sgml/protocol.sgml *** *** 1357,1362 The commands accepted in walsender mode are: --- 1357,1374 /listitem /varlistentry + varlistentry + term +identificationkey + /term + listitem + para +Identification key. Also useful to check that the standby is +not connecting to that standby itself. + /para + /listitem + /varlistentry + /variablelist /para /listitem *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** *** 9551,9556 GetXLogReplayRecPtr(void) --- 9551,9572 } /* + * Get current standby flush position, ie, the last WAL position + * known to be fsync'd to disk in standby. + */ + XLogRecPtr + GetStandbyFlushRecPtr(void) + { + XLogRecPtr recvptr; + XLogRecPtr redoptr; + + recvptr = GetWalRcvWriteRecPtr(NULL); + redoptr = GetXLogReplayRecPtr(); + + return XLByteLT(recvptr, redoptr) ? 
redoptr : recvptr; + } + + /* * Report the last WAL replay location (same format as pg_start_backup etc) * * This is useful for determining how much of WAL is visible to read-only *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *** *** 351,357 static void processCancelRequest(Port *port, void *pkt); static int initMasks(fd_set *rmask); static void report_fork_failure_to_client(Port *port, int errnum); static CAC_state canAcceptConnections(void); - static long PostmasterRandom(void); static void RandomSalt(char *md5Salt); static void signal_child(pid_t pid, int signal); static bool SignalSomeChildren(int signal, int targets); --- 351,356 *** *** 2410,2415 reaper(SIGNAL_ARGS) --- 2409,2423 pmState = PM_RUN; /* + * Kill the cascading walsender to urge the cascaded standby to + * reread the timeline history file, adjust its timeline and + * establish replication connection again. This is required + * because the timeline of cascading standby is not consistent + * with that of cascaded one just after failover. + */ + SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND); + + /* * Crank up the background writer, if we didn't do that already * when we entered consistent recovery state. It doesn't matter * if this fails, we'll just try again later. *** *** 4369,4375 RandomSalt(char *md5Salt) /* * PostmasterRandom */ ! static long PostmasterRandom(void) { /* --- 4377,4383 /* * PostmasterRandom */ ! long PostmasterRandom(void) { /* *** a/src/backend/replication/basebackup.c ---
Re: [HACKERS] Domains versus polymorphic functions, redux
On Tue, May 24, 2011 at 12:12:55PM -0400, Tom Lane wrote: In http://archives.postgresql.org/pgsql-bugs/2011-05/msg00171.php Regina Obe complains that this fails in 9.1, though it worked before: regression=# CREATE DOMAIN topoelementarray AS integer[]; CREATE DOMAIN regression=# SELECT array_upper(ARRAY[[1,2], [3,4]]::topoelementarray, 1); ERROR: function array_upper(topoelementarray, integer) does not exist This is a consequence of the changes I made to fix bug #5717, particularly the issues around ANYARRAY matching discussed here: http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php Regina is the second or third beta tester to complain about domains over arrays no longer matching ANYARRAY, so I think we'd better do something about it. I haven't tried to code anything up yet, but the ideas I'm considering trying to implement go like this: 1. If a domain type is passed to an ANYARRAY argument, automatically downcast it to its base type (which of course had better then be an array). This would include inserting an implicit cast into the expression tree, so that if the function uses get_fn_expr_argtype or similar, it would see the base type. Also, if the function returns ANYARRAY, its result is considered to be of the base type not the domain. We discussed this a few weeks ago: http://archives.postgresql.org/message-id/20110511093217.gb26...@tornado.gateway.2wire.net What's to recommend #1 over what I proposed then? Seems like a discard of functionality for little benefit. 2. If a domain type is passed to an ANYELEMENT argument, automatically downcast it to its base type if there is any ANYARRAY argument, or if the function result type is ANYARRAY, or if any other ANYELEMENT argument is not of the same domain type. The first two cases are necessary since we don't have arrays of domains: the match is guaranteed to fail if we don't do this, since there can be no matching array type for the domain. 
The third case is meant to handle cases like function(domain-over-int, 42) where the function has two ANYELEMENT arguments: we now fail, but reducing the domain to int would allow success. This seems generally consistent with other function-resolution rules around domains. On the other hand, existing users have supposedly coped by adding an explicit cast to one or the other argument to get the behavior they want. New applications will quietly get the cast, as it were, on the domain argument(s). I hesitate to say this is so clearly right as to warrant that change. Even if it is right, though, this smells like 9.2 material. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Domains versus polymorphic functions, redux
David E. Wheeler da...@kineticode.com writes:
> On May 24, 2011, at 9:12 AM, Tom Lane wrote:
>> An alternative rule we could use in place of #2 is just smash domains to base types always, when they're matched to ANYELEMENT. That would be simpler and more in keeping with #1, but it might change the behavior in cases where the historical behavior is reasonable (unlike the cases discussed in my message referenced above...) I find this simpler rule tempting from an implementor's standpoint, but am unsure if there'll be complaints.

> I'm not sure where the historical behavior manifests, but this certainly seems like it might be the most consistent implementation, as well. Which option is least likely to violate the principle of surprise?

Well, the basic issue here is what happens when a function like

	create function noop(anyelement) returns anyelement ...

is applied to a domain argument. Currently, the result is thought to be of the domain type, whereas if we smash to base unconditionally, the result will be thought to be of the domain's base type. You can make an argument for either behavior, but I think the argument for the current behavior hinges on the assumption that such a function isn't doing anything to the argument value, only passing it through as-is.

I should probably also point out the previous discussion of this area from a couple weeks ago, notably here: http://archives.postgresql.org/pgsql-hackers/2011-05/msg00640.php

The example I gave there seems relevant:

create function negate(anyelement) returns anyelement as
$$ select - $1 $$ language sql;
create domain pos as int check (value > 0);
select negate(42::pos);

This example function isn't quite silly --- it will work on any datatype having a unary '-' operator, and you could imagine someone wanting to do something roughly like this in more realistic cases. But if you want to assume that the function returns pos when handed pos, you'd better be prepared to insert a CastToDomain node to recheck the domain constraint. Right now the SQL-function code doesn't support such cases:

regression=# select negate(42::pos);
ERROR:  return type mismatch in function declared to return pos
DETAIL:  Actual return type is integer.
CONTEXT:  SQL function "negate" during inlining

If we smashed to base type then this issue would go away. On the other hand, it feels like we'd be taking yet another step away from allowing domains to be usefully used in function declarations. I can't put my finger on any concrete consequence of that sort, since what we're talking about here is ANYELEMENT/ANYARRAY functions, not functions declared to take domains --- but it sure seems like this would put domains even further away from the status of first-class citizenship in the type system.

			regards, tom lane
Re: [HACKERS] Domains versus polymorphic functions, redux
Merlin Moncure mmonc...@gmail.com writes: On Tue, May 24, 2011 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: 1. If a domain type is passed to an ANYARRAY argument, automatically downcast it to its base type (which of course had better then be an array). Does that mean that plpgsql %type variable declarations will see the base type (and miss any constraint checks?). No, this has nothing to do with %type. What's at stake is matching to functions/operators that are declared to take ANYARRAY. #2a seems cleaner to me (superficially). Got an example of a behavior you think is changed? See my response to David Wheeler. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] errno not set in case of libm functions (HPUX)
I have found a problem which is specifically related to the HP-UX compiler. All 'libm' functions on HP-UX Integrity servers do not set errno by default. To get errno set, we should compile the code using the +Olibmerrno option, so we should add this option in src/makefiles/Makefile.hpux. Otherwise we cannot expect this code to work properly [float.c]:

Datum
dacos(PG_FUNCTION_ARGS)
{
	...
	errno = 0;
	result = acos(arg1);
	if (errno != 0)
		ereport(ERROR,
				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
				 errmsg("input is out of range")));
	...
}

Because the acos function will not set errno in case of invalid input, the check will not trigger the error message. I have attached a patch to add this option to the HP-UX makefile. BTW, I found the same kind of discussion, without any conclusion, here: http://archives.postgresql.org/pgsql-hackers/2011-05/msg00046.php

--
Ibrar Ahmed

diff --git a/src/makefiles/Makefile.hpux b/src/makefiles/Makefile.hpux
index 1917d61..f2a8f19 100644
--- a/src/makefiles/Makefile.hpux
+++ b/src/makefiles/Makefile.hpux
@@ -43,6 +43,12 @@ else
    CFLAGS_SL = +Z
 endif
 
+
+# HP-UX libm functions on 'Integrity server' do not set errno by default,
+# for errno setting, compile with the +Olibmerrno option.
+
+CFLAGS := +Olibmerrno $(CFLAGS)
+
 # Rule for building a shared library from a single .o file
 %$(DLSUFFIX): %.o
 ifeq ($(GCC), yes)
Re: [HACKERS] Domains versus polymorphic functions, redux
Noah Misch n...@leadboat.com writes:
> On Tue, May 24, 2011 at 12:12:55PM -0400, Tom Lane wrote:
>> This is a consequence of the changes I made to fix bug #5717, particularly the issues around ANYARRAY matching discussed here: http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php

> We discussed this a few weeks ago: http://archives.postgresql.org/message-id/20110511093217.gb26...@tornado.gateway.2wire.net
> What's to recommend #1 over what I proposed then? Seems like a discard of functionality for little benefit.

I am unwilling to commit to making #2 work, especially not under time constraints; and you apparently aren't either, since you haven't produced the patch you alluded to at the end of that thread. Even if you had, though, I'd have no confidence that all holes of the sort had been closed. What you're proposing is to ratchet up the implementation requirements for every PL and every C function declared to accept polymorphic types, and there are a lot of members of both classes that we don't control.

> I hesitate to say this is so clearly right as to warrant that change. Even if it is right, though, this smells like 9.2 material.

Well, I'd been hoping to leave it for later too, but it seems like we have to do something about the ANYARRAY case for 9.1. Making ANYARRAY's response to domains significantly inconsistent with ANYELEMENT's response doesn't seem like a good plan.

			regards, tom lane
Re: [HACKERS] Proposal: Another attempt at vacuum improvements
So, first of all, thanks for putting some effort and thought into this. Despite the large number of improvements in this area in 8.3 and 8.4, this is still a pain point, and it would be really nice to find a way to make some further improvements. On Tue, May 24, 2011 at 2:58 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: So the idea is to separate the index vacuum (removing index pointers to dead tuples) from the heap vacuum. When we do heap vacuum (either by HOT-pruning or using regular vacuum), we can spool the dead line pointers somewhere. To avoid any hot-spots during normal processing, the spooling can be done periodically like the stats collection. What happens if the system crashes after a line pointer becomes dead but before the record of its death is safely on disk? The fact that a previous index vacuum has committed is only sufficient justification for reclaiming the dead line pointers if you're positive that the index vacuum killed the index pointers for *every* dead line pointer. I'm not sure we want to go there; any operation that wants to make a line pointer dead will need to be XLOG'd. Instead, I think we should stick with your original idea and just try to avoid the second heap pass. So to do that, as you say, we can have every operation that creates a dead line pointer note the LSN of the operation in the page. But instead of allocating permanent space in the page header, which would both reduce (admittedly only by 8 bytes) the amount of space available for tuples, and more significantly have the effect of breaking on-disk compatibility, I'm wondering if we could get by with making space for that extra LSN only when it's actually present. In other words, when it's present, we set a bit PD_HAS_DEAD_LINE_PTR_LSN or somesuch, increment pd_upper, and use the extra space to store the LSN. There is an alignment problem to worry about there but that shouldn't be a huge issue. When we vacuum, we remember the LSN before we start. 
When we finish, if we scanned the indexes and everything completed without error, then we bump the heap's notion (wherever we store it) of the last successful index vacuum. When we vacuum or do HOT cleanup on a page, if the page has a most-recent-dead-line pointer LSN and it precedes the start-of-last-successful-index-vacuum LSN, then we mark all the LP_DEAD tuples as LP_UNUSED and throw away the most-recent-dead-line-pointer LSN. One downside of this approach is that, if we do something like this, it'll become slightly more complicated to figure out where the item pointer array ends. Another issue is that we might find ourselves wanting to extend the item pointer array to add a new item, and unable to do so easily because this most-recent-dead-line-pointer LSN is in the way. If the LSN stored in the page precedes the start-of-last-successful-index-vacuum LSN, and if, further, we can get a buffer cleanup lock on the page, then we can do a HOT cleanup and life is good. Otherwise, we can either (1) just forget about the most-recent-dead-line-pointer LSN - not ideal but not catastrophic either - or (2) if the start-of-last-successful-vacuum-LSN is old enough, we could overwrite an LP_DEAD line pointer in place. Another issue is that this causes problems for temporary and unlogged tables, because no WAL records are generated and, therefore, the LSN does not advance. This is also a problem for GIST indexes; Heikki fixed temporary GIST indexes by generating fake LSNs off of a backend-local counter. Unlogged GIST indexes are currently not supported. I think what we need to do is create an API to which you can pass a relation and get an LSN. If it's a permanent relation, you get a regular LSN. If it's a temporary relation, you get a fake LSN based on a backend-local counter. If it's an unlogged relation, you get a fake LSN based on a shared-memory counter that is reset on restart. 
If we can encapsulate that properly, it should provide both what we need to make this idea work and allow a somewhat graceful fix for GIST-vs-unlogged problem. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Error compiling sepgsql in PG9.1
The attached patch makes the configure script abort when it is run with the '--with-selinux' option but libselinux is older than the minimum requirement for SE-PostgreSQL. As the documentation says, it needs at least libselinux-2.0.93, because this and later versions support selabel_lookup(3) for database object classes, used for initial labeling. The current configure script checks for the existence of libselinux, but does no version check. (getpeercon_raw(3) has been a supported API for a long time.) selinux_sepgsql_context_path(3) is a good watermark for libselinux-2.0.93 instead.

Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei kohei.kai...@emea.nec.com

> -----Original Message-----
> From: Devrim GÜNDÜZ [mailto:dev...@gunduz.org]
> Sent: 21. Mai 2011 07:46
> To: Kohei Kaigai
> Cc: Emanuel Calvo; postgresql Forums; KaiGai Kohei
> Subject: Re: [GENERAL] Error compiling sepgsql in PG9.1
>
> On Sat, 2011-05-21 at 02:50 +0100, Kohei Kaigai wrote:
> > As documentation said, it needs libselinux 2.0.93 or higher. This version supports selabel_lookup(3) for database object classes.
>
> AFAICS, we are not checking it during configure. It might be worth to add a libselinux version check in the configure phase.
>
> --
> Devrim GÜNDÜZ
> Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
> PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
> Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
> http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz

sepgsql-fix-config-version.patch
Description: sepgsql-fix-config-version.patch
Re: [HACKERS] Pull up aggregate subquery
2011/5/25 Tom Lane t...@sss.pgh.pa.us:
> Robert Haas robertmh...@gmail.com writes:
>> That was my first thought, too, but then I wondered if I was getting cheap.
>
> Yeah, it's certainly possible that we're worrying too much. Usually I only get concerned about added planner logic if it will impact the planning time for simple queries. Simple tends to be in the eye of the beholder, but something with a complicated aggregate subquery is probably not simple by anyone's definition. In this case the sticky point is that there could be multiple possible sets of clauses available to be pushed down, depending on what you assume is the outer relation for the eventual upper-level nestloop. So worst case, you could have not just one parameterized plan to generate in addition to the regular kind, but 2^N of them ...

My intention is that a Var can be pushed down only if the join qual matches the subquery Agg's grouping keys, so I'm not worried about exponential growth in the number of paths. And I found the right place to hack: set_subquery_pathlist(), which pushes down some baserestrictinfo clauses. We don't have a Var in the RestrictInfo now, but I guess we can put one in somehow before reaching there. Even if I can do that, the optimization is only effective when the outer side produces a single tuple. As I noted earlier, this optimization would be completed with the executor's cooperation, something like gather-param-values-as-array before starting the Agg scan. So I'm still thinking about which of pulling up and a parameterized scan is better.

Regards,

--
Hitoshi Harada
Re: [HACKERS] 9.2 schedule
Robert, -Publish information about the full schedule to some of the more popular mailing lists I think that posting to pgsql-announce and PostgreSQL.org news, and this list would be sufficient. I'm happy to take care of that. -Link to this page more obviously from postgresql.org (fixed redirect URL is probably the right approach) to bless it, and potentially improve its search rank too. I would suggest instead adding a new page to postgresql.org/developer which lists the development schedule, rather than linking to that wiki page. Maybe on this page? http://www.postgresql.org/developer/roadmap -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing overhead of frequent table locks
On Tue, May 24, 2011 at 12:34 PM, Noah Misch n...@leadboat.com wrote: There's a potentially-unbounded delay between when the subject backend reads strong_lock_counts[] and when it sets its fast-path-used flag. (I didn't mean "not yet seen" in the sense that some memory load would not show the latest value. I just meant that the subject backend may still be taking relevant actions based on its previous load of the value.) We could have the subject set its fast-path-used flag before even checking strong_lock_counts[], then clear the flag when strong_lock_counts[] dissuaded it from proceeding. Maybe that's what you had in mind? I'd like to say yes, but actually, no, I just failed to notice the race condition. It's definitely less appealing if we have to do it that way. Another idea would be to only clear the fast-path-used flags lazily. If backend A inspects the fast-path queue for backend B and finds it completely empty, it clears the flag; otherwise it just stays set indefinitely. That being said, it's a slight extra cost for all fast-path lockers to benefit the strong lockers, so I'm not prepared to guess whether it will pay off. Yeah. Basically this entire idea is about trying to make life easier for weak lockers at the expense of making it more difficult for strong lockers. I think that's a good trade-off in general, but we might need to wait until we have an actual implementation to judge whether we've turned the dial too far. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2 schedule
On Tue, May 24, 2011 at 1:35 PM, Josh Berkus j...@agliodbs.com wrote: Robert, Actually, you're responding to Greg, not me. But +1 for your suggestions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] errno not set in case of libm functions (HPUX)
Ibrar Ahmed ibrar.ah...@gmail.com writes: I have found a problem which is specifically related to HP-UX compiler. All 'libm' functions on HP-UX Integrity server do not set errno by default. For 'errno' setting we should compile the code using +Olibmerrno option. So we should add this option in /src/makefiles/Makefile.hpux. This patch will break things on my admittedly rather ancient HPUX box: $ cc +Olibmerrno cc: warning 450: Unrecognized option +Olibmerrno. As submitted, it would also break gcc-based builds, though that at least wouldn't be hard to fix. If you want to submit a configure patch to test whether the switch is appropriate, we could consider it. BTW, is it really true that HP decided they could make the compiler's default behavior violate the C standard so flagrantly? I could believe offering a switch that you had to specify to save a few cycles at the cost of nonstandard behavior; but if your report is actually correct, their engineering standards have gone way downhill since I worked there. I wonder whether you are inserting some other nonstandard switch that turns on this effect. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2 schedule
Actually, you're responding to Greg, not me. Sorry. But +1 for your suggestions. Any objections before I post something? Greg? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pull up aggregate subquery
On Tue, May 24, 2011 at 12:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: Oh, I see. I have a general gripe with nested loop plans: we already consider too many of them. IIRC, when I last fooled around with this, the number of nested loop paths that we generate far exceeded the number of merge or hash join paths, and most of those paths suck and are a complete waste of time. Hm, really? My experience is that it's the mergejoin paths that breed like rabbits, because there are so many potential sort orders. *scratches head* Well, I'm pretty sure that's how it looked when I was testing it. I wonder how this could be different for the two of us. Or maybe one of us is confused. Admittedly, I haven't looked at it in a while. But I think this is all fairly unrelated to the case that Hitoshi is on about. As you said earlier, it seems like we'd have to derive both parameterized and unparameterized plans for the subquery, which seems mighty expensive. That was my first thought, too, but then I wondered if I was getting cheap. Yeah, it's certainly possible that we're worrying too much. Usually I only get concerned about added planner logic if it will impact the planning time for simple queries. Simple tends to be in the eye of the beholder, but something with a complicated aggregate subquery is probably not simple by anyone's definition. In this case the sticky point is that there could be multiple possible sets of clauses available to be pushed down, depending on what you assume is the outer relation for the eventual upper-level nestloop. So worst case, you could have not just one parameterized plan to generate in addition to the regular kind, but 2^N of them ... Hmm. Well, 2^N is more than 2. But I bet most of them are boring. Judging by his followup email, Hitoshi Harada seems to think we can just look at the case where we can parameterize on all of the grouping columns. 
The only other case that seems like it might be interesting is parameterizing on any single one of the grouping columns. I can't get excited about pushing down arbitrary subsets. Of course, even O(N) in the number of grouping columns might be too much, but then we could fall back to just all-or-nothing. I think the "all" case by itself would probably extract 90%+ of the benefit, especially since "all" will often mean "the only one there is." -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] errno not set in case of libm functions (HPUX)
On 05/24/2011 01:44 PM, Tom Lane wrote: Ibrar Ahmedibrar.ah...@gmail.com writes: I have found a problem which is specifically related to HP-UX compiler. All 'libm' functions on HP-UX Integrity server do not set errno by default. For 'errno' setting we should compile the code using +Olibmerrno option. So we should add this option in /src/makefiles/Makefile.hpux. This patch will break things on my admittedly rather ancient HPUX box: $ cc +Olibmerrno cc: warning 450: Unrecognized option +Olibmerrno. As submitted, it would also break gcc-based builds, though that at least wouldn't be hard to fix. If you want to submit a configure patch to test whether the switch is appropriate, we could consider it. BTW, is it really true that HP decided they could make the compiler's default behavior violate the C standard so flagrantly? I could believe offering a switch that you had to specify to save a few cycles at the cost of nonstandard behavior; but if your report is actually correct, their engineering standards have gone way downhill since I worked there. I wonder whether you are inserting some other nonstandard switch that turns on this effect. I have been whining for years about the lack of HP-UX support (both for gcc and their compiler) on the buildfarm. I really really wish HP would come to the party and supply some equipment and software. Failing that, some spare cycles being made available on a machine by someone else who runs it would be good. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] errno not set in case of libm functions (HPUX)
On 24.05.2011 20:44, Tom Lane wrote: BTW, is it really true that HP decided they could make the compiler's default behavior violate the C standard so flagrantly? I could believe offering a switch that you had to specify to save a few cycles at the cost of nonstandard behavior; but if your report is actually correct, their engineering standards have gone way downhill since I worked there. I wonder whether you are inserting some other nonstandard switch that turns on this effect. This (http://docs.hp.com/en/B3901-90015/ch02s07.html) says: +O[no]libmerrno Description: This option enables[disables] support for errno in libm functions. The default is +Onolibmerrno. In C++ C-mode, the default is +Olibmerrno with -Aa option. So the default is indeed non-standard. But I wonder if we should use -Aa instead? The documentation I found for -Aa (http://docs.hp.com/en/B3901-90017/ch02s22.html) says: -Aa The -Aa option instructs the compiler to use Koenig lookup and strict ANSI for scope rules. This option is equivalent to specifying -Wc,-koenig_lookup,on and -Wc,-ansi_for_scope,on. The default is off. Refer to -Ae option for C++ C-mode description. The standard features enabled by -Aa are incompatible with earlier C and C++ features. That sounds like what we want. Apparently that description is not complete, and -Aa changes some other behavior to ANSI C compatible as well, like +Olibmerrno. There's also -AC99, which specifies compiling in C99-mode - I wonder if that sets +Olibmerrno too. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Domains versus polymorphic functions, redux
On Tue, May 24, 2011 at 01:28:38PM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Tue, May 24, 2011 at 12:12:55PM -0400, Tom Lane wrote: This is a consequence of the changes I made to fix bug #5717, particularly the issues around ANYARRAY matching discussed here: http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php We discussed this a few weeks ago: http://archives.postgresql.org/message-id/20110511093217.gb26...@tornado.gateway.2wire.net What's to recommend #1 over what I proposed then? Seems like a discard of functionality for little benefit. I am unwilling to commit to making #2 work, especially not under time constraints; and you apparently aren't either, since you haven't produced the patch you alluded to at the end of that thread. I took your lack of any response as non-acceptance of the plan I outlined. Alas, the wrong conclusion. I'll send a patch this week. Even if you had, though, I'd have no confidence that all holes of the sort had been closed. What you're proposing is to ratchet up the implementation requirements for every PL and every C function declared to accept polymorphic types, and there are a lot of members of both classes that we don't control. True. I will not give you that confidence. Those omissions would have to remain bugs to be fixed as they're found. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] errno not set in case of libm functions (HPUX)
On 24.05.2011 20:56, Andrew Dunstan wrote: I have been whining for years about the lack of HP-UX support (both for gcc and their compiler) on the buildfarm. I really really wish HP would come to the party and supply some equipment and software. Failing that, some spare cycles being made available on a machine by someone else who runs it would be good. I'm trying to arrange access to a HP-UX box within EnterpriseDB. No luck this far. Hopefully I'll get a buildfarm animal up in the next week or so, but don't hold your breath... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
On Mon, May 23, 2011 at 1:12 AM, Noah Misch n...@leadboat.com wrote: On Tue, Apr 26, 2011 at 11:51:35PM -0400, Noah Misch wrote: On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote: [input functions aren't the only problematic source of uninitialized datum bytes] We've run into other manifestations of this issue before. Awhile ago I made a push to ensure that datatype input functions didn't leave any ill-defined padding bytes in their results, as a result of similar misbehavior for simple constants. But this example shows that we'd really have to enforce the rule of no ill-defined bytes for just about every user-callable function's results, which is a pretty ugly prospect. FWIW, when I was running the test suite under valgrind, these were the functions that left uninitialized bytes in datums: array_recv, array_set, array_set_slice, array_map, construct_md_array, path_recv. If the test suite covers this well, we're not far off. (Actually, I only had the check in PageAddItem ... probably needed to be in one or two other places to catch as much as possible.) Adding a memory definedness check to printtup() turned up one more culprit: tsquery_and. *squints* OK, I can't see what's broken. Help? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
On Tue, May 24, 2011 at 02:05:33PM -0400, Robert Haas wrote: On Mon, May 23, 2011 at 1:12 AM, Noah Misch n...@leadboat.com wrote: On Tue, Apr 26, 2011 at 11:51:35PM -0400, Noah Misch wrote: On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote: [input functions aren't the only problematic source of uninitialized datum bytes] We've run into other manifestations of this issue before. Awhile ago I made a push to ensure that datatype input functions didn't leave any ill-defined padding bytes in their results, as a result of similar misbehavior for simple constants. But this example shows that we'd really have to enforce the rule of no ill-defined bytes for just about every user-callable function's results, which is a pretty ugly prospect. FWIW, when I was running the test suite under valgrind, these were the functions that left uninitialized bytes in datums: array_recv, array_set, array_set_slice, array_map, construct_md_array, path_recv. If the test suite covers this well, we're not far off. (Actually, I only had the check in PageAddItem ... probably needed to be in one or two other places to catch as much as possible.) Adding a memory definedness check to printtup() turned up one more culprit: tsquery_and. *squints* OK, I can't see what's broken. Help? QTN2QT() allocates memory for a TSQuery using palloc(). TSQuery contains an array of QueryItem, which contains three bytes of padding between its first and second members. Those bytes don't get initialized, so we have unpredictable content in the resulting datum. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Error compiling sepgsql in PG9.1
2011/5/24 Kohei Kaigai kohei.kai...@emea.nec.com: The attached patch makes the configure script abort when it is run with the '--with-selinux' option but the installed libselinux is older than the minimum version required by SE-PostgreSQL. As the documentation says, at least libselinux-2.0.93 is needed, because that version and later support selabel_lookup(3) for database object classes, which is used for initial labeling. The current configure script checks for the existence of libselinux, but does no version check. (getpeercon_raw(3) has been a supported API for a long time.) selinux_sepgsql_context_path(3) is a good watermark for libselinux-2.0.93 instead. Looks to me like you need to adjust the wording of the error message. Maybe "libselinux version 2.0.93 or newer is required", or something like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] errno not set in case of libm functions (HPUX)
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: So the default is indeed non-standard. But I wonder if we should use -Aa instead? Probably not; at least on older HPUX versions, -Aa turns off access to assorted stuff that we do want, eg long long. man cc on my box saith:

-Amode    Specify the compilation standard to be used by the compiler. mode can be one of the following letters:

    c    (Default) Compile in a mode compatible with HP-UX releases prior to 7.0. (See The C Programming Language, First Edition by Kernighan and Ritchie). This option also defines the symbol _HPUX_SOURCE and allows the user to access macros and typedefs provided by the HPUX Operating System. The default compilation mode may change in future releases.

    a    Compile under ANSI mode (ANSI programming language C standard ISO 9899:1990). When compiling under ANSI mode, the header files would define only those names (macros and typedefs) specified by the Standard. To access macros and typedefs that are not defined by the ANSI Standard but are provided by the HPUX Operating System, define the symbol _HPUX_SOURCE; or use the extension option described below.

    e    Extended ANSI mode. Same as -Aa -D_HPUX_SOURCE +e. This would define the names (macros and typedefs) provided by the HPUX Operating System and, in addition, allow the following extensions: $ characters in identifier names, sized enums, sized bit-fields, and 64-bit integral type long long. Additional extensions may be added to this option in the future.

The +e option is elsewhere stated to mean:

    +e    Enables HP value-added features while compiling in ANSI C mode, -Aa. This option is ignored with -Ac because these features are already provided. Features enabled: o Long pointers o Integral type specifiers can appear in enum declarations. o The $ character can appear in identifier names. o Missing parameters on intrinsic calls

which isn't 100% consistent with what it says under -Ae, so maybe some additional experimentation is called for.
But anyway, autoconf appears to think that -Ae is preferable to the combination -Aa -D_HPUX_SOURCE (that choice is coming from autoconf not our own code); so I'm not optimistic that we can get more-standard behavior by overriding that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
On Tue, May 24, 2011 at 2:11 PM, Noah Misch n...@leadboat.com wrote: OK, I can't see what's broken. Help? QTN2QT() allocates memory for a TSQuery using palloc(). TSQuery contains an array of QueryItem, which contains three bytes of padding between its first and second members. Those bytes don't get initialized, so we have unpredictable content in the resulting datum. OK, so I guess this needs to be applied and back-patched to 8.3, then. 8.2 doesn't have this code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Domains versus polymorphic functions, redux
On May 24, 2011, at 10:11 AM, Tom Lane wrote: regression=# select negate(42::pos); ERROR: return type mismatch in function declared to return pos DETAIL: Actual return type is integer. CONTEXT: SQL function "negate" during inlining If we smashed to base type then this issue would go away. +1 On the other hand it feels like we'd be taking yet another step away from allowing domains to be usefully used in function declarations. I can't put my finger on any concrete consequence of that sort, since what we're talking about here is ANYELEMENT/ANYARRAY functions not functions declared to take domains --- but it sure seems like this would put domains even further away from the status of first-class citizenship in the type system. I agree. It sure seems to me like DOMAINs should act exactly like any other type. I know that has improved over time, and superficially at least, the above change will make them seem more like one than the error does. But maybe it's time to re-think how domains are implemented? (Not for 9.1, mind.) I mean, why *don't* they act like first class types? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
Robert Haas robertmh...@gmail.com writes: On Tue, May 24, 2011 at 2:11 PM, Noah Misch n...@leadboat.com wrote: QTN2QT() allocates memory for a TSQuery using palloc(). TSQuery contains an array of QueryItem, which contains three bytes of padding between its first and second members. Those bytes don't get initialized, so we have unpredictable content in the resulting datum. OK, so I guess this needs to be applied and back-patched to 8.3, then. Yeah. I'm in process of doing that, actually. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
On Tue, May 24, 2011 at 2:18 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, May 24, 2011 at 2:11 PM, Noah Misch n...@leadboat.com wrote: QTN2QT() allocates memory for a TSQuery using palloc(). TSQuery contains an array of QueryItem, which contains three bytes of padding between its first and second members. Those bytes don't get initialized, so we have unpredictable content in the resulting datum. OK, so I guess this needs to be applied and back-patched to 8.3, then. Yeah. I'm in process of doing that, actually. Excellent. Are you going to look at MauMau's patch for bug #6011 also? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Domains versus polymorphic functions, redux
David E. Wheeler da...@kineticode.com writes: On May 24, 2011, at 10:11 AM, Tom Lane wrote: On the other hand it feels like we'd be taking yet another step away from allowing domains to be usefully used in function declarations. I agree. It sure seems to me like DOMAINs should act exactly like any other type. I know that has improved over time, and superficially at least, the above will make it seem like more like than it does with the error. But maybe it's time to re-think how domains are implemented? (Not for 9.1, mind.) I mean, why *don't* they act like first class types? Well, if they actually were first-class types, they probably wouldn't be born with an implicit cast to some other type to handle 99% of operations on them ;-). I think the hard part here is having that cake and eating it too, ie, supporting domain-specific functions without breaking the implicit use of the base type's functions. I guess that the question that's immediately at hand is sort of a variant of that, because using a polymorphic function declared to take ANYARRAY on a domain-over-array really is using a portion of the base type's functionality. What we've learned from bug #5717 and the subsequent issues is that using that base functionality without immediately abandoning the notion that the domain has some life of its own (ie, immediately casting to the base type) is harder than it looks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Domains versus polymorphic functions, redux
On May 24, 2011, at 11:30 AM, Tom Lane wrote: Well, if they actually were first-class types, they probably wouldn't be born with an implicit cast to some other type to handle 99% of operations on them ;-). I think the hard part here is having that cake and eating it too, ie, supporting domain-specific functions without breaking the implicit use of the base type's functions. Yeah. I guess that the question that's immediately at hand is sort of a variant of that, because using a polymorphic function declared to take ANYARRAY on a domain-over-array really is using a portion of the base type's functionality. What we've learned from bug #5717 and the subsequent issues is that using that base functionality without immediately abandoning the notion that the domain has some life of its own (ie, immediately casting to the base type) is harder than it looks. Well, in the ANYELEMENT context (or ANYARRAY), what could be lost by abandoning the notion that the domain has some life of its own? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
Noah Misch n...@leadboat.com writes: Adding a memory definedness check to printtup() turned up one more culprit: tsquery_and. Patch applied, thanks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
Robert Haas robertmh...@gmail.com writes: Excellent. Are you going to look at MauMau's patch for bug #6011 also? No. I don't do Windows, so I can't test it. (On general principles, I don't think that hacking write_eventlog the way he did is appropriate; such a function should write the log, not editorialize. But that's up to whoever does commit it.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inconvenient compression options in pg_basebackup
On Sun, 2011-05-22 at 16:43 -0400, Magnus Hagander wrote: On Fri, May 20, 2011 at 17:45, Peter Eisentraut pete...@gmx.net wrote: On Fri, 2011-05-20 at 14:19 -0400, Magnus Hagander wrote: I suggest we add an argument-less option -z that means compress, and then -Z can be relegated to choosing the compression level. We can't just use -Z without a parameter for that? You can't portably have a command-line option with an optional argument. Ugh. In that case, I'm fine with your suggestion. Quick patch for verification. I chose the naming -z/--gzip to mirror GNU tar.

diff --git i/doc/src/sgml/ref/pg_basebackup.sgml w/doc/src/sgml/ref/pg_basebackup.sgml
index 8a7b833..ce7eb52 100644
--- i/doc/src/sgml/ref/pg_basebackup.sgml
+++ w/doc/src/sgml/ref/pg_basebackup.sgml
@@ -169,8 +169,8 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry>
-      <term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
-      <term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
+      <term><option>-z</option></term>
+      <term><option>--gzip</option></term>
       <listitem>
        <para>
         Enables gzip compression of tar file output. Compression is only
@@ -179,6 +179,18 @@ PostgreSQL documentation
        </para>
       </listitem>
      </varlistentry>
+
+     <varlistentry>
+      <term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
+      <term><option>--compress-level=<replaceable class="parameter">level</replaceable></option></term>
+      <listitem>
+       <para>
+        Sets the compression level when gzip compression is enabled.
+        The default is the default compression level of the zlib
+        library.
+       </para>
+      </listitem>
+     </varlistentry>
    </variablelist>
   </para>
   <para>
@@ -394,11 +406,11 @@ PostgreSQL documentation
   </para>
 
   <para>
-   To create a backup of the local server with one maximum compressed
+   To create a backup of the local server with one compressed
    tar file for each tablespace, and store it in the directory
    <filename>backup</filename>, showing a progress report while running:
<screen>
-<prompt>$</prompt> <userinput>pg_basebackup -D backup -Ft -Z9 -P</userinput>
+<prompt>$</prompt> <userinput>pg_basebackup -D backup -Ft -z -P</userinput>
</screen>
   </para>

diff --git i/src/bin/pg_basebackup/pg_basebackup.c w/src/bin/pg_basebackup/pg_basebackup.c
index 1f31fe0..7c2cb57 100644
--- i/src/bin/pg_basebackup/pg_basebackup.c
+++ w/src/bin/pg_basebackup/pg_basebackup.c
@@ -32,7 +32,10 @@
 char	format = 'p';		/* p(lain)/t(ar) */
 char	*label = "pg_basebackup base backup";
 bool	showprogress = false;
 int		verbose = 0;
-int		compresslevel = 0;
+bool	gzip = false;
+#ifdef HAVE_LIBZ
+int		compresslevel = Z_DEFAULT_COMPRESSION;
+#endif
 bool	includewal = false;
 bool	fastcheckpoint = false;
 char	*dbhost = NULL;
@@ -126,7 +129,8 @@ usage(void)
 	printf(_("  -D, --pgdata=DIRECTORY    receive base backup into directory\n"));
 	printf(_("  -F, --format=p|t          output format (plain, tar)\n"));
 	printf(_("  -x, --xlog                include required WAL files in backup\n"));
-	printf(_("  -Z, --compress=0-9        compress tar output\n"));
+	printf(_("  -z, --gzip                compress tar output with gzip\n"));
+	printf(_("  -Z, --compress-level=0-9  compression level\n"));
 	printf(_("\nGeneral options:\n"));
 	printf(_("  -c, --checkpoint=fast|spread\n"
 	         "                            set fast or spread checkpointing\n"));
@@ -265,7 +269,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 	else
 	{
 #ifdef HAVE_LIBZ
-		if (compresslevel > 0)
+		if (gzip)
 		{
 			snprintf(fn, sizeof(fn), "%s/base.tar.gz", basedir);
 			ztarfile = gzopen(fn, "wb");
@@ -289,7 +293,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 		 * Specific tablespace
 		 */
 #ifdef HAVE_LIBZ
-		if (compresslevel > 0)
+		if (gzip)
 		{
 			snprintf(fn, sizeof(fn), "%s/%s.tar.gz",
 					 basedir, PQgetvalue(res, rownum, 0));
 			ztarfile = gzopen(fn, "wb");
@@ -309,7 +313,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 	}
 
 #ifdef HAVE_LIBZ
-	if (compresslevel > 0)
+	if (gzip)
 	{
 		if (!ztarfile)
 		{
@@ -919,7 +923,8 @@ main(int argc, char **argv)
 		{"format", required_argument, NULL, 'F'},
 		{"checkpoint", required_argument, NULL, 'c'},
 		{"xlog", no_argument, NULL, 'x'},
-		{"compress", required_argument, NULL, 'Z'},
+		{"gzip", no_argument, NULL, 'z'},
+		{"compress-level", required_argument, NULL, 'Z'},
 		{"label", required_argument, NULL, 'l'},
 		{"host", required_argument, NULL, 'h'},
 		{"port", required_argument, NULL, 'p'},
@@ -952,7 +957,7 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:F:l:Z:c:h:p:U:xwWvP",
+	while ((c = getopt_long(argc, argv, "D:F:l:c:h:p:U:xwWvPzZ:",
 							long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -978,6 +983,9 @@ main(int argc, char **argv)
 			case 'l':
 				label = xstrdup(optarg);
 				break;
+			case 'z':
+				gzip = true;
+				break;
 			case 'Z':
 				compresslevel =
Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf
Magnus Hagander wrote: As I mentioned offlist, I'd like it in teal please. Applied with some further minor bikeshedding (remove trailing spaces, rewrap so columns aren't wider than 80 chars, etc) Let me just point out that people who have already run initdb during beta will not see this in their pg_hba.conf, nor in their share/pg_hba.conf.sample, even after they have upgraded to a later beta, Oops, yes, I was wrong here. Sorry. unless they run initdb. However, we have bumped the catalog version for something else so they should then get this change. Why would they not see it in their share/pg_hba.conf.sample? It will not affect the existing one in $PGDATA, but why wouldn't the installed .sample change? Yes, the problem is the sample will change, but the $PGDATA will not, so anyone doing a diff of the two files to see the localized changes will see the changes that came in as part of that commit. My point is if we change configuration files and then don't bump the catalog version, the share/*.sample files get out of sync with the files in /data, which can be kind of confusing. They would - but what you are saying above is that they would not get out of sync, because the share/*.sample also don't update. Just a mistake in what you said above, or am I missing something? Yes, my mistake. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Domains versus polymorphic functions, redux
David E. Wheeler da...@kineticode.com writes: On May 24, 2011, at 11:30 AM, Tom Lane wrote: I guess that the question that's immediately at hand is sort of a variant of that, because using a polymorphic function declared to take ANYARRAY on a domain-over-array really is using a portion of the base type's functionality. What we've learned from bug #5717 and the subsequent issues is that using that base functionality without immediately abandoning the notion that the domain has some life of its own (ie, immediately casting to the base type) is harder than it looks. Well, in the ANYELEMENT context (or ANYARRAY), what could be lost by abandoning the notion that the domain has some life of its own? I'm starting to think that maybe we should separate the two cases after all. If we force a downcast for ANYARRAY matching, we will fix the loss of functionality induced by the bug #5717 patch, and it doesn't seem like anyone has a serious objection to that. What to do for ANYELEMENT seems to be a bit more controversial, and at least some of the proposals aren't reasonable to do in 9.1 at this stage. Maybe we should just leave ANYELEMENT as-is for the moment, and reconsider that issue later? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf
On Tue, May 24, 2011 at 2:48 PM, Bruce Momjian br...@momjian.us wrote: Yes, the problem is the sample will change, but the $PGDATA will not, so anyone doing a diff of the two files to see the localized changes will see the changes that came in as part of that commit. I don't think that's a serious problem. I wouldn't want to make a change like that in a released version, but doing it during beta seems OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf
Robert Haas robertmh...@gmail.com writes: On Tue, May 24, 2011 at 2:48 PM, Bruce Momjian br...@momjian.us wrote: Yes, the problem is the sample will change, but the $PGDATA will not, so anyone doing a diff of the two files to see the localized changes will see the changes that came in as part of that commit. I don't think that's a serious problem. I wouldn't want to make a change like that in a released version, but doing it during beta seems OK. Given that we've already forced initdb for beta2, it seems like a complete non-issue right now, anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inconvenient compression options in pg_basebackup
Peter Eisentraut pete...@gmx.net writes: Quick patch for verification. I chose the naming -z/--gzip to mirror GNU tar. I would argue that -Z ought to turn on gzip without my having to write -z as well (at least when the argument is greater than zero; possibly -Z0 should be allowed as meaning no compression). Other than that (and the ensuing docs and help changes), looks fine. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Error compiling sepgsql in PG9.1
Robert Haas robertmh...@gmail.com writes: 2011/5/24 Kohei Kaigai kohei.kai...@emea.nec.com: The attached patch makes the configure script abort when it is run with the '--with-selinux' option but libselinux is older than the minimum requirement for SE-PostgreSQL. Looks to me like you need to adjust the wording of the error message. Maybe "libselinux version 2.0.93 or newer is required", or something like that. Yeah. Applied with that change. BTW, it's not helpful to include the diff of the generated configure script in such patches. The committer will run autoconf for himself, and from a readability standpoint the generated file is quite useless. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #6034: pg_upgrade fails when it should not.
Robert Haas wrote: On Mon, May 23, 2011 at 2:57 PM, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: On Mon, May 23, 2011 at 8:26 AM, Bruce Momjian br...@momjian.us wrote: Sorry, I was unclear. The question is whether the case of the _name_ of the locale is significant, meaning can you have two locale names that differ only by case and behave differently? That would seem surprising to me, but I really have no idea. There's the other direction, too: two locales that vary by something more than case, but still have identical behavior. Maybe we just decide not to worry about that, but then why worry about this? Well, if we remove the check then people could easily get broken upgrades by upgrading to a server with a different locale. A Google search seems to indicate the locale names are case-sensitive so I am thinking the problem is that the user didn't have exact locales, and needs that to use pg_upgrade. I think you misread what I wrote, or I misexplained it, but never mind. Matching locale names case-insensitively sounds reasonable to me, unless someone has reason to believe it will blow up. OK, that's what I needed to hear. I have applied the attached patch, but only to 9.1 because of the risk of breakage. (This was only the first bug report of this, and we aren't 100% certain about the case issue.) -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index 2117b7f..60c1fbb
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*************** static void
*** 333,345 ****
  check_locale_and_encoding(ControlData *oldctrl, ControlData *newctrl)
  {
! 	if (strcmp(oldctrl->lc_collate, newctrl->lc_collate) != 0)
  		pg_log(PG_FATAL, "old and new cluster lc_collate values do not match\n");
! 	if (strcmp(oldctrl->lc_ctype, newctrl->lc_ctype) != 0)
  		pg_log(PG_FATAL, "old and new cluster lc_ctype values do not match\n");
! 	if (strcmp(oldctrl->encoding, newctrl->encoding) != 0)
  		pg_log(PG_FATAL, "old and new cluster encoding values do not match\n");
  }
--- 333,346 ----
  check_locale_and_encoding(ControlData *oldctrl, ControlData *newctrl)
  {
! 	/* These are often defined with inconsistent case, so use pg_strcasecmp(). */
! 	if (pg_strcasecmp(oldctrl->lc_collate, newctrl->lc_collate) != 0)
  		pg_log(PG_FATAL, "old and new cluster lc_collate values do not match\n");
! 	if (pg_strcasecmp(oldctrl->lc_ctype, newctrl->lc_ctype) != 0)
  		pg_log(PG_FATAL, "old and new cluster lc_ctype values do not match\n");
! 	if (pg_strcasecmp(oldctrl->encoding, newctrl->encoding) != 0)
  		pg_log(PG_FATAL, "old and new cluster encoding values do not match\n");
  }
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf
Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Tue, May 24, 2011 at 2:48 PM, Bruce Momjian br...@momjian.us wrote: Yes, the problem is the sample will change, but the $PGDATA will not, so anyone doing a diff of the two files to see the localized changes will see the changes that came in as part of that commit. I don't think that's a serious problem. I wouldn't want to make a change like that in a released version, but doing it during beta seems OK. Given that we've already forced initdb for beta2, it seems like a complete non-issue right now, anyway. Yes, agreed. I was just pointing it out because people often don't realize the effect this has. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] New/Revised TODO? Gathering actual read performance data for use by planner
In the TODO list is this item: "Modify the planner to better estimate caching effects" Tom mentioned this in his presentation at PGCon, and I also chatted with Tom about it briefly afterwards. Based on last year's discussion of this TODO item, it seems thoughts have been focused on estimating how much data is being satisfied from PG's shared buffers. However, I think that's only part of the problem. Specifically, read performance is going to be affected by: 1. Reads fulfilled from shared buffers. 2. Reads fulfilled from system cache. 3. Reads fulfilled from disk controller cache. 4. Reads from physical media. #4 is further complicated by the type of physical media for that specific block. For example, reads that can be fulfilled from an SSD are going to be much faster than ones that access hard drives (or even slower types of media). System load is going to impact all of these as well. Therefore, I suggest that an alternative to the above TODO may be to gather performance data without knowing (or more importantly without needing to know) which of the above sources fulfilled the read. This data would probably need to be kept separately for each table or index, as some tables or indexes may be mostly or fully in cache or on faster physical media than others, although in the absence of other data about a specific table or index, data about other relations in the same tablespace might be of some use. Tom mentioned that the cost of doing multiple system time-of-day calls for each block read might be prohibitive; the data may also be too coarse on some systems to be truly useful (e.g., the epoch time in seconds). If this data were available, successive executions of the same query could get significantly different plans (and thus actual performance), based on what has happened recently, so these statistics would have to be relatively short term and updated frequently, but without becoming computational bottlenecks. 
The problem is one I'm interested in working on. -- Mike Nolan
[HACKERS] tackling full page writes
While eating good Indian food and talking about aviation accidents on the last night of PGCon, Greg Stark, Heikki Linnakangas, and I found some time to brainstorm about possible ways to reduce the impact of full_page_writes. I'm not sure that these ideas are much good, but for the sake of posterity: 1. Heikki suggested that instead of doing full page writes, we might try to write only the parts of the page that have changed. For example, if we had 16 bits to play with in the page header (which we don't), then we could imagine the page as being broken up into 16 512-byte chunks, one per bit. Each time we update the page, we write whatever subset of the 512-byte chunks we're actually modifying, except for any that have been written since the last checkpoint. In more detail, when writing a WAL record, if a checkpoint has intervened since the page LSN, then we first clear all 16 bits, set the bits for the chunks we're modifying, and XLOG those chunks. If no checkpoint has intervened, then we set the bits for any chunks that we are modifying and for which the corresponding bits aren't yet set, and XLOG the corresponding chunks. As I think about it a bit more, we'd need to XLOG not only the parts of the page we're actually modifying, but also any parts that the WAL record needs in order to be correct on replay. (It was further suggested that, in our grand tradition of bad naming, we could name this feature "partial full page writes" and enable it either with a setting of full_page_writes=partial, or better yet, add a new GUC partial_full_page_writes. The beauty of the latter is that it's completely ambiguous what happens when full_page_writes=off and partial_full_page_writes=on. Actually, we could invert the sense and call it disable_partial_full_page_writes instead, which would probably remove all hope of understanding. This all seemed completely hilarious when we were talking about it, and we weren't even drunk.) 2. 
The other fairly obvious alternative is to adjust our existing WAL record types to be idempotent - i.e. to not rely on the existing page contents. For XLOG_HEAP_INSERT, we currently store the target tid and the tuple contents. I'm not sure if there's anything else, but we would obviously need the offset where the new tuple should be written, which we currently infer from reading the existing page contents. For XLOG_HEAP_DELETE, we store just the TID of the target tuple; we would certainly need to store its offset within the block, and maybe the infomask. For XLOG_HEAP_UPDATE, we'd need the old and new offsets and perhaps also the old and new infomasks. Assuming that's all we need and I'm not missing anything (which I won't bet on), that means we'd be adding, say, 4 bytes per insert or delete and 8 bytes per update. So, if checkpoints are spread out widely enough that there will be more than ~2K operations per page between checkpoints, then it makes more sense to just do a full page write and call it good. If not, this idea might have legs. 3. Going a bit further, Greg proposed the idea of ripping out our current WAL infrastructure altogether and instead just having one WAL record that says these byte ranges on this page changed to have these new contents. That's elegantly simple, but I'm afraid it would bloat the records quite a bit. For example, as Heikki pointed out, HEAP_XLOG_DELETE relies on the XID in the record header to figure out what to write, and all the heap-modification operations implicitly specify the visibility map change when they specify the heap change. We currently have a flag to indicate whether the visibility map actually requires an update, but it's just one bit. However, one possible application of this concept is that we could add something like this in along with our existing WAL record types. It might be useful, for example, for third-party index AMs, which are currently pretty much out of luck. That's about as far as we got. 
Though I haven't convinced anyone else yet, I still think there's some merit to the idea of just writing the portion of the page that precedes pd_upper. WAL records would have to assume that the tuple data might be clobbered, but they could rely on the early portion of the page to be correct. AFAICT, that would be OK for all of the existing WAL records except for XLOG_HEAP2_CLEAN (i.e. vacuum), with the exception that - prior to the minimum recovery point - they'd need to apply their changes unconditionally rather than considering the page LSN. Tom has argued that won't work, but I'm not sure he's convinced anyone else yet... Anyone else have good ideas? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH
On lör, 2011-05-21 at 20:39 -0400, Robert Haas wrote: On Sat, May 21, 2011 at 5:47 PM, Peter Eisentraut pete...@gmx.net wrote: I noticed the 9.1 release notes claim that the new EDITOR_LINENUMBER_SWITCH thing is an environment variable, whereas it is actually a psql variable. It's probably the result of drift between the original patch and what was eventually committed. IIRC, Pavel had it as an environment variable originally, but Tom and I didn't feel the feature was important enough to merit that treatment. I think it's not really a matter of importance, it's a matter of making things work correctly. I have a shell configuration that sets different environment variables, including editor, depending on what directory I'm in. Now I think that all the editors in question use the + syntax, but anyone else with something like that slightly out of the ordinary would be stuck. The other problem is if I change the editor here, I have to change this other piece there. Note that you cannot even specify the editor itself in psqlrc. Another thought is that this whole thing could be done away with if we just allowed people to pass through arbitrary options to the editor, like \edit file.sql +50 -a -b -c For powerusers, this could have interesting possibilities. That's an intriguing possibility. But part of the point of the original feature was to be able to say: \ef somefunc 10 ...and end up on line 10 of somefunc, perhaps in response to an error message complaining about that line. I don't think your proposal would address that. Well, you'd write \ef somefunc +10 instead. Or something else, depending on the editor, but then you'd know what to write, since under the current theory you'd have to have configured it previously. Using the +10 syntax also looks a bit clearer, in my mind. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH
On sön, 2011-05-22 at 06:30 +0200, Pavel Stehule wrote: An idea with other options is interesting. It may be more usable to store these options inside a psql variable (to be consistent with the current state). Maybe in EDITOR_OPTIONS? There isn't really a need for that, since if you want to pass options to your editor, you can stick them in the EDITOR variable. The idea would be more to pass options per occasion. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH
On Tue, May 24, 2011 at 4:36 PM, Peter Eisentraut pete...@gmx.net wrote: That's an intriguing possibility. But part of the point of the original feature was to be able to say: \ef somefunc 10 ...and end up on line 10 of somefunc, perhaps in response to an error message complaining about that line. I don't think your proposal would address that. Well, you'd write \ef somefunc +10 instead. But that would not put you on line 10 of the function. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] minor patch submission: CREATE CAST ... AS EXPLICIT
On lör, 2011-05-21 at 15:46 +0200, Fabien COELHO wrote: Hello, Please find attached a minor stylish patch. It compiles and the updated test cases work for me. Description: Add AS EXPLICIT to CREATE CAST This gives a name to the default case of CREATE CAST, which creates a cast that must be explicitly invoked. From a language definition perspective, it is helpful to have a name for every case instead of an implicit fallback, without any word to describe it. See for instance CREATE USER CREATEDB/NOCREATEDB or CREATE RULE ... DO ALSO/INSTEAD for similar occurrences of naming default cases. Oddly enough, we did add the DO ALSO syntax much later, and no one complained about that, as far as I recall. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix
Attached is the latest revision of the latch implementation that monitors postmaster death, plus the archiver client that now relies on that new functionality and thereby works well without a tight PostmasterIsAlive() polling loop. On second thought, it is reasonable for the patch to be evaluated with the archiver changes. Any problems that we'll have with latch changes are likely problems that all WL_POSTMASTER_DEATH latch clients will have, so we might as well include the simplest such client initially. Once I have buy-in on the latch changes, the archiver work becomes uncontroversial, I think. The lifesign terminology has been dropped. We now close() the file descriptor that represents ownership - the write end of our anonymous pipe - in each child backend directly in the forking machinery (the thin fork() wrapper for the non-EXEC_BACKEND case), through a call to ReleasePostmasterDeathWatchHandle(). We don't have to do that on Windows, and we don't. I've handled the non-win32 EXEC_BACKEND case, which I understand just exists for testing purposes. I've done the usual BackendParameters stuff. A ReleasePostmasterDeathWatchHandle() call is unnecessary on win32 (the function doesn't exist there - the need to call it on Unix is a result of its implementation). I'd like to avoid having calls to it in each auxiliary process. It should be called in a single sweet spot that doesn't put any burden on child process authors to remember to call it themselves. Disappointingly, and despite a big effort, there doesn't seem to be a way to have the win32 WaitForMultipleObjects() call wake on postmaster death in addition to everything else in the same way that select() does, so there are now two blocking calls, each in a thread of its own (when the latch code is interested in postmaster death - otherwise, it's single threaded as before). 
The threading stuff (in particular, the fact that we used a named pipe in a thread where the name of the pipe comes from the process PID) is inspired by win32 signal emulation, src/backend/port/win32/signal.c. You can easily observe that it works as advertised on Windows by starting Postgres with archiving, using task manager to monitor processes, and doing the following to the postmaster (assuming it has a PID of 1234). This is the Windows equivalent of kill -9:

C:\Users\Peter> taskkill /pid 1234 /F

You'll see that it takes about a second for the archiver to exit. All processes exit. Thoughts? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e71090f..b1d38f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10150,7 +10150,7 @@ retry:
 				/*
 				 * Wait for more WAL to arrive, or timeout to be reached
 				 */
-				WaitLatch(&XLogCtl->recoveryWakeupLatch, 500L);
+				WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 500L);
 				ResetLatch(&XLogCtl->recoveryWakeupLatch);
 			}
 			else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..c60986c 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -94,6 +94,7 @@
 #include "miscadmin.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"

 /* Are we currently in WaitLatch? The signal handler would like to know. */
@@ -108,6 +109,15 @@ static void initSelfPipe(void);
 static void drainSelfPipe(void);
 static void sendSelfPipeByte(void);

+/*
+ * Constants that represent which of a pair of fds given
+ * to pipe() is watched and owned in the context of
+ * dealing with postmaster death
+ */
+#define POSTMASTER_FD_WATCH 0
+#define POSTMASTER_FD_OWN 1
+
+extern int postmaster_alive_fds[2];

 /*
  * Initialize a backend-local latch.
@@ -188,22 +198,22 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }

 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns bit field indicating which condition(s) caused the wake-up.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile
[HACKERS] Should partial dumps include extensions?
There's a complaint here http://archives.postgresql.org/pgsql-general/2011-05/msg00714.php about the fact that 9.1 pg_dump always dumps CREATE EXTENSION commands for all loaded extensions. Should we change that? A reasonable compromise might be to suppress extensions in the same cases where we suppress procedural languages, ie if --schema or --table was used (see include_everything switch in pg_dump.c). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] eviscerating the parser
Robert Haas wrote: On Sun, May 22, 2011 at 1:38 PM, Joshua Berkus j...@agliodbs.com wrote: Another point is that parsing overhead is quite obviously not the reason for the massive performance gap between one core running simple selects on PostgreSQL and one core running simple selects on MySQL. Even if I had (further) eviscerated the parser to cover only the syntax those queries actually use, it wasn't going to buy more than a couple points. I don't know if you saw Jignesh's presentation, but there seems to be a lot of reason to believe that we are lock-bound on large numbers of concurrent read-only queries. I didn't see Jignesh's presentation, but I'd come to the same conclusion (with some help from Jeff Janes and others): http://archives.postgresql.org/pgsql-hackers/2010-11/msg01643.php http://archives.postgresql.org/pgsql-hackers/2010-11/msg01665.php We did also recently discuss how we might improve the behavior in this case: http://archives.postgresql.org/pgsql-hackers/2011-05/msg00787.php ...and ensuing discussion. However, in this case, there was only one client, so that's not the problem. I don't really see how to get a big win here. If we want to be 4x faster, we'd need to cut time per query by 75%. That might require 75 different optimizations averaging 1% apiece, most likely none of them trivial. I do confess I'm a bit confused as to why prepared statements help so much. That is increasing the throughput by 80%, which is equivalent to decreasing time per query by 45%. That is a surprisingly big number, and I'd like to better understand where all that time is going. Prepared statements are pre-parsed/rewritten/planned, but I can't see how decreasing the parser size would affect those other stages, and certainly not 45%. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. 
+ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2 schedule
On mån, 2011-05-23 at 22:44 -0400, Greg Smith wrote: Given that work in August is particularly difficult to line up with common summer schedules around the world, having the other one-month gap in the schedule go there makes sense. You might want to add a comment on the schedule page about the June/July/August timing, because it looks like a typo, and the meeting minutes are also inconsistent in how they talk about June and July. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH
Robert Haas robertmh...@gmail.com writes: On Tue, May 24, 2011 at 4:36 PM, Peter Eisentraut pete...@gmx.net wrote: That's an intriguing possibility. But part of the point of the original feature was to be able to say: \ef somefunc 10 ...and end up on line 10 of somefunc, perhaps in response to an error message complaining about that line. I don't think your proposal would address that. Well, you'd write \ef somefunc +10 instead. But that would not put you on line 10 of the function. Right. It would also increase the cognitive load on the user to have to remember the command-line go-to-line-number switch for his editor. So I don't particularly want to redesign this feature. However, I can see the possible value of letting EDITOR_LINENUMBER_SWITCH be set from the same place that you set EDITOR, which would suggest that we allow the value to come from an environment variable. I'm not sure whether there is merit in allowing both that source and ~/.psqlrc, though possibly for Windows users it might be easier if ~/.psqlrc worked. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] minor patch submission: CREATE CAST ... AS EXPLICIT
Peter Eisentraut pete...@gmx.net writes: On lör, 2011-05-21 at 15:46 +0200, Fabien COELHO wrote: From a language definition perspective, it is helpful to have a name for every case instead of an implicit fallback, without any word to describe it. See for instance CREATE USER CREATEDB/NOCREATEDB or CREATE RULE ... DO ALSO/INSTEAD for similar occurences of naming default cases. Oddly enough, we did add the DO ALSO syntax much later, and no one complained about that, as far as I recall. Sure, but CREATE RULE is entirely locally-grown syntax, so there is no argument from standards compliance to consider there. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH
Robert Haas robertmh...@gmail.com writes: On Sat, May 21, 2011 at 5:47 PM, Peter Eisentraut pete...@gmx.net wrote: I noticed the 9.1 release notes claim that the new EDITOR_LINENUMBER_SWITCH thing is an environment variable, whereas it is actually a psql variable. This is perhaps sort of a Freudian slip. It's probably the result of drift between the original patch and what was eventually committed. IIRC, Pavel had it as an environment variable originally, but Tom and I didn't feel the feature was important enough to merit that treatment. BTW, the above is merest historical revisionism: there was never a version of the patch that did it that way. AFAICS the idea started here: http://archives.postgresql.org/pgsql-hackers/2010-08/msg00089.php to which you immediately asked whether it should be an environmental variable, and I said no on what might be considered thin grounds: http://archives.postgresql.org/pgsql-hackers/2010-08/msg00182.php I can't see any real objection other than complexity to having it look for a psql variable and then an environment variable. Or we could drop the psql variable part of that, if it seems too complicated. Also, while we're on the subject, I'm not real sure why we don't allow the code to provide a default value when EDITOR has a well-known value like vi or emacs. As long as there is a way to override that, where's the harm in a default? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH
On Tue, May 24, 2011 at 5:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, May 21, 2011 at 5:47 PM, Peter Eisentraut pete...@gmx.net wrote: I noticed the 9.1 release notes claim that the new EDITOR_LINENUMBER_SWITCH thing is an environment variable, whereas it is actually a psql variable. This is perhaps sort of a Freudian slip. It's probably the result of drift between the original patch and what was eventually committed. IIRC, Pavel had it as an environment variable originally, but Tom and I didn't feel the feature was important enough to merit that treatment. BTW, the above is merest historical revisionism: there was never a version of the patch that did it that way. Even if you were correct, that's a snarky way to put it, and the point is trivial anyway. But I don't think I'm imagining the getenv() call in this version of the patch: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01253.php Also, while we're on the subject, I'm not real sure why we don't allow the code to provide a default value when EDITOR has a well-known value like vi or emacs. As long as there is a way to override that, where's the harm in a default? Well, the question is how many people it'll help. Some people might have a full pathname, others might call it vim... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cannot build docs of 9.1 on Windows
Andrew, From: Andrew Dunstan and...@dunslane.net builddoc.bat failed on my system and reading it made my head hurt. So I did what I've done with other bat files and rewrote it in Perl. The result is attached. It works for me, and should be a drop-in replacement. Just put it in the src/tools/msvc directory and run perl builddoc.pl. Please test it and if it works for you we'll use it and make builddoc.bat a thin wrapper like build.bat and vcregress.bat. It worked successfully! The doc\src\sgml\html directory and its contents were created, and the HTML contents appear to be correct. Thank you very much. The output of perl builddoc.pl was as follows: -- perl mk_feature_tables.pl YES ../../../src/backend/catalog/sql_feature_packages.txt ../../../src/backend/catalog/sql_features.txt features-supported.sgml perl mk_feature_tables.pl NO ../../../src/backend/catalog/sql_feature_packages.txt ../../../src/backend/catalog/sql_features.txt features-unsupported.sgml perl generate-errcodes-table.pl ../../../src/backend/utils/errcodes.txt errcodes-table.sgml Running first build... D:\pgdev\doctool/openjade-1.3.1/bin/openjade -V html-index -wall -wno-unused-param -wno-empty -D . -c D:\pgdev\doctool/docbook-dsssl-1.79/catalog -d stylesheet.dsl -i output-html -t sgml postgres.sgml 2>&1 | findstr /V "DTDDECL catalog entries are not supported" Running collateindex... perl D:\pgdev\doctool/docbook-dsssl-1.79/bin/collateindex.pl -f -g -i bookindex -o bookindex.sgml HTML.index Processing HTML.index... 2158 entries loaded... 0 entries ignored... Done. Running second build... D:\pgdev\doctool/openjade-1.3.1/bin/openjade -wall -wno-unused-param -wno-empty -D . -c D:\pgdev\doctool/docbook-dsssl-1.79/catalog -d stylesheet.dsl -t sgml -i output-html -i include-index postgres.sgml 2>&1 | findstr /V "DTDDECL catalog entries are not supported" Docs build complete.
Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM
On Fri, May 20, 2011 at 12:50 PM, Magnus Hagander mag...@hagander.net wrote: Yes. It might be useful to note it, and then ust make an override flag. My pointm, though, was that doing it for walreceiver is more important and a more logical first step. ok, patch attached. -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 6be5a14..2235c7f 100644 *** a/doc/src/sgml/protocol.sgml --- b/doc/src/sgml/protocol.sgml *** The commands accepted in walsender mode *** 1315,1321 listitem para Requests the server to identify itself. Server replies with a result ! set of a single row, containing three fields: /para para --- 1315,1321 listitem para Requests the server to identify itself. Server replies with a result ! set of a single row, containing four fields: /para para *** The commands accepted in walsender mode *** 1356,1361 --- 1356,1372 /para /listitem /varlistentry + + varlistentry + term +xlogversion + /term + listitem + para +Current version of xlog page format. + /para + /listitem + /varlistentry /variablelist /para diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c index 0831b1b..ca39654 100644 *** a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c --- b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c *** *** 21,26 --- 21,27 #include libpq-fe.h #include access/xlog.h + #include access/xlog_internal.h #include miscadmin.h #include replication/walreceiver.h #include utils/builtins.h *** libpqrcv_connect(char *conninfo, XLogRec *** 83,88 --- 84,90 char standby_sysid[32]; TimeLineID primary_tli; TimeLineID standby_tli; + uint16 primary_xlp_magic; PGresult *res; char cmd[64]; *** libpqrcv_connect(char *conninfo, XLogRec *** 114,120 the primary server: %s, PQerrorMessage(streamConn; } ! 
if (PQnfields(res) != 3 || PQntuples(res) != 1) { int ntuples = PQntuples(res); int nfields = PQnfields(res); --- 116,122 the primary server: %s, PQerrorMessage(streamConn; } ! if (PQnfields(res) != 4 || PQntuples(res) != 1) { int ntuples = PQntuples(res); int nfields = PQnfields(res); *** libpqrcv_connect(char *conninfo, XLogRec *** 127,133 --- 129,137 } primary_sysid = PQgetvalue(res, 0, 0); primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0); + primary_xlp_magic = atoi(PQgetvalue(res, 0, 2)); + PQclear(res); /* * Confirm that the system identifier of the primary is the same as ours. */ *** libpqrcv_connect(char *conninfo, XLogRec *** 135,141 GetSystemIdentifier()); if (strcmp(primary_sysid, standby_sysid) != 0) { - PQclear(res); ereport(ERROR, (errmsg(database system identifier differs between the primary and standby), errdetail(The primary's identifier is %s, the standby's identifier is %s., --- 139,144 *** libpqrcv_connect(char *conninfo, XLogRec *** 147,159 * recovery target timeline. */ standby_tli = GetRecoveryTargetTLI(); - PQclear(res); if (primary_tli != standby_tli) ereport(ERROR, (errmsg(timeline %u of the primary does not match recovery target timeline %u, primary_tli, standby_tli))); ThisTimeLineID = primary_tli; /* Start streaming from the point requested by startup process */ snprintf(cmd, sizeof(cmd), START_REPLICATION %X/%X, startpoint.xlogid, startpoint.xrecoff); --- 150,171 * recovery target timeline. 
*/ standby_tli = GetRecoveryTargetTLI(); if (primary_tli != standby_tli) ereport(ERROR, (errmsg(timeline %u of the primary does not match recovery target timeline %u, primary_tli, standby_tli))); ThisTimeLineID = primary_tli; + /* + * Check that the primary has a compatible XLOG_PAGE_MAGIC + */ + if (primary_xlp_magic != XLOG_PAGE_MAGIC) + { + ereport(ERROR, + (errmsg(XLOG pages are not compatible between primary and standby), + errhint(Verify PostgreSQL versions on both, primary and standby.))); + } + /* Start streaming from the point requested by startup process */ snprintf(cmd, sizeof(cmd), START_REPLICATION %X/%X, startpoint.xlogid, startpoint.xrecoff); diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 470e6d1..392cf94 100644 *** a/src/backend/replication/walsender.c --- b/src/backend/replication/walsender.c *** IdentifySystem(void) *** 279,289
Re: [HACKERS] SSI predicate locking on heap -- tuple or row?
On Tue, May 24, 2011 at 04:18:37AM -0500, Kevin Grittner wrote: These proofs show that there is no legitimate cycle which could cause an anomaly which the move from row-based to tuple-based logic will miss. They don't prove that the change will generate all the same serialization failures; and in fact, some false positives are eliminated by the change. Yes, that's correct. That's related to the part in the proof where I claimed T3 couldn't have a conflict out *to some transaction T0 that precedes T1*. I originally tried to show that T3 couldn't have any conflicts out that T2 didn't have, which would mean we got the same set of serialization failures, but that's not true. In fact, it's not too hard to come up with an example where there would be a serialization failure with the row version links, but not without. However, because the rw-conflict can't be pointing to a transaction that precedes T1 in the serial order, it won't create a cycle. In other words, there are serialization failures that won't happen anymore, but they were false positives. Dan -- Dan R. K. Ports MIT CSAILhttp://drkp.net/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2 schedule
On 05/24/2011 05:03 PM, Peter Eisentraut wrote: On mån, 2011-05-23 at 22:44 -0400, Greg Smith wrote: Given that work in August is particularly difficult to line up with common summer schedules around the world, having the other 1-month gap in the schedule go there makes sense. You might want to add a comment on the schedule page about the June/July/August timing, because it looks like a typo, and the meeting minutes are also inconsistent in how they talk about June and July. Yes, I was planning to (and just did) circle back to the minutes to make everything match up. It's now self-consistent, same dates as the schedule, and explains the rationale better. I'm not sure how to address the feeling of typo you have on the schedule page beyond that. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM
On Wed, May 25, 2011 at 8:26 AM, Jaime Casanova ja...@2ndquadrant.com wrote: On Fri, May 20, 2011 at 12:50 PM, Magnus Hagander mag...@hagander.net wrote: Yes. It might be useful to note it, and then ust make an override flag. My pointm, though, was that doing it for walreceiver is more important and a more logical first step. ok, patch attached. Why is the check of WAL version required for streaming replication? As Tom said, if the version is different between two servers, the check of system identifier fails first. No? + primary_xlp_magic = atoi(PQgetvalue(res, 0, 2)); You wrongly get the third field (i.e., current xlog location) as the WAL version. You should call PQgetvalue(res, 0, 3), instead. errdetail(Expected 1 tuple with 3 fields, got %d tuples with %d fields., You need to change the above message. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2 schedule
On 05/24/2011 01:35 PM, Josh Berkus wrote: I would suggest instead adding a new page to postgresql.org/developer which lists the development schedule, rather than linking to that wiki page. Maybe on this page? http://www.postgresql.org/developer/roadmap Now that I look at the roadmap page again, I think all that would really be needed here is to tweak its wording a bit. If the description on there of the link to the wiki looked like this: General development information A wiki page about various aspects of the PostgreSQL development process, including detailed schedules and submission guidelines I think that's enough info to keep there. Putting more information back onto the main site when it can live happily on the wiki seems counterproductive to me; if there's concerns about things like vandalism, we can always lock the page. I could understand the argument that it looks more professional to have it on the main site, but perception over function only goes so far for me. The idea of adding a link back to the wiki from the https://commitfest.postgresql.org/ page would complete being able to navigate among the three major sites here, no matter which people started at. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM
On Tue, May 24, 2011 at 8:52 PM, Fujii Masao masao.fu...@gmail.com wrote: + primary_xlp_magic = atoi(PQgetvalue(res, 0, 2)); You wrongly get the third field (i.e., current xlog location) as the WAL version. You should call PQgetvalue(res, 0, 3), instead. errdetail(Expected 1 tuple with 3 fields, got %d tuples with %d fields., You need to change the above message. Fixed. About you comments on the check... if you read the thread, you will find that the whole reason for the field is future improvement, but everyone wanted some use of the field now... so i made a patch to use it in pg_basebackup before the transfer starts and avoid time and bandwith waste but Magnus prefer this in walreceiver... -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 6be5a14..2235c7f 100644 *** a/doc/src/sgml/protocol.sgml --- b/doc/src/sgml/protocol.sgml *** The commands accepted in walsender mode *** 1315,1321 listitem para Requests the server to identify itself. Server replies with a result ! set of a single row, containing three fields: /para para --- 1315,1321 listitem para Requests the server to identify itself. Server replies with a result ! set of a single row, containing four fields: /para para *** The commands accepted in walsender mode *** 1356,1361 --- 1356,1372 /para /listitem /varlistentry + + varlistentry + term +xlogversion + /term + listitem + para +Current version of xlog page format. 
+ /para + /listitem + /varlistentry /variablelist /para diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c index 0831b1b..c3f3571 100644 *** a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c --- b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c *** *** 21,26 --- 21,27 #include libpq-fe.h #include access/xlog.h + #include access/xlog_internal.h #include miscadmin.h #include replication/walreceiver.h #include utils/builtins.h *** libpqrcv_connect(char *conninfo, XLogRec *** 83,88 --- 84,90 char standby_sysid[32]; TimeLineID primary_tli; TimeLineID standby_tli; + uint16 primary_xlp_magic; PGresult *res; char cmd[64]; *** libpqrcv_connect(char *conninfo, XLogRec *** 114,120 the primary server: %s, PQerrorMessage(streamConn; } ! if (PQnfields(res) != 3 || PQntuples(res) != 1) { int ntuples = PQntuples(res); int nfields = PQnfields(res); --- 116,122 the primary server: %s, PQerrorMessage(streamConn; } ! if (PQnfields(res) != 4 || PQntuples(res) != 1) { int ntuples = PQntuples(res); int nfields = PQnfields(res); *** libpqrcv_connect(char *conninfo, XLogRec *** 122,133 PQclear(res); ereport(ERROR, (errmsg(invalid response from primary server), ! errdetail(Expected 1 tuple with 3 fields, got %d tuples with %d fields., ntuples, nfields))); } primary_sysid = PQgetvalue(res, 0, 0); primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0); /* * Confirm that the system identifier of the primary is the same as ours. */ --- 124,137 PQclear(res); ereport(ERROR, (errmsg(invalid response from primary server), ! errdetail(Expected 1 tuple with 4 fields, got %d tuples with %d fields., ntuples, nfields))); } primary_sysid = PQgetvalue(res, 0, 0); primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0); + primary_xlp_magic = atoi(PQgetvalue(res, 0, 3)); + PQclear(res); /* * Confirm that the system identifier of the primary is the same as ours. 
*/ *** libpqrcv_connect(char *conninfo, XLogRec *** 135,141 GetSystemIdentifier()); if (strcmp(primary_sysid, standby_sysid) != 0) { - PQclear(res); ereport(ERROR, (errmsg(database system identifier differs between the primary and standby), errdetail(The primary's identifier is %s, the standby's identifier is %s., --- 139,144 *** libpqrcv_connect(char *conninfo, XLogRec *** 147,159 * recovery target timeline. */ standby_tli = GetRecoveryTargetTLI(); - PQclear(res); if (primary_tli != standby_tli) ereport(ERROR, (errmsg(timeline %u of the primary does not match recovery target timeline %u, primary_tli, standby_tli))); ThisTimeLineID = primary_tli; /* Start streaming from the point requested by startup process */ snprintf(cmd, sizeof(cmd), START_REPLICATION %X/%X, startpoint.xlogid, startpoint.xrecoff); ---