Re: [HACKERS] Why do we still have commit_delay and commit_siblings?
On 14 May 2012 00:45, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, May 13, 2012 at 7:17 PM, Peter Geoghegan <pe...@2ndquadrant.com> wrote:
>> Have I missed something? Why do we keep around this foot-gun that now appears to me to be at best useless and at worst harmful? I can see why the temptation to keep this setting around used to exist, since it probably wasn't too hard to get good numbers from extremely synthetic pgbench runs, but I cannot see why the new adaptive implementation wouldn't entirely shadow the old one even in that situation.
>
> It seems that, with the new code, when there are a lot of people trying to commit very frequently, they tend to divide themselves into two gangs: everybody in one gang commits, then everyone in the other gang commits, then everyone in the first gang commits again, and so on. Assuming that the transactions themselves require negligible processing time, this provides 50% of the theoretically optimum throughput.

Keeping a parameter without any clue as to whether it has benefit is just wasting people's time. We don't ADD parameters based on supposition, why should we avoid removing parameters that have no measured benefit?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we still have commit_delay and commit_siblings?
On Mon, May 14, 2012 at 2:07 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> Keeping a parameter without any clue as to whether it has benefit is just wasting people's time.

No, arguing that we should remove a parameter because it's useless when you haven't bothered to test whether or not it actually is useless is wasting people's time.

> We don't ADD parameters based on supposition, why should we avoid removing parameters that have no measured benefit?

If they have no actual benefit, of course we should remove them. If they have no measured benefit because no one has bothered to measure, that's not a reason to remove them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Why do we still have commit_delay and commit_siblings?
On 14.05.2012 02:17, Peter Geoghegan wrote:
> One illuminating way of quickly explaining the new group commit code is that it also inserts a delay at approximately the same place (well, more places now, since the delay was previously inserted only at the xact.c callsite of XLogFlush(), and there are plenty more sites than that), only that delay is now just about perfectly adaptive. That isn't quite the full truth, since we *also* have the benefit of *knowing* that there is an active leader/flusher at all times we're delayed. The unusual semantics of LWLockAcquireOrWait() result in the new 9.2 code almost always just causing a delay for most backends when group commit matters (that is, a delay before they reach their final, productive iteration of the for(;;) loop for the commit, usually without them ever actually acquiring the exclusive lock/XLogWrite()'ing).

That doesn't seem like an accurate explanation of how the code works. It doesn't insert a deliberate delay anywhere. At a high level, the algorithm is exactly the same as before. However, the new code improves the concurrency of noticing that the WAL has been flushed. If you had a machine where context switches are infinitely fast and there is zero contention from accessing shared memory, the old and new code would behave the same. It was an impressive improvement, but the mechanism is completely different from commit_delay and commit_siblings.

That said, I wouldn't mind removing commit_delay and commit_siblings. They're pretty much impossible to tune correctly, assuming they work as advertised. Some hard data would be nice, though, as Robert suggested.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Why do we still have commit_delay and commit_siblings?
On 14 May 2012 07:30, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
> That said, I wouldn't mind removing commit_delay and commit_siblings. They're pretty much impossible to tune correctly, assuming they work as advertised. Some hard data would be nice, though, as Robert suggested.

Those parameters were already hard to get any benefit from, even in a benchmark. In a wide range of cases/settings they produce clear degradation. Any thorough testing would involve a range of different hardware types, so it's unlikely to ever occur. So let's just move on.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Why do we still have commit_delay and commit_siblings?
On Mon, May 14, 2012 at 8:17 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, May 14, 2012 at 2:07 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> Keeping a parameter without any clue as to whether it has benefit is just wasting people's time.
>
> No, arguing that we should remove a parameter because it's useless when you haven't bothered to test whether or not it actually is useless is wasting people's time.

It's most certainly not, IMHO. Discussing it here is *not* a waste of time. Or if any, it's a waste of time for a couple of people. If we leave it in, and it's useless, we waste the time of thousands of users. The choice between those two should be obvious.

>> We don't ADD parameters based on supposition, why should we avoid removing parameters that have no measured benefit?
>
> If they have no actual benefit, of course we should remove them. If they have no measured benefit because no one has bothered to measure, that's not a reason to remove them.

Another option might be, as a first step, to remove them from the .conf file, or to have a deprecated section, maybe? But if we do that, people aren't likely to use them anyway...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Analyzing foreign tables memory problems
Noah Misch wrote:
>> 1) Expose WIDTH_THRESHOLD in commands/vacuum.h and add documentation so that the authors of foreign data wrappers are aware of the problem and can avoid it on their side. This would be quite simple.
>
> Seems reasonable. How would the FDW return an indication that a value was non-NULL but removed due to excess width?

>> The FDW would return a value of length WIDTH_THRESHOLD+1 that is long enough to be recognized as too long, but not long enough to cause a problem. Here is a simple patch for that.
>
> It feels to me like an undue hack to ask FDW authors to synthesize such values. It's easy enough for data types such as text/bytea. In general, though, simple truncation may not produce a valid value of the type. That shouldn't matter, since the next action taken on the value should be to discard it, but it's fragile. Can we do better?
>
> Just thinking out loud, we could provide an extern Datum AnalyzeWideValue; and direct FDW authors to use that particular datum. It could look like a toasted datum of external size WIDTH_THRESHOLD+1 but bear va_toastrelid == InvalidOid. Then, if future code movement leads us to actually examine one of these values, we'll get an early, hard error.

That would be very convenient indeed. Even better would be a function extern Datum createAnalyzeWideValue(int width) so that row width calculations could be more accurate.

Yours,
Laurenz Albe
Re: [HACKERS] Update comments for PGPROC/PGXACT split
On 13.05.2012 23:52, Noah Misch wrote:
> Many comment references to PGPROC and MyProc should now refer to PGXACT and MyPgXact. This patch attempts to cover all such cases. In some places, a comment refers collectively to all the xid-related fields, which span both structures. I variously changed those to refer to either or both structures depending on how I judged the emphasis.
>
> The PGXACT patch extracted the PGPROC struct formerly embedded in each GlobalTransaction. I updated a few comments accordingly.
>
> The procarray.c leading comment described the procarray as unsorted, but it is sorted by memory address; I removed that word.
>
> I include two typo-only fixes on related but already-accurate lines.

Thanks, applied!

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] WalSndWakeup() and synchronous_commit=off
On Friday, May 11, 2012 08:45:23 PM Tom Lane wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
>> Its the only place though which knows whether its actually sensible to wakeup the walsender. We could make it return whether it wrote anything and do the wakeup at the callers. I count 4 different callsites which would be an annoying duplication but I don't really see anything better right now.
>
> Another point here is that XLogWrite is not only normally called with the lock held, but inside a critical section. I see no reason to take the risk of doing signal sending inside critical sections.
>
> BTW, a depressingly large fraction of the existing calls to WalSndWakeup are also inside critical sections, generally for no good reason that I can see. For example, in EndPrepare(), why was the call placed where it is and not down beside SyncRepWaitForLSN?

Hm. While I see no real problem moving it out of the lock, I don't really see a way to cleanly move it outside critical sections everywhere. The impact of doing so seems to be rather big to me. The only externally visible places where it actually is known whether we write out data, and thus should do the wakeup, are XLogInsert, XLogFlush and XLogBackgroundFlush. The first two of those are routinely called inside a critical section.

Andres

--
Andres Freund                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Gsoc2012 idea, tablesample
* Qi Huang (huangq...@hotmail.com) wrote:
> Thanks, guys, for your hot discussion. I'll explore the ANALYZE command first and try to make SYSTEM sampling work. Other parts, I'll look at them later.

That sounds like the right first steps to me. Reviewing this discussion, it was my thought to create a new node, ala seqscan, which implemented analyze's algorithm for scanning the table. The eventual idea being that analyze would actually use it in the future. There was mention up-thread about just calling the analyze code. Personally, I'd rather we make analyze more SQL-like (once we have this functionality implemented) than make things which are supposed to be SQL call out into analyze bits.

Thoughts? Other opinions?

Thanks,

Stephen
[HACKERS] hot standby PSQL 9.1 Windows 2008 Servers
Hi All,

I have implemented hot standby for PostgreSQL with a group of two, primary and standby, on Windows 2008 servers. Currently these are the settings:

1. Archiving is enabled on the primary, stored on network storage.
2. Asynchronous streaming replication from primary to standby. max_wal_senders=5, wal_keep_segments=32.
3. I am checking the heartbeat of the primary from the standby; when it is down, I am restarting the service after changing the postgresql.conf and recovery.conf settings.

I do a base backup on the standby only the first time, when it is first set up for replication. Whenever the primary goes down, the standby becomes the primary, and the old primary becomes the standby when it comes back up. When the primary becomes a standby, I restore data from the WAL archive and start the postgres service with streaming replication to connect to the new primary. This setup is working. I have tested it for a few days in my network with huge data input to the primary and restarting the servers multiple times.

I have observed these specific errors on the standby server, after which the standby no longer replicates:

1. Invalid primary and secondary checkpoints. Very rare, but it happened.
2. contrecord is requested by 0/DF20 -- the major one
3. record with zero length at 0/DF78, invalid record length at 0/DF78

These errors show up when the primary is switching to standby by recovering data from the archive, and the standby stops replicating. I am not able to start the service; to recover, I have to kill all the postgres processes, do a base backup, and then start the service on the standby.

I don't have any prior experience working with PostgreSQL. Please tell me: what are the most reliable settings for hot standby with streaming replication to work, given two 2008 servers and one network storage? How can the above errors be avoided?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/hot-standby-PSQL-9-1-Windows-2008-Servers-tp5708637.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
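For reference, a minimal 9.1-era configuration for this kind of pair is sketched below; hostnames, paths, and the share name are placeholders of mine, not values tested on this setup. One detail relevant to the errors quoted above: in 9.1 a failed-over primary cannot rejoin as a standby just by replaying archived WAL, because its timeline has diverged after the promotion, which typically produces exactly the "contrecord is requested" and invalid-record-length messages. The old primary must be re-seeded with a fresh base backup (pg_rewind did not exist yet), and standbys should set recovery_target_timeline = 'latest' so they can follow a promotion.

```
# postgresql.conf on the primary (illustrative values)
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
archive_mode = on
archive_command = 'copy "%p" "\\\\storage\\walarchive\\%f"'   # hypothetical UNC path

# recovery.conf on the standby (placeholders throughout)
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
restore_command = 'copy "\\\\storage\\walarchive\\%f" "%p"'
recovery_target_timeline = 'latest'
trigger_file = 'C:\\pgdata\\failover.trigger'
```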
Re: [HACKERS] Why do we still have commit_delay and commit_siblings?
On Sun, May 13, 2012 at 4:17 PM, Peter Geoghegan <pe...@2ndquadrant.com> wrote:
> This code is our pre-9.2 group commit implementation, pretty much in its entirety:
>
> if (CommitDelay > 0 && enableFsync &&
>     MinimumActiveBackends(CommitSiblings))
>     pg_usleep(CommitDelay);

A semantic issue, I guess, but I would say that all the code that keeps xlogctl->LogwrtRqst updated and reads from it is also part of the group commit implementation.

> This code is placed directly before the RecordTransactionCommit() call of XLogFlush(). It seeks to create a delay of commit_delay immediately prior to flushing when MinimumActiveBackends(CommitSiblings), in the hope that that delay will be enough that when we do eventually reach XLogFlush(), we can fastpath out of it due to Postgres by then having already flushed up to the XLogRecPtr we need flushed.

I think it is more useful to put it the other way around. The hope is that when we eventually reach XLogFlush, we will find that other people have added their commit records, so that we can fsync theirs as well as ours, letting them fast-path out later on.

Really, if someone else is already doing the sleep, there is no reason for the current process to do so as well. All that will do is delay the time before the current process wakes up and realizes it has already been fsynced. Instead it should block on the other sleeping process. But that didn't help much, if any, when I tried it a couple years ago. It might work better now.

> ... The original group commit patch that Simon and I submitted deprecated these settings, and I fail to understand why the committed patch didn't do that too. These days, the only way this code could possibly result in a fastpath is here, at transam/xlog.c:2094, with the potentially stale (and likely too stale to be useful when new group commit is in play) LogwrtResult value:
>
> /* done already? */
> if (XLByteLE(record, LogwrtResult.Flush))
>     break;

Why is that likely too stale to be useful? It was updated just 4 lines previous.
> Now, previously, we could also fastpath a bit later, at about transam/xlog.c:2114:
>
> /* Got the lock */
> LogwrtResult = XLogCtl->LogwrtResult;
> if (!XLByteLE(record, LogwrtResult.Flush))
> {
>     ...
> }
> /* control might have fastpathed and missed the above block. We're done now. */
>
> ...but now control won't even reach here unless we have the exclusive lock on WALWriteLock (i.e. we're the leader backend during 9.2's group commit), so fastpathing out of XLogFlush() becomes very rare in 9.2.
>
> One illuminating way of quickly explaining the new group commit code is that it also inserts a delay at approximately the same place (well, more places now, since the delay was previously inserted only at the xact.c callsite of XLogFlush(), and there are plenty more sites than that), only that delay is now just about perfectly adaptive. That isn't quite the full truth, since we *also* have the benefit of *knowing* that there is an active leader/flusher at all times we're delayed.

I don't get what you are saying. The new code does not have any more delays than the old code did; there is still only one pg_usleep(CommitDelay), and it is still in the same place.

Cheers,

Jeff
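For readers skimming the archive, the condition quoted at the top of this exchange (whose operators the archive mangled) can be restated as a compilable fragment. The function name and the plain backend-count comparison are my stand-ins; the real xact.c code calls MinimumActiveBackends(CommitSiblings) against shared state.

```c
#include <stdbool.h>

/*
 * Sketch of the pre-9.2 group-commit gate.  In the real code the third
 * test is MinimumActiveBackends(CommitSiblings); here it is stubbed as
 * a simple comparison (my simplification).  When this returns true, the
 * backend sleeps for CommitDelay microseconds before XLogFlush().
 */
static bool
want_commit_delay(int commit_delay, bool enable_fsync,
                  int active_backends, int commit_siblings)
{
    return commit_delay > 0 && enable_fsync &&
        active_backends >= commit_siblings;
}
```

Note that with fsync off, or with fewer than commit_siblings concurrently active backends, the delay is skipped entirely, which is why the setting is so hard to observe in small tests.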
Re: [HACKERS] Why do we still have commit_delay and commit_siblings?
On Sun, May 13, 2012 at 11:07 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> Keeping a parameter without any clue as to whether it has benefit is just wasting people's time. We don't ADD parameters based on supposition, why should we avoid removing parameters that have no measured benefit?

Using pgbench -T30 -c 2 -j 2 on a 2-core laptop system, with a scale that fits in shared_buffers:

--commit-delay=2000 --commit-siblings=0
tps = 162.924783 (excluding connections establishing)

--commit-delay=0 --commit-siblings=0
tps = 89.237578 (excluding connections establishing)

Cheers,

Jeff
Re: [HACKERS] Why do we still have commit_delay and commit_siblings?
On Mon, May 14, 2012 at 3:15 AM, Magnus Hagander mag...@hagander.net wrote: On Mon, May 14, 2012 at 8:17 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, May 14, 2012 at 2:07 AM, Simon Riggs si...@2ndquadrant.com wrote: Keeping a parameter without any clue as to whether it has benefit is just wasting people's time. No, arguing that we should remove a parameter because it's useless when you haven't bothered to test whether or not it actually is useless is wasting people's time. It's most certainly not, IMHO. Discussing it here is *not* a waste of time. Or if any, it's a waste of time for a couple of people. If we leave it in, and it's useless, we waste the time of thousands of users. The choice between those two should be obvious. Discussing it in general is not a waste of time, but the argument that we should remove it because there's no evidence we should keep it is completely backwards. We should add OR remove things based on evidence, not the absence of evidence. There is certainly room for discussion about what amount of evidence is adequate, but I do not think zero is the right number. Now, interestingly, Jeff Janes just did some testing, and it shows almost a 2x speedup. I think that's a much better starting point for a productive discussion. Does that change your mind at all? Is it too small a boost to be relevant? Too artificial in some other way? It doesn't seem impossible to me that the recent group commit changes made it *easier* to get a benefit out of these settings than it was before. It may be that with the old implementation, it was hopeless to get any kind of improvement out of these settings, but it no longer is. Or maybe they're still hopeless. I don't have a strong opinion about that, and welcome discussion. But I'm always going to be opposed to adding or removing things on the basis of what we didn't test. 
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Why do we still have commit_delay and commit_siblings?
On 14 May 2012 15:09, Robert Haas <robertmh...@gmail.com> wrote:
> I don't have a strong opinion about that, and welcome discussion. But I'm always going to be opposed to adding or removing things on the basis of what we didn't test.

The subject of the thread is "Why do we still have commit_delay and commit_siblings?". I don't believe that anyone asserted that we should remove the settings without some amount of due-diligence testing. Simon said that thorough testing on many types of hardware was not practical, which, considering that commit_delay is probably hardly ever (never?) used in production, I'd have to agree with.

With all due respect, for someone that doesn't have a strong opinion on the efficacy of commit_delay in 9.2, you seemed to have a strong opinion on the standard that would have to be met in order to deprecate it. I think we all could stand to give each other the benefit of the doubt more.

--
Peter Geoghegan                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] libpq URL syntax vs SQLAlchemy
Alex <a...@commandprompt.com> writes:
> Peter Eisentraut <pete...@gmx.net> writes:
>> I have been reviewing how our new libpq URL syntax compares against existing implementations of URL syntaxes in other drivers or higher-level access libraries. In the case of SQLAlchemy, there is an incompatibility regarding how Unix-domain sockets are specified. First, here is the documentation on that: http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html
>>
>> The recommended way to access a server over a Unix-domain socket is to leave off the host, as in:
>>
>> postgresql://user:password@/dbname
>>
>> In libpq, this is parsed as host='/dbname', no database.
>
> Ah, good catch: thanks for the heads up. I believe this was introduced late in the dev cycle when we noticed that users would have to specify some defaults explicitly to be able to override other defaults, while avoiding the whole ?keyword=value... business. I'll give this another look and will get back with a proposal to fix this in the form of a patch.

Upon closer inspection of the issue I came to believe that the proper fix is to drop support for special treatment of a host part starting with a slash altogether. Attached is a patch to do that.

While the following type of URI is still valid, the interpretation is now different and is standards-conforming: the part after the double slash is treated as a path specification, not an authority (host):

postgres:///path-spec

Since the path from a URI translates into the dbname connection option, the only meaningful use of this type of URI is the following:

postgres:///mydb

The host part in this case is empty (it is hidden between the // and the following /), thus a local socket connection is employed for this type of URI.
To specify a non-standard path to the local sockets directory, use the familiar URI parameter:

postgres:///db?host=/path/to/socket/dir

Also, if my reading of the RFC is correct, the username, password and port specifiers may be omitted even if the corresponding designators are present in the URI, so we need to remove some checks on empty URI parts. The test input and expected output files, the docs and code comments are also updated, of course.

Finally, to complete the RFC compliance, I've added code to properly handle percent-encoding in query parameter keywords.

At this point, I feel rather silly that we've produced and committed an almost-compliant version, which still requires quite a bit of patching to become an RFC-conforming implementation...

--
Regards,
Alex
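Percent-decoding of the kind mentioned here is small enough to sketch standalone. This is an illustrative version of mine, not libpq's actual decoder, and it deliberately skips the overlong-escape and embedded-NUL policing a real implementation would want:

```c
#include <stddef.h>

/* Return the value of a hex digit, or -1 if c is not one. */
static int
hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode %XX escapes from in into out (a caller-supplied buffer of at
 * least strlen(in)+1 bytes).  Returns 0 on success, -1 on a malformed
 * escape.  Standalone sketch, not libpq code.
 */
static int
percent_decode(const char *in, char *out)
{
    while (*in)
    {
        if (*in == '%')
        {
            int     hi = hexval(in[1]);
            int     lo = (hi >= 0) ? hexval(in[2]) : -1;

            if (lo < 0)
                return -1;      /* truncated or non-hex escape */
            *out++ = (char) (hi * 16 + lo);
            in += 3;
        }
        else
            *out++ = *in++;
    }
    *out = '\0';
    return 0;
}
```

So postgres:///db?host=%2Fpath%2Fto%2Fsocket%2Fdir and the unescaped form above denote the same connection parameters once the query part is decoded.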
Re: [HACKERS] Draft release notes complete
On Sun, May 13, 2012 at 09:01:03PM +0100, Peter Geoghegan wrote: On 12 May 2012 01:37, Robert Haas robertmh...@gmail.com wrote: Right. It's not a new feature; it's a performance improvement. We've had group commit for a long time; it just didn't work very well before. And it's not batching the commits better; it's reducing the lock contention around realizing that the batched commit has happened. This understanding of group commit is technically accurate, but the practical implications of the new code are that lots of people benefit from group commit, potentially to rather a large degree, where before they had exactly no benefit from group commit. We never officially called group commit group commit outside of git commit messages before. Therefore, it is sort of like we didn't have group commit before but now we do, and it's an implementation that's probably as effective as that of any of our competitors. It is for that reason that I suggested group commit get a more prominent billing, and that it actually be officially referred to as group commit. I'm glad that the release notes now actually refer to group commit. Now, I realise that as one of the authors of the feature I am open to accusations of lacking objectivity - clearly it isn't really my place to try and influence the feature's placement, and this is the last I will say on the matter unless someone else brings it up again. I just think that pedantically characterising this as an improvement to our existing group commit implementation within a document that will be read like a press release is a bad decision, especially since our competitors never had a group commit implementation that was far inferior to their current implementation. The assumption will be that it's a small improvement that's hardly worth noticing at all. Thanks for the summary. I know we talk about group commit, but I wasn't aware that it had not been exposed to our general users. I agree we need to reword the item as you suggested. 
So this group commit happens even if users don't change these?

#commit_delay = 0        # range 0-100000, in microseconds
#commit_siblings = 5     # range 1-1000

So the new release item wording will be:

Add group commit capability for sessions that commit at the same time

This is the git commit message:

Make group commit more effective.

When a backend needs to flush the WAL, and someone else is already flushing the WAL, wait until it releases the WALInsertLock and check if we still need to do the flush or if the other backend already did the work for us, before acquiring WALInsertLock. This helps group commit, because when the WAL flush finishes, all the backends that were waiting for it can be woken up in one go, and they can all concurrently observe that they're done, rather than waking them up one by one in a cascading fashion.

This is based on a new LWLock function, LWLockWaitUntilFree(), which has peculiar semantics. If the lock is immediately free, it grabs the lock and returns true. If it's not free, it waits until it is released, but then returns false without grabbing the lock. This is used in XLogFlush(), so that when the lock is acquired, the backend flushes the WAL, but if it's not, the backend first checks the current flush location before retrying.

Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although this patch as committed ended up being very different from that. (Heikki Linnakangas)

Is that commit message inaccurate?

--
Bruce Momjian  <br...@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +
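The peculiar semantics the commit message describes for LWLockWaitUntilFree() (the function referred to elsewhere in the thread as LWLockAcquireOrWait()) can be mimicked with ordinary primitives. The pthread-based toy below is my sketch only; the real lwlock.c uses a spinlock and per-process semaphores, not a mutex/condvar pair.

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy restatement of LWLockWaitUntilFree() semantics (my sketch). */
typedef struct
{
    pthread_mutex_t mutex;
    pthread_cond_t  freed;
    bool            held;
} ToyLWLock;

static void
toy_lock_init(ToyLWLock *lock)
{
    pthread_mutex_init(&lock->mutex, NULL);
    pthread_cond_init(&lock->freed, NULL);
    lock->held = false;
}

/*
 * If the lock is free, grab it and return true.  Otherwise wait until
 * the holder releases it, then return false WITHOUT grabbing it: by
 * then, someone else has probably done our flush for us.
 */
static bool
toy_acquire_or_wait(ToyLWLock *lock)
{
    bool        got_lock;

    pthread_mutex_lock(&lock->mutex);
    if (!lock->held)
    {
        lock->held = true;
        got_lock = true;
    }
    else
    {
        while (lock->held)
            pthread_cond_wait(&lock->freed, &lock->mutex);
        got_lock = false;
    }
    pthread_mutex_unlock(&lock->mutex);
    return got_lock;
}

static void
toy_release(ToyLWLock *lock)
{
    pthread_mutex_lock(&lock->mutex);
    lock->held = false;
    pthread_cond_broadcast(&lock->freed);   /* wake all waiters in one go */
    pthread_mutex_unlock(&lock->mutex);
}
```

A caller patterned on XLogFlush() would loop: if the record is already flushed, stop; if toy_acquire_or_wait() returns true, do the flush and call toy_release(); if it returns false, re-read the shared flush pointer before deciding whether another pass is needed.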
Re: [HACKERS] Draft release notes complete
On May 14, 2012, at 9:06 AM, Bruce Momjian <br...@momjian.us> wrote:
> So the new release item wording will be:
>
> Add group commit capability for sessions that commit at the same time
>
> This is the git commit message:
>
> Make group commit more effective.
>
> When a backend needs to flush the WAL, and someone else is already flushing the WAL, wait until it releases the WALInsertLock and check if we still need to do the flush or if the other backend already did the work for us, before acquiring WALInsertLock. This helps group commit, because when the WAL flush finishes, all the backends that were waiting for it can be woken up in one go, and they can all concurrently observe that they're done, rather than waking them up one by one in a cascading fashion.
>
> This is based on a new LWLock function, LWLockWaitUntilFree(), which has peculiar semantics. If the lock is immediately free, it grabs the lock and returns true. If it's not free, it waits until it is released, but then returns false without grabbing the lock. This is used in XLogFlush(), so that when the lock is acquired, the backend flushes the WAL, but if it's not, the backend first checks the current flush location before retrying.
>
> Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although this patch as committed ended up being very different from that. (Heikki Linnakangas)
>
> Is that commit message inaccurate?

No, I think it's actually more accurate than the proposed release note wording.

...Robert
Re: [HACKERS] Draft release notes complete
On Mon, May 14, 2012 at 9:06 AM, Bruce Momjian br...@momjian.us wrote: On Sun, May 13, 2012 at 09:01:03PM +0100, Peter Geoghegan wrote: On 12 May 2012 01:37, Robert Haas robertmh...@gmail.com wrote: Right. It's not a new feature; it's a performance improvement. We've had group commit for a long time; it just didn't work very well before. And it's not batching the commits better; it's reducing the lock contention around realizing that the batched commit has happened. This understanding of group commit is technically accurate, but the practical implications of the new code are that lots of people benefit from group commit, potentially to rather a large degree, where before they had exactly no benefit from group commit. We never officially called group commit group commit outside of git commit messages before. Therefore, it is sort of like we didn't have group commit before but now we do, and it's an implementation that's probably as effective as that of any of our competitors. It is for that reason that I suggested group commit get a more prominent billing, and that it actually be officially referred to as group commit. I'm glad that the release notes now actually refer to group commit. Now, I realise that as one of the authors of the feature I am open to accusations of lacking objectivity - clearly it isn't really my place to try and influence the feature's placement, and this is the last I will say on the matter unless someone else brings it up again. I just think that pedantically characterising this as an improvement to our existing group commit implementation within a document that will be read like a press release is a bad decision, especially since our competitors never had a group commit implementation that was far inferior to their current implementation. The assumption will be that it's a small improvement that's hardly worth noticing at all. Thanks for the summary. 
I know we talk about group commit, but I wasn't aware that it had not been exposed to our general users. I agree we need to reword the item as you suggested. So this group commit happens even if users don't change these? #commit_delay = 0 # range 0-100000, in microseconds #commit_siblings = 5 # range 1-1000

If a bunch of people are standing around waiting for a door to unlock and they are mutually aware of each other, it has never been the case (or at least not for years) that the first person through the door would systematically slam it in everyone else's face. Is this enough to qualify as group commit? If so, group commit has always (again, at least for years) been there. The new code simply makes it less likely that the group will trip over each other's feet as they all stream through the door.

The commit_delay settings cover the case where the door unlocks, and you open it, but then perhaps you stand there for a few minutes holding it open in case someone else happens to show up. This is pretty much orthogonal to the prior case. You can not wait for new people to show up, but trip over the feet of the people already there. Or you can wait for new people to show up, then trip over them. Or not trip over them, with or without waiting for new arrivals. (For the analogy to work, this particular door refuses to unlock more than once every 5 minutes. Maybe it is for a very slow elevator.)

This is the git commit message: Make group commit more effective. When a backend needs to flush the WAL, and someone else is already flushing the WAL, wait until it releases the WALInsertLock and check if we still need to do the flush or if the other backend already did the work for us, before acquiring WALInsertLock. This helps group commit, because when the WAL flush finishes, all the backends that were waiting for it can be woken up in one go, and they can all concurrently observe that they're done, rather than waking them up one by one in a cascading fashion.
This is based on a new LWLock function, LWLockWaitUntilFree(), which has peculiar semantics. If the lock is immediately free, it grabs the lock and returns true. If it's not free, it waits until it is released, but then returns false without grabbing the lock. This is used in XLogFlush(), so that when the lock is acquired, the backend flushes the WAL, but if it's not, the backend first checks the current flush location before retrying. Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although this patch as committed ended up being very different from that. (Heikki Linnakangas) Is that commit message inaccurate? I think the commit message is accurate, other than saying WALInsertLock where it meant WALWriteLock. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq URL syntax vs SQLAlchemy
- Quoting Alex Shulgin (a...@commandprompt.com), on 14.05.2012 at 18:16 - Alex a...@commandprompt.com writes: The host part in this case is empty (it is hidden between the // and the following /,) thus local socket connection is employed for this type of URIs. To specify a non-standard path to the local sockets directory, use the familiar URI parameter: postgres:///db?host=/path/to/socket/dir And why are we calling host the parameter that specifies the path for the socket dir - it is not a host and could be confused with the host part of the URI (the part between // and /). Why not call it path? So it will become: postgres:///db?path=/path/to/socket/dir Best regards -- Luben Karavelov
Re: [HACKERS] WalSndWakeup() and synchronous_commit=off
On Mon, May 14, 2012 at 6:32 PM, Andres Freund and...@2ndquadrant.com wrote: On Friday, May 11, 2012 08:45:23 PM Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: It's the only place, though, which knows whether it's actually sensible to wake up the walsender. We could make it return whether it wrote anything and do the wakeup at the callers. I count 4 different callsites, which would be an annoying duplication, but I don't really see anything better right now. Another point here is that XLogWrite is not only normally called with the lock held, but inside a critical section. I see no reason to take the risk of doing signal sending inside critical sections. BTW, a depressingly large fraction of the existing calls to WalSndWakeup are also inside critical sections, generally for no good reason that I can see. For example, in EndPrepare(), why was the call placed where it is and not down beside SyncRepWaitForLSN? Hm. While I see no real problem moving it out of the lock, I don't really see a way to cleanly move it outside critical sections everywhere. The impact of doing so seems to be rather big to me. The only externally visible place where it actually is known whether we write out data and thus do the wakeup is XLogInsert, XLogFlush and XLogBackgroundFlush. The first two of those are routinely called inside a critical section. So what about moving the existing calls of WalSndWakeup() out of a critical section and adding a new call of WalSndWakeup() into XLogBackgroundFlush()? Then all WalSndWakeup()s are called outside a critical section and after releasing WALWriteLock. I attached the patch. Regards, -- Fujii Masao move_walsndwakeup_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq URL syntax vs SQLAlchemy
karave...@mail.bg writes: - Quoting Alex Shulgin (a...@commandprompt.com), on 14.05.2012 at 18:16 - Alex a...@commandprompt.com writes: The host part in this case is empty (it is hidden between the // and the following /,) thus local socket connection is employed for this type of URIs. To specify a non-standard path to the local sockets directory, use the familiar URI parameter: postgres:///db?host=/path/to/socket/dir And why are we calling host the parameter that specifies the path for the socket dir - it is not a host and could be confused with the host part of the URI (the part between // and /). Why not call it path? So it will become: postgres:///db?path=/path/to/socket/dir We call it that way since we rely on existing libpq code to interpret the value of every parameter in the URI (well, almost: with the notable exception of translating ssl=true for JDBC compatibility.) I don't think anyone would confuse the host part of the URI with the URI parameter ?host=... if we care to express things clearly in the documentation (which we do, I believe.) Existing implementations, like that mentioned by Peter in the top message of this thread (SQLAlchemy or was it psycopg2?) already use this notation, so I don't think we can or should do anything about this, i.e. there's little point in renaming to path or merely supporting it as an alternative syntax. -- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog
On Fri, May 11, 2012 at 11:43 PM, Magnus Hagander mag...@hagander.net wrote: Should we go down the easy way and just reject connections when the flag is mismatching between the client and the server (trivial to do - see the attached patch)? + char *tmpparam; You forgot to add const before char, which causes a compile-time warning. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Strange issues with 9.2 pg_basebackup replication
On May 13, 2012, at 3:08 PM, Josh Berkus wrote: More issues: promoting intermediate standby breaks replication. To be a bit blunt here, has anyone tested cascading replication *at all* before this? Josh, do you have scripts that you're using to do this testing? If so can you post them somewhere? AFAIK we don't have any regression tests for all this replication stuff, but ISTM that we need some... -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value
On lör, 2012-05-12 at 12:59 -0400, Tom Lane wrote: Now it's entirely likely that there is nobody out there relying on such a thing, but nonetheless this is a compatibility break, and an unnecessary one IMO. You haven't shown any convincing reason why we need to change the behavior of age() on master servers at all. Recall that this thread originally arose out of age() being called by a monitoring tool. It would be nice if repeatedly calling age() on an otherwise idle database would not change the result. Currently, you would never get a stable state on such a check, and moreover, you would not only get different results but different long-term behavior between master and standby. Now this is not that important and can be accommodated in the respective tools, but it is kind of weird. It would be like a check for disk space losing one byte at every check, even if you got it back later. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value
Peter Eisentraut pete...@gmx.net writes: On lör, 2012-05-12 at 12:59 -0400, Tom Lane wrote: Now it's entirely likely that there is nobody out there relying on such a thing, but nonetheless this is a compatibility break, and an unnecessary one IMO. You haven't shown any convincing reason why we need to change the behavior of age() on master servers at all. Recall that this thread originally arose out of age() being called by a monitoring tool. It would be nice if repeatedly calling age() on an otherwise idle database would not change the result. Currently, you would never get a stable state on such a check, and moreover, you would not only get different results but different long-term behavior between master and standby. Hm. Interesting argument, but why exactly would you expect that age() would work differently from, say, wall clock time? And how likely is it that a database that requires monitoring is going to have exactly zero transactions over a significant length of time? (In any case, my primary beef at the moment is not with whether it's a good idea to change age()'s behavior going forward, but rather with having back-patched such a change.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Draft release notes complete
On 14 May 2012 17:06, Bruce Momjian br...@momjian.us wrote: So this group commit happens even if users don't change these? #commit_delay = 0 # range 0-10, in microseconds #commit_siblings = 5 # range 1-1000 Yes, that's right - the new group commit is not configurable, and is used all the time. I believe that very few people ever change these other settings in production, because it is very risky to tweak them. Read the description of them in Greg Smith's book for a good example of what I mean, or read the docs - it is rare that adding delay by increasing this parameter will actually improve performance. There may actually be no one that's set commit_delay > 0 at all. All these settings do is insert a sleep of commit_delay microseconds, in the hope that the call to XLogFlush() will then find that doing work is unnecessary due to some other session having got there first. So the new release item wording will be: Add group commit capability for sessions that commit at the same time I'd say that's almost what I'd like to see, because commit_delay falls far short of the definition of group commit established by other RDBMSs, which is, I assume, why the term was never used for advocacy purposes. The old facility could result in batching of commits specifically by artificially adding latency, so that after the delay the sessions found they could fastpath. While it worked under artificial conditions, I think it resulted in relatively small improvements next to the new group commit's order-of-magnitude increase in throughput that was demonstrated for a maximally commit-bound workload. The mere ability to notice that an XLogFlush() call is unnecessary and fastpath out could be argued to be an aboriginal group commit, predating even commit_delay, as could skipping duplicate fsync() requests in XLogWrite(), which I think Jeff pointed out, but I don't think anyone actually takes this position. 
What I'd suggest is something like: Add group commit capability so that when a session commits, other, concurrent sessions with pending commits will later automatically check if their commit has been satisfied by the original request (or even some other, intervening request) before actually proceeding. This results in a large increase in transaction throughput for commit-bound workloads, as under these circumstances the actual number of flush requests will be considerably lower than that of earlier versions of PostgreSQL. Now, granted, the number of actual flush requests being so high wasn't a problem because of any actual number of follow-through duplicate fsync() system calls (or writes) - XLogWrite() notices them, and probably has forever (although I read a suggestion that some MySQL flavours might have actually been stupid enough to do so pre group commit). But in order to be able to notice this, backends previously had to get the WALWriteLock exclusively. I don't think that these are the kind of qualifications that belong here though. People rightfully only care about the practical implications, and whether or not we're competitive in various respects, such as whether or not we tick the group commit box, and whether or not we have pretty graphs that are comparable to those of other database systems. This is the git commit message: Make group commit more effective. When a backend needs to flush the WAL, and someone else is already flushing the WAL, wait until it releases the WALInsertLock and check if we still need to do the flush or if the other backend already did the work for us, before acquiring WALInsertLock. This helps group commit, because when the WAL flush finishes, all the backends that were waiting for it can be woken up in one go, and they can all concurrently observe that they're done, rather than waking them up one by one in a cascading fashion. This is based on a new LWLock function, LWLockWaitUntilFree(), which has peculiar semantics. 
If the lock is immediately free, it grabs the lock and returns true. If it's not free, it waits until it is released, but then returns false without grabbing the lock. This is used in XLogFlush(), so that when the lock is acquired, the backend flushes the WAL, but if it's not, the backend first checks the current flush location before retrying. Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although this patch as committed ended up being very different from that. (Heikki Linnakangas) Is that commit message inaccurate? I wouldn't say that it's inaccurate. However, if we were to deprecate commit_delay and commit_siblings, that would be a mere matter of removing this code (and the GUC stuff itself): if (CommitDelay > 0 && enableFsync && MinimumActiveBackends(CommitSiblings)) pg_usleep(CommitDelay); This code is just before one call (of several) to XLogFlush(). The guts of the new group commit are in that
Re: [HACKERS] Draft release notes complete
On Mon, May 14, 2012 at 9:56 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Mon, May 14, 2012 at 9:06 AM, Bruce Momjian br...@momjian.us wrote: This is the git commit message: Make group commit more effective. When a backend needs to flush the WAL, and someone else is already flushing the WAL, wait until it releases the WALInsertLock and check if we still need to do the flush or if the other backend already did the work for us, before acquiring WALInsertLock. This helps group commit, because when the WAL flush finishes, all the backends that were waiting for it can be woken up in one go, and they can all concurrently observe that they're done, rather than waking them up one by one in a cascading fashion. This is based on a new LWLock function, LWLockWaitUntilFree(), which has peculiar semantics. If the lock is immediately free, it grabs the lock and returns true. If it's not free, it waits until it is released, but then returns false without grabbing the lock. This is used in XLogFlush(), so that when the lock is acquired, the backend flushes the WAL, but if it's not, the backend first checks the current flush location before retrying. Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although this patch as committed ended up being very different from that. (Heikki Linnakangas) Is that commit message inaccurate? I think the commit message is accurate, other than saying WALInsertLock where it meant WALWriteLock. Sorry, wrong number of negations. I think the commit message is accurate, other than Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value
On mån, 2012-05-14 at 15:11 -0400, Tom Lane wrote: Hm. Interesting argument, but why exactly would you expect that age() would work differently from, say, wall clock time? And how likely is it that a database that requires monitoring is going to have exactly zero transactions over a significant length of time? Yes, it will be a marginal case in practice, but it's something that a curious DBA might wonder about. But I think your example of how age() behaves relative to an INSERT statement is more important. (In any case, my primary beef at the moment is not with whether it's a good idea to change age()'s behavior going forward, but rather with having back-patched such a change.) Certainly we should leave it alone there. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Draft release notes complete
On Wed, May 9, 2012 at 10:11 PM, Bruce Momjian br...@momjian.us wrote: I have completed my draft of the 9.2 release notes, and committed it to git. The beta release announcement is on postgresql.org with a direct link to the release notes. The notes lead off with: NARRATIVE HERE. Major enhancements include: MAJOR LIST HERE That looks a little unfinished -- this is exactly where many people land to see what's coming with the new release. Need some help putting together the major feature list? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Ensure age() returns a stable value rather than the latest value
On 14 May 2012 20:05, Peter Eisentraut pete...@gmx.net wrote: On lör, 2012-05-12 at 12:59 -0400, Tom Lane wrote: Now it's entirely likely that there is nobody out there relying on such a thing, but nonetheless this is a compatibility break, and an unnecessary one IMO. You haven't shown any convincing reason why we need to change the behavior of age() on master servers at all. Recall that this thread originally arose out of age() being called by a monitoring tool. It would be nice if repeatedly calling age() on an otherwise idle database would not change the result. Currently, you would never get a stable state on such a check, and moreover, you would not only get different results but different long-term behavior between master and standby. That's how it works now. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch pg_is_in_backup()
Sorry for the double post, but it seems that my previous reply didn't reach the pgsql-hackers list. So here are the new patches that limit lines to 80 characters. Regards, On 02/05/2012 19:53, Gabriele Bartolini wrote: Hi Gilles, Sorry for the delay. On 03/04/12 14:21, Gilles Darold wrote: +1, this is also my point of view. I have looked at the patch that contains both pg_is_in_backup() and pg_backup_start_time(). From a functional point of view it looks fine to me. I was thinking of adding the BackupInProgress() at the beginning of pg_backup_start_time(), but the AllocateFile() function already makes sure the file exists. I have performed some basic testing of both functions and tried to inject invalid characters in the start time field of the backup_label file, and it is handled (with an exception) by the server. Cool. I spotted though some formatting issues, in particular indentation and multi-line comments. Some rows are longer than 80 chars. Please resubmit with these cosmetic changes and it is fine with me. Thank you. 
Cheers, Gabriele

--
Gilles Darold
Database Administrator
http://dalibo.com - http://dalibo.org

diff -ru postgresql-head/doc/src/sgml/func.sgml postgresql/doc/src/sgml/func.sgml
--- postgresql-head/doc/src/sgml/func.sgml	2012-04-03 10:44:43.979702643 +0200
+++ postgresql/doc/src/sgml/func.sgml	2012-04-03 11:21:15.075701574 +0200
@@ -14467,6 +14467,9 @@
     <primary>pg_stop_backup</primary>
    </indexterm>
    <indexterm>
+    <primary>pg_backup_start_time</primary>
+   </indexterm>
+   <indexterm>
     <primary>pg_switch_xlog</primary>
    </indexterm>
    <indexterm>
@@ -14532,6 +14535,13 @@
       </row>
       <row>
        <entry>
+        <literal><function>pg_backup_start_time()</function></literal>
+       </entry>
+       <entry><type>timestamp with time zone</type></entry>
+       <entry>Get start time of an online exclusive backup in progress.</entry>
+      </row>
+      <row>
+       <entry>
         <literal><function>pg_switch_xlog()</function></literal>
        </entry>
        <entry><type>text</type></entry>
diff -ru postgresql-head/src/backend/access/transam/xlogfuncs.c postgresql/src/backend/access/transam/xlogfuncs.c
--- postgresql-head/src/backend/access/transam/xlogfuncs.c	2012-04-03 10:44:44.227702645 +0200
+++ postgresql/src/backend/access/transam/xlogfuncs.c	2012-04-03 11:21:15.075701574 +0200
@@ -29,6 +29,7 @@
 #include "utils/numeric.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
+#include "storage/fd.h"
 
 static void validate_xlog_location(char *str);
@@ -563,3 +564,67 @@
 	PG_RETURN_NUMERIC(result);
 }
+
+/*
+ * Returns start time of an online exclusive backup.
+ *
+ * When there's no exclusive backup in progress, the function
+ * returns NULL.
+ */
+Datum
+pg_backup_start_time(PG_FUNCTION_ARGS)
+{
+	TimestampTz	xtime;
+	FILE	   *lfp;
+	char		fline[MAXPGPATH];
+	char		backup_start_time[30];
+
+	/*
+	 * See if label file is present
+	 */
+	lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
+	if (lfp == NULL)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m",
+							BACKUP_LABEL_FILE)));
+		PG_RETURN_NULL();
+	}
+
+	/*
+	 * Parse the file to find the START TIME line.
+	 */
+	backup_start_time[0] = '\0';
+	while (fgets(fline, sizeof(fline), lfp) != NULL)
+	{
+		if (sscanf(fline, "START TIME: %25[^\n]\n",
+				   backup_start_time) == 1)
+			break;
+	}
+
+	/*
+	 * Close the backup label file.
+	 */
+	if (ferror(lfp) || FreeFile(lfp))
+	{
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read file \"%s\": %m",
+						BACKUP_LABEL_FILE)));
+	}
+
+	if (strlen(backup_start_time) == 0)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("invalid data in file \"%s\"",
+						BACKUP_LABEL_FILE)));
+	}
+
+	/*
+	 * Convert the time string read from file to TimestampTz form.
+	 */
+	xtime = DatumGetTimestampTz(
+		DirectFunctionCall3(timestamptz_in,
+							CStringGetDatum(backup_start_time),
+							ObjectIdGetDatum(InvalidOid),
+							Int32GetDatum(-1)));
+
+	PG_RETURN_TIMESTAMPTZ(xtime);
+}
+
diff -ru postgresql-head/src/include/access/xlog_internal.h postgresql/src/include/access/xlog_internal.h
--- postgresql-head/src/include/access/xlog_internal.h	2012-04-03 10:44:45.051702642 +0200
+++ postgresql/src/include/access/xlog_internal.h	2012-04-03 11:21:15.075701574 +0200
@@ -282,5 +282,6 @@
 extern Datum pg_xlog_replay_resume(PG_FUNCTION_ARGS);
 extern Datum pg_is_xlog_replay_paused(PG_FUNCTION_ARGS);
 extern Datum pg_xlog_location_diff(PG_FUNCTION_ARGS);
+extern Datum pg_backup_start_time(PG_FUNCTION_ARGS);
 
 #endif   /* XLOG_INTERNAL_H */
diff -ru postgresql-head/src/include/catalog/pg_proc.h postgresql/src/include/catalog/pg_proc.h
--- postgresql-head/src/include/catalog/pg_proc.h	2012-04-03 10:44:45.051702642 +0200
+++ postgresql/src/include/catalog/pg_proc.h	2012-04-03 11:21:15.075701574 +0200
Re: [HACKERS] Bugs in our Windows socket code
I wrote: I've been trying to figure out why my recent attempt to latch-ify the stats collector didn't work well on the Windows buildfarm machines. ... What I intend to try doing about this is making pgwin32_waitforsinglesocket detach its event object from the socket before returning, ie WSAEventSelect(socket, NULL, 0). This will hopefully decouple it a bit better from WaitLatchOrSocket. Well, I tried that, and it made things better on some of the Windows buildfarm critters, but not all. I then added some debug printouts to pgstat.c, and after a few go-rounds with that I have learned the following things: * In the original state of the latch-ified stats collector patch, control never actually reaches the WaitLatchOrSocket call at all, until the end of the regression test run when the postmaster sends a shutdown signal. pgstat.c only iterates the inner loop containing the recv() call, because the forced blocking behavior in pgwin32_recv causes it to wait at the recv() call until data is available. However, on the machines where the regression tests fail, somewhere during the proceedings recv() just stops returning at all. This can be seen for example at http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=narwhal&dt=2012-05-14%2004%3A00%3A06 Now the really interesting point here is that given that WaitLatchOrSocket isn't being reached, there is hardly any difference between the pre-patched code and the patched code: before, we called pgwin32_waitforsinglesocket from pgstat.c and then did a recv() when it said OK, and in the patched code a similar call happens internally to pgwin32_recv. As far as I can see, the only material difference there is that pgstat.c specified a finite timeout when calling pgwin32_waitforsinglesocket while pgwin32_recv uses an infinite timeout. I now believe that the only reason the pre-patched code appeared to work is that sometimes it would time out and retry. 
With the infinite timeout in place, after whatever goes wrong goes wrong, it just hangs up until a signal event arrives. * I then modified pgstat.c to force pgwin32_noblock on while calling pgwin32_recv(). This caused pgstat's new loop logic to start behaving as intended, that is it would loop calling recv() as long as data was available and then fall out to the WaitLatchOrSocket call. However, it was still capable of randomly freezing at the WaitLatchOrSocket call, as in http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=narwhal&dt=2012-05-14%2016%3A00%3A05 So whatever is wrong affects both the code in pgwin32_waitforsinglesocket and that in WaitLatchOrSocket. * Getting desperate, I then changed the WaitLatchOrSocket call to specify a two-second timeout, which is the same as we had been using in the pgwin32_waitforsinglesocket call before all this started. And behold, narwhal passes the regression tests again: http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=narwhal&dt=2012-05-14%2022%3A00%3A05&stg=check This trace shows positively that the stats collector occasionally fails to receive incoming data, causing it to stop dead until a timeout occurs (look for "pgstat: wait result 0x8" log entries). After that it's again able to receive data. My conclusion is that the WSAEventSelect coding pattern we're using tickles something seriously broken in Windows Server 2003, causing it to sometimes lose FD_READ events. The other flavors of Windows we have in the buildfarm seem to perform more nearly according to Microsoft's documentation. Having already spent way more time on this than I cared to, my intention is to use the hack solution of making pgstat.c call WaitLatchOrSocket with a two-second timeout on Windows only, while using an infinite timeout elsewhere. I don't feel too awful about this because anybody who's expecting minimal power consumption needs to be using something other than Windows anyway. 
However, I'm now more convinced than ever that our Windows socket code is rickety to the point that it will fall over anytime somebody looks at it cross-eyed. I think that we need to get rid of the technique of detaching and re-attaching event objects to sockets; at least some versions of Windows just don't work with that. And we definitely need to get rid of that global variable pgwin32_noblock. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers