Re: [HACKERS] Use of sync() [was Re: Potential Large Performance Gain in WAL synching]
Tom Lane [EMAIL PROTECTED] writes:
> Doug McNaught [EMAIL PROTECTED] writes:
> > In my understanding, it means all currently dirty blocks in the file cache are queued to the disk driver. The queued writes will eventually complete, but not necessarily before sync() returns. I don't think subsequent write()s will block, unless the system is low on buffers and has to wait until dirty blocks are freed by the driver.
>
> We don't need later write()s to block. We only need them to not hit disk before the sync-queued writes hit disk. So I guess the question boils down to what "queued to the disk driver" means --- has the order of writes been determined at that point?

It's certainly possible that new write(s) get put into the queue alongside old ones--I think the Linux block layer tries to do this when it can, for one.

According to the manpage, Linux used to wait until everything was written to return from sync(), though I don't *think* it does anymore. But that's not mandated by the specs.

So I don't think we can rely on such behavior (not reordering writes across a sync()), though it will probably happen in practice a lot of the time. AFAIK there isn't anything better than sync() + sleep() as far as the specs go. Yes, it kinda sucks. ;)

-Doug
[HACKERS] Use of sync() [was Re: Potential Large Performance Gain in WAL synching]
[EMAIL PROTECTED] (Tom Lane) writes:
> [snip]
> On a filesystem that does have that kind of problem, can't you avoid it just by using O_DSYNC on the WAL files? Then there's no need to call fsync() at all, except during checkpoints (which actually issue sync() not fsync(), anyway).

This comment on using sync() instead of fsync() makes me slightly worried since sync() doesn't in any way guarantee that all data is written immediately. E.g. on *BSD with softupdates, it doesn't even guarantee that data is written within some deterministic time as far as I know (*).

With a quick check of the code I found

    /*
     *  mdsync() -- Sync storage.
     *
     */
    int
    mdsync()
    {
        sync();
        if (IsUnderPostmaster)
            sleep(2);
        sync();
        return SM_SUCCESS;
    }

which is ugly (imho) even if sync() starts an immediate and complete file system flush (which I don't think it does with softupdates).

It seems to be used only by

    /*
     * FlushBufferPool
     *
     * Flush all dirty blocks in buffer pool to disk
     * at the checkpoint time
     *
     */
    void
    FlushBufferPool(void)
    {
        BufferSync();
        smgrsync();     /* calls mdsync() */
    }

so the question that remains is what kinds of guarantees FlushBufferPool() really expects and needs from smgrsync()?

If smgrsync() is called to make up for lack of fsync() calls in BufferSync(), I'm getting really worried :-)

      _
Mats Lofkvist
[EMAIL PROTECTED]

(*) See for example
http://groups.google.com/groups?th=bfc8a0dc5373ed6e
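For readers following the O_DSYNC suggestion quoted at the top of this message, here is a minimal sketch of what opening a log file that way looks like. It is not PostgreSQL's WAL code: the function name wal_append_dsync and its arguments are made up for illustration, and reopening the file on every append is only for brevity. The one thing it relies on is the POSIX rule that a write() on a descriptor opened with O_DSYNC does not return until the data has reached stable storage, so no separate fsync() is issued.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DSYNC
#define O_DSYNC O_SYNC          /* assumption: fall back where O_DSYNC is missing */
#endif

/*
 * Hypothetical helper (not a PostgreSQL function): append one record to a
 * log file opened with O_DSYNC.  Returns 0 on success, -1 on error.
 */
int
wal_append_dsync(const char *path, const void *rec, size_t len)
{
    const char *p = rec;
    int         fd;

    assert(path != NULL && rec != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_DSYNC, 0600);
    if (fd < 0)
        return -1;

    /* Each write() returns only after the data itself is on stable storage. */
    while (len > 0)
    {
        ssize_t n = write(fd, p, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return close(fd);
}
```

A caller would invoke something like wal_append_dsync("some.log", record, record_len) once per record. Note that this only covers the log file itself; it says nothing about the data files that the checkpoint's sync() is meant to flush, which is the concern raised above.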
Re: [HACKERS] Use of sync() [was Re: Potential Large Performance Gain in WAL synching]
Mats Lofkvist [EMAIL PROTECTED] writes:
> [ mdsync is ugly and not completely reliable ]

Yup, it is. Do you have a better solution?

fsync is not the answer, since the checkpoint process has no way to know what files may have been touched since the last checkpoint ... and even if it could find that out, a string of retail fsync calls would kill performance, cf. Curtis Faith's complaint.

In practice I am not sure there is a problem. The local man page for sync() says

    The writing, although scheduled, is not necessarily complete upon return from sync.

Now if "scheduled" means "will occur before any subsequently-commanded write occurs" then we're fine. I don't know if that's true though ...

regards, tom lane
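To make the rejected alternative concrete, here is a rough sketch of what "find out which files were touched, then fsync each one" would involve. Everything in it is hypothetical (note_dirty_file, fsync_dirty_files, the fixed-size table); it is not how the backend works. It also keeps the list in one process's memory, whereas the difficulty described above is precisely that the checkpoint process does not see other backends' writes, so a real version would need shared state. It further assumes that fsync() on a freshly opened descriptor flushes data written earlier through other descriptors for the same file.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAX_DIRTY    64         /* arbitrary limits for the sketch */
#define MAX_PATH_LEN 256

static char dirty_paths[MAX_DIRTY][MAX_PATH_LEN];
static int  n_dirty = 0;

/*
 * Hypothetical: remember that `path` was written since the last checkpoint.
 * Duplicates are ignored.  Returns 0 on success, -1 if the path is too long
 * or the table is full.
 */
int
note_dirty_file(const char *path)
{
    int i;

    assert(path != NULL);

    if (strlen(path) >= MAX_PATH_LEN)
        return -1;
    for (i = 0; i < n_dirty; i++)
        if (strcmp(dirty_paths[i], path) == 0)
            return 0;           /* already recorded */
    if (n_dirty == MAX_DIRTY)
        return -1;
    strcpy(dirty_paths[n_dirty++], path);
    return 0;
}

/*
 * Hypothetical: fsync every remembered file, one at a time.  Returns the
 * number of files synced and clears the list, or returns -1 on the first
 * failure (leaving the list intact).
 */
int
fsync_dirty_files(void)
{
    int i;
    int synced = 0;

    for (i = 0; i < n_dirty; i++)
    {
        int fd = open(dirty_paths[i], O_RDWR);
        int rc;

        if (fd < 0)
            return -1;
        rc = fsync(fd);         /* blocks until this file's data is on disk */
        close(fd);
        if (rc != 0)
            return -1;
        synced++;
    }
    n_dirty = 0;
    return synced;
}
```

The loop in fsync_dirty_files() is the "string of retail fsync calls" in question: each call blocks until that one file is flushed, so the cost grows with the number of files touched between checkpoints.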
Re: [HACKERS] Use of sync() [was Re: Potential Large Performance Gain in WAL synching]
Tom Lane [EMAIL PROTECTED] writes:
> In practice I am not sure there is a problem. The local man page for sync() says
>
>     The writing, although scheduled, is not necessarily complete upon return from sync.
>
> Now if "scheduled" means "will occur before any subsequently-commanded write occurs" then we're fine. I don't know if that's true though ...

In my understanding, it means all currently dirty blocks in the file cache are queued to the disk driver. The queued writes will eventually complete, but not necessarily before sync() returns. I don't think subsequent write()s will block, unless the system is low on buffers and has to wait until dirty blocks are freed by the driver.

-Doug
Re: [HACKERS] Use of sync() [was Re: Potential Large Performance Gain in WAL synching]
Doug McNaught [EMAIL PROTECTED] writes:
> Tom Lane [EMAIL PROTECTED] writes:
> > In practice I am not sure there is a problem. The local man page for sync() says
> >
> >     The writing, although scheduled, is not necessarily complete upon return from sync.
> >
> > Now if "scheduled" means "will occur before any subsequently-commanded write occurs" then we're fine. I don't know if that's true though ...
>
> In my understanding, it means all currently dirty blocks in the file cache are queued to the disk driver. The queued writes will eventually complete, but not necessarily before sync() returns. I don't think subsequent write()s will block, unless the system is low on buffers and has to wait until dirty blocks are freed by the driver.

We don't need later write()s to block. We only need them to not hit disk before the sync-queued writes hit disk. So I guess the question boils down to what "queued to the disk driver" means --- has the order of writes been determined at that point?

regards, tom lane