Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching
> > Keep in mind that we support platforms without O_DSYNC.  I am not
> > sure whether there are any that don't have O_SYNC either, but I am
> > fairly sure that we measured O_SYNC to be slower than fsync()s on
> > some platforms.

This measurement is quite understandable, since the current software does
8k writes, and the OS only has a chance to write bigger blocks in the
write+fsync case.  In the O_SYNC case you need to group bigger blocks
yourself.  (Bigger blocks are essential for maximum IO.)

I am still convinced that writing bigger blocks would allow the fastest
solution.  But reading the recent posts, the solution might only be to
change the current "loop foreach dirty 8k WAL buffer: write 8k" to one
or two large write calls.

Andreas

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
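The change Andreas describes can be sketched in a few lines. This is illustrative code, not PostgreSQL source: `contiguous_dirty`, `flush_run`, and the dirty-flag array are hypothetical names standing in for the WAL buffer bookkeeping. The point is only that a run of adjacent dirty 8k buffers can go to the kernel in one write() instead of one syscall per buffer.

```c
#include <stddef.h>
#include <unistd.h>

#define XLOG_BLCKSZ 8192

/* Return how many contiguous dirty buffers start at 'start'. */
size_t contiguous_dirty(const int *dirty, size_t n, size_t start)
{
    size_t len = 0;
    while (start + len < n && dirty[start + len])
        len++;
    return len;
}

/* Write one contiguous run of dirty buffers with a single write()
 * of len * 8k bytes, rather than len separate 8k writes. */
ssize_t flush_run(int fd, const char *bufs, const int *dirty,
                  size_t n, size_t start)
{
    size_t len = contiguous_dirty(dirty, n, start);
    if (len == 0)
        return 0;
    return write(fd, bufs + start * XLOG_BLCKSZ, len * XLOG_BLCKSZ);
}
```

With O_SYNC (or O_DSYNC) on the descriptor, the single large write is also the single synchronous operation, which is exactly where grouping pays off.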
Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching
> You are confusing WALWriteLock with WALInsertLock.  A
> transaction-committing flush operation only holds the former.
> XLogInsert only needs the latter --- at least as long as it
> doesn't need to write.

Well, that makes things better than I thought.  We still end up with a
disk write for each transaction though, and I don't see how this can
ever get better than (Disk RPM)/60 transactions per second, since
commit fsyncs are serialized.  Every fsync will have to wait almost a
full revolution to reach the end of the log.  As a practical matter,
then, everyone will use commit_delay to improve this.

> This will pessimize performance except in the case where WAL traffic
> is very heavy, because it means you don't commit until the block
> containing your commit record is filled.  What if you are the only
> active backend?

We could handle this using a mechanism analogous to the current commit
delay.  If there are more than commit_siblings other processes running,
then do the write automatically after commit_delay seconds.  This would
make things no more pessimistic than the current implementation but
provide the additional benefit of allowing the LogWriter to write in
optimal sizes if there are many transactions.

The commit_delay method won't be as good in many cases.  Consider an
update scenario where a larger commit delay gives better throughput.  A
given transaction will flush after commit_delay milliseconds.  The delay
is very unlikely to result in a scenario where the dirty log buffers are
the optimal size.  As a practical matter, I think this would tend to
make the writes larger than they would otherwise have been, and this
would unnecessarily delay the commit on the transaction.

> I do not, however, see any
> value in forcing all the WAL writes to be done by a single process;
> which is essentially what you're saying we should do.  That just adds
> extra process-switch overhead that we don't really need.

I don't think that an fsync will ever NOT cause the process to get
switched out, so I don't see how another process doing the write would
result in more overhead.  The fsync'ing process will block on the
fsync, so there will always be at least one process switch (probably
many) while waiting for the fsync to complete, since we are talking
many milliseconds for the fsync in every case.

> > The log file would be opened O_DSYNC, O_APPEND every time.
>
> Keep in mind that we support platforms without O_DSYNC.  I am not
> sure whether there are any that don't have O_SYNC either, but I am
> fairly sure that we measured O_SYNC to be slower than fsync()s on
> some platforms.

Well, there is no reason that the LogWriter couldn't be doing fsyncs
instead of O_DSYNC writes in those cases.  I'd leave this switchable
using the current flags, just change the semantics a bit.

- Curtis
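Curtis's two quantitative points above reduce to very small arithmetic. The sketch below is illustrative only (the function names are invented, not PostgreSQL source): the rotational ceiling on fully serialized commits, and the commit_siblings test for whether delaying the flush can pay off.

```c
/* One fsync per commit, each waiting roughly one platter revolution,
 * caps serialized commits at revolutions per second. */
double max_serialized_tps(double disk_rpm)
{
    return disk_rpm / 60.0;
}

/* Delaying the flush only helps if enough other transactions are
 * active to share the resulting write. */
int should_delay_flush(int other_active_backends, int commit_siblings)
{
    return other_active_backends > commit_siblings;
}
```

For example, a 7200 RPM disk can complete at most 120 revolutions per second, so one-commit-per-fsync cannot exceed roughly 120 transactions per second no matter how fast the CPUs are; grouping commits is the only way past that bound.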
Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching
Tom Lane <[EMAIL PROTECTED]> writes:
> "Curtis Faith" <[EMAIL PROTECTED]> writes:
> > The log file would be opened O_DSYNC, O_APPEND every time.
>
> Keep in mind that we support platforms without O_DSYNC.  I am not
> sure whether there are any that don't have O_SYNC either, but I am
> fairly sure that we measured O_SYNC to be slower than fsync()s on
> some platforms.

And don't we preallocate WAL files anyway?  So O_APPEND would be
irrelevant?

-Doug
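The portability fallback Tom and Curtis are discussing can be sketched at open time. This is a hypothetical helper, not PostgreSQL source: where O_DSYNC exists, every write() on the descriptor is synchronous; elsewhere the caller must issue an explicit fsync() after each flush. Per Doug's point, O_APPEND is omitted, since preallocated WAL files are written at computed offsets.

```c
#include <fcntl.h>

/* Open the WAL file with the best available sync method.
 * Sets *needs_fsync to 1 if the caller must fsync() after writes. */
int open_wal(const char *path, int *needs_fsync)
{
#ifdef O_DSYNC
    *needs_fsync = 0;                      /* write() itself syncs */
    return open(path, O_WRONLY | O_DSYNC);
#else
    *needs_fsync = 1;                      /* caller must fsync() */
    return open(path, O_WRONLY);
#endif
}
```

Keeping the decision behind a single flag at open time is what lets the rest of the log-writing code stay identical across platforms, which matches Curtis's "leave this switchable using the current flags" suggestion.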
Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching
"Curtis Faith" <[EMAIL PROTECTED]> writes:
> Assume Transaction A which writes a lot of buffers and XLog entries,
> so the Commit forces a relatively lengthy fsynch.
> Transactions B - E block not on the kernel lock from fsync but on
> the WALWriteLock.

You are confusing WALWriteLock with WALInsertLock.  A
transaction-committing flush operation only holds the former.
XLogInsert only needs the latter --- at least as long as it doesn't
need to write.  Thus, given adequate space in the WAL buffers,
transactions B-E do not get blocked by someone else who is
writing/syncing in order to commit.

Now, as the code stands at the moment there is no event other than
commit or full-buffers that prompts a write; that means that we are
likely to run into the full-buffer case more often than is good for
performance.  But a background writer task would fix that.

> Back-end servers would not issue fsync calls. They would simply block
> waiting until the LogWriter had written their record to the disk, i.e.
> until the sync'd block # was greater than the block that contained the
> XLOG_XACT_COMMIT record. The LogWriter could wake up committed back-
> ends after its log write returns.

This will pessimize performance except in the case where WAL traffic
is very heavy, because it means you don't commit until the block
containing your commit record is filled.  What if you are the only
active backend?

My view of this is that backends would wait for the background writer
only when they encounter a full-buffer situation, or indirectly when
they are trying to do a commit write and the background guy has the
WALWriteLock.  The latter serialization is unavoidable: in that
scenario, the background guy is writing/flushing an earlier page of
the WAL log, and we *must* have that down to disk before we can
declare our transaction committed.  So any scheme that tries to
eliminate the serialization of WAL writes will fail.

I do not, however, see any value in forcing all the WAL writes to be
done by a single process; which is essentially what you're saying we
should do.  That just adds extra process-switch overhead that we
don't really need.

> The log file would be opened O_DSYNC, O_APPEND every time.

Keep in mind that we support platforms without O_DSYNC.  I am not
sure whether there are any that don't have O_SYNC either, but I am
fairly sure that we measured O_SYNC to be slower than fsync()s on
some platforms.

> The nice part is that the WALWriteLock semantics could be changed to
> allow the LogWriter to write to disk while WALWriteLocks are acquired
> by back-end servers.

As I said, we already have that; you are confusing WALWriteLock with
WALInsertLock.

> Many transactions would commit on the same fsync (now really a write
> with O_DSYNC) and we would get optimal write throughput for the log
> system.

How are you going to avoid pessimizing the few-transactions case?

			regards, tom lane
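For concreteness, the wake-up scheme Curtis proposes (and Tom is responding to) amounts to a shared flushed-position counter plus a condition variable. All names below are illustrative, not PostgreSQL source: committing backends sleep until the log writer reports that the block holding their XLOG_XACT_COMMIT record is durable, and one broadcast after a synced write releases every transaction whose commit record it covered.

```c
#include <pthread.h>

typedef struct
{
    pthread_mutex_t lock;
    pthread_cond_t  flushed;      /* signalled after each synced write */
    long            synced_blkno; /* highest log block known durable */
} WalFlushState;

/* Backend side: block until our commit record's block is on disk. */
void wait_for_commit(WalFlushState *st, long commit_blkno)
{
    pthread_mutex_lock(&st->lock);
    while (st->synced_blkno < commit_blkno)
        pthread_cond_wait(&st->flushed, &st->lock);
    pthread_mutex_unlock(&st->lock);
}

/* Log-writer side: after a synced write, advance and wake all waiters. */
void report_flush(WalFlushState *st, long new_blkno)
{
    pthread_mutex_lock(&st->lock);
    if (new_blkno > st->synced_blkno)
        st->synced_blkno = new_blkno;
    pthread_cond_broadcast(&st->flushed);
    pthread_mutex_unlock(&st->lock);
}
```

Note that this sketches only the wake-up bookkeeping; it does not answer Tom's objection, since whether one process or many issue the writes, the wait in `wait_for_commit` is still bounded by how soon the covering block reaches disk.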