"Curtis Faith" <[EMAIL PROTECTED]> writes:
> Assume Transaction A which writes a lot of buffers and XLog entries,
> so the Commit forces a relatively lengthy fsynch.
> Transactions B - E block not on the kernel lock from fsync but on
> the WALWriteLock.
You are confusing WALWriteLock with WALInsertLock. A
transaction-committing flush operation only holds the former.
XLogInsert only needs the latter --- at least as long as it
doesn't need to write.
Thus, given adequate space in the WAL buffers, transactions B-E do not
get blocked by someone else who is writing/syncing in order to commit.
Now, as the code stands at the moment there is no event other than
commit or full-buffers that prompts a write; that means that we are
likely to run into the full-buffer case more often than is good for
performance. But a background writer task would fix that.
> Back-end servers would not issue fsync calls. They would simply block
> waiting until the LogWriter had written their record to the disk, i.e.
> until the sync'd block # was greater than the block that contained the
> XLOG_XACT_COMMIT record. The LogWriter could wake up committed back-
> ends after its log write returns.
This will pessimize performance except in the case where WAL traffic
is very heavy, because it means you don't commit until the block
containing your commit record is filled. What if you are the only
My view of this is that backends would wait for the background writer
only when they encounter a full-buffer situation, or indirectly when
they are trying to do a commit write and the background guy has the
WALWriteLock. The latter serialization is unavoidable: in that
scenario, the background guy is writing/flushing an earlier page of
the WAL log, and we *must* have that down to disk before we can declare
our transaction committed. So any scheme that tries to eliminate the
serialization of WAL writes will fail. I do not, however, see any
value in forcing all the WAL writes to be done by a single process;
which is essentially what you're saying we should do. That just adds
extra process-switch overhead that we don't really need.
> The log file would be opened O_DSYNC, O_APPEND every time.
Keep in mind that we support platforms without O_DSYNC. I am not
sure whether there are any that don't have O_SYNC either, but I am
fairly sure that we measured O_SYNC to be slower than fsync()s on
> The nice part is that the WALWriteLock semantics could be changed to
> allow the LogWriter to write to disk while WALWriteLocks are acquired
> by back-end servers.
As I said, we already have that; you are confusing WALWriteLock
> Many transactions would commit on the same fsync (now really a write
> with O_DSYNC) and we would get optimal write throughput for the log
How are you going to avoid pessimizing the few-transactions case?
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?