Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching
> > Keep in mind that we support platforms without O_DSYNC.  I am not
> > sure whether there are any that don't have O_SYNC either, but I am
> > fairly sure that we measured O_SYNC to be slower than fsync()s on
> > some platforms.

This measurement is quite understandable, since the current software does
8k writes, and the OS only has a chance to write bigger blocks in the
write+fsync case.  In the O_SYNC case you need to group bigger blocks
yourself.  (Bigger blocks are essential for maximum IO.)

I am still convinced that writing bigger blocks would allow the fastest
solution.  But reading the recent posts, the solution might only be to
change the current "loop foreach dirty 8k WAL buffer: write 8k" to one
or two large write calls.

Andreas

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
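The change Andreas describes can be sketched in a few lines. This is illustrative code, not PostgreSQL source: `contiguous_dirty`, `flush_run`, and the dirty-flag array are hypothetical names standing in for the WAL buffer bookkeeping. The point is only that a run of adjacent dirty 8k buffers can go to the kernel in one write() instead of one syscall per buffer.

```c
#include <stddef.h>
#include <unistd.h>

#define XLOG_BLCKSZ 8192

/* Return how many contiguous dirty buffers start at 'start'. */
size_t contiguous_dirty(const int *dirty, size_t n, size_t start)
{
    size_t len = 0;
    while (start + len < n && dirty[start + len])
        len++;
    return len;
}

/* Write one contiguous run of dirty buffers with a single write()
 * of len * 8k bytes, rather than len separate 8k writes. */
ssize_t flush_run(int fd, const char *bufs, const int *dirty,
                  size_t n, size_t start)
{
    size_t len = contiguous_dirty(dirty, n, start);
    if (len == 0)
        return 0;
    return write(fd, bufs + start * XLOG_BLCKSZ, len * XLOG_BLCKSZ);
}
```

With O_SYNC (or O_DSYNC) on the descriptor, the single large write is also the single synchronous operation, which is exactly where grouping pays off.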
Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching
> You are confusing WALWriteLock with WALInsertLock.  A
> transaction-committing flush operation only holds the former.
> XLogInsert only needs the latter --- at least as long as it
> doesn't need to write.

Well, that makes things better than I thought.  We still end up with a
disk write for each transaction though, and I don't see how this can
ever get better than (Disk RPM)/60 transactions per second, since
commit fsyncs are serialized.  Every fsync will have to wait almost a
full revolution to reach the end of the log.  As a practical matter,
then, everyone will use commit_delay to improve this.

> This will pessimize performance except in the case where WAL traffic
> is very heavy, because it means you don't commit until the block
> containing your commit record is filled.  What if you are the only
> active backend?

We could handle this using a mechanism analogous to the current commit
delay.  If there are more than commit_siblings other processes running,
then do the write automatically after commit_delay seconds.  This would
make things no more pessimistic than the current implementation but
provide the additional benefit of allowing the LogWriter to write in
optimal sizes if there are many transactions.

The commit_delay method won't be as good in many cases.  Consider an
update scenario where a larger commit delay gives better throughput.  A
given transaction will flush after commit_delay milliseconds.  The delay
is very unlikely to result in a scenario where the dirty log buffers are
the optimal size.  As a practical matter, I think this would tend to
make the writes larger than they would otherwise have been, and this
would unnecessarily delay the commit on the transaction.

> I do not, however, see any
> value in forcing all the WAL writes to be done by a single process;
> which is essentially what you're saying we should do.  That just adds
> extra process-switch overhead that we don't really need.

I don't think that an fsync will ever NOT cause the process to get
switched out, so I don't see how another process doing the write would
result in more overhead.  The fsync'ing process will block on the
fsync, so there will always be at least one process switch (probably
many) while waiting for the fsync to complete, since we are talking
many milliseconds for the fsync in every case.

> > The log file would be opened O_DSYNC, O_APPEND every time.
>
> Keep in mind that we support platforms without O_DSYNC.  I am not
> sure whether there are any that don't have O_SYNC either, but I am
> fairly sure that we measured O_SYNC to be slower than fsync()s on
> some platforms.

Well, there is no reason that the LogWriter couldn't be doing fsyncs
instead of O_DSYNC writes in those cases.  I'd leave this switchable
using the current flags, just change the semantics a bit.

- Curtis
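Curtis's two quantitative points above reduce to very small arithmetic. The sketch below is illustrative only (the function names are invented, not PostgreSQL source): the rotational ceiling on fully serialized commits, and the commit_siblings test for whether delaying the flush can pay off.

```c
/* One fsync per commit, each waiting roughly one platter revolution,
 * caps serialized commits at revolutions per second. */
double max_serialized_tps(double disk_rpm)
{
    return disk_rpm / 60.0;
}

/* Delaying the flush only helps if enough other transactions are
 * active to share the resulting write. */
int should_delay_flush(int other_active_backends, int commit_siblings)
{
    return other_active_backends > commit_siblings;
}
```

For example, a 7200 RPM disk can complete at most 120 revolutions per second, so one-commit-per-fsync cannot exceed roughly 120 transactions per second no matter how fast the CPUs are; grouping commits is the only way past that bound.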
Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching
Tom Lane <[EMAIL PROTECTED]> writes:
> "Curtis Faith" <[EMAIL PROTECTED]> writes:
> > The log file would be opened O_DSYNC, O_APPEND every time.
>
> Keep in mind that we support platforms without O_DSYNC.  I am not
> sure whether there are any that don't have O_SYNC either, but I am
> fairly sure that we measured O_SYNC to be slower than fsync()s on
> some platforms.

And don't we preallocate WAL files anyway?  So O_APPEND would be
irrelevant?

-Doug
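The portability fallback Tom and Curtis are discussing can be sketched at open time. This is a hypothetical helper, not PostgreSQL source: where O_DSYNC exists, every write() on the descriptor is synchronous; elsewhere the caller must issue an explicit fsync() after each flush. Per Doug's point, O_APPEND is omitted, since preallocated WAL files are written at computed offsets.

```c
#include <fcntl.h>

/* Open the WAL file with the best available sync method.
 * Sets *needs_fsync to 1 if the caller must fsync() after writes. */
int open_wal(const char *path, int *needs_fsync)
{
#ifdef O_DSYNC
    *needs_fsync = 0;                      /* write() itself syncs */
    return open(path, O_WRONLY | O_DSYNC);
#else
    *needs_fsync = 1;                      /* caller must fsync() */
    return open(path, O_WRONLY);
#endif
}
```

Keeping the decision behind a single flag at open time is what lets the rest of the log-writing code stay identical across platforms, which matches Curtis's "leave this switchable using the current flags" suggestion.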
Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching
"Curtis Faith" <[EMAIL PROTECTED]> writes:
> Assume Transaction A which writes a lot of buffers and XLog entries,
> so the Commit forces a relatively lengthy fsynch.
> Transactions B - E block not on the kernel lock from fsync but on
> the WALWriteLock.

You are confusing WALWriteLock with WALInsertLock.  A
transaction-committing flush operation only holds the former.
XLogInsert only needs the latter --- at least as long as it doesn't
need to write.  Thus, given adequate space in the WAL buffers,
transactions B-E do not get blocked by someone else who is
writing/syncing in order to commit.

Now, as the code stands at the moment there is no event other than
commit or full-buffers that prompts a write; that means that we are
likely to run into the full-buffer case more often than is good for
performance.  But a background writer task would fix that.

> Back-end servers would not issue fsync calls. They would simply block
> waiting until the LogWriter had written their record to the disk, i.e.
> until the sync'd block # was greater than the block that contained the
> XLOG_XACT_COMMIT record. The LogWriter could wake up committed back-
> ends after its log write returns.

This will pessimize performance except in the case where WAL traffic
is very heavy, because it means you don't commit until the block
containing your commit record is filled.  What if you are the only
active backend?

My view of this is that backends would wait for the background writer
only when they encounter a full-buffer situation, or indirectly when
they are trying to do a commit write and the background guy has the
WALWriteLock.  The latter serialization is unavoidable: in that
scenario, the background guy is writing/flushing an earlier page of
the WAL log, and we *must* have that down to disk before we can
declare our transaction committed.  So any scheme that tries to
eliminate the serialization of WAL writes will fail.

I do not, however, see any value in forcing all the WAL writes to be
done by a single process; which is essentially what you're saying we
should do.  That just adds extra process-switch overhead that we
don't really need.

> The log file would be opened O_DSYNC, O_APPEND every time.

Keep in mind that we support platforms without O_DSYNC.  I am not
sure whether there are any that don't have O_SYNC either, but I am
fairly sure that we measured O_SYNC to be slower than fsync()s on
some platforms.

> The nice part is that the WALWriteLock semantics could be changed to
> allow the LogWriter to write to disk while WALWriteLocks are acquired
> by back-end servers.

As I said, we already have that; you are confusing WALWriteLock with
WALInsertLock.

> Many transactions would commit on the same fsync (now really a write
> with O_DSYNC) and we would get optimal write throughput for the log
> system.

How are you going to avoid pessimizing the few-transactions case?

			regards, tom lane
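For concreteness, the wake-up scheme Curtis proposes (and Tom is responding to) amounts to a shared flushed-position counter plus a condition variable. All names below are illustrative, not PostgreSQL source: committing backends sleep until the log writer reports that the block holding their XLOG_XACT_COMMIT record is durable, and one broadcast after a synced write releases every transaction whose commit record it covered.

```c
#include <pthread.h>

typedef struct
{
    pthread_mutex_t lock;
    pthread_cond_t  flushed;      /* signalled after each synced write */
    long            synced_blkno; /* highest log block known durable */
} WalFlushState;

/* Backend side: block until our commit record's block is on disk. */
void wait_for_commit(WalFlushState *st, long commit_blkno)
{
    pthread_mutex_lock(&st->lock);
    while (st->synced_blkno < commit_blkno)
        pthread_cond_wait(&st->flushed, &st->lock);
    pthread_mutex_unlock(&st->lock);
}

/* Log-writer side: after a synced write, advance and wake all waiters. */
void report_flush(WalFlushState *st, long new_blkno)
{
    pthread_mutex_lock(&st->lock);
    if (new_blkno > st->synced_blkno)
        st->synced_blkno = new_blkno;
    pthread_cond_broadcast(&st->flushed);
    pthread_mutex_unlock(&st->lock);
}
```

Note that this sketches only the wake-up bookkeeping; it does not answer Tom's objection, since whether one process or many issue the writes, the wait in `wait_for_commit` is still bounded by how soon the covering block reaches disk.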