> In particular, it would seriously degrade performance if the WAL file
> isn't on its own spindle but has to share bandwidth with
> data file access.

If the OS is stupid, I could see this happening. But if there are
buffers and some sort of elevator algorithm, the I/O shouldn't happen
at bad times.

I agree with you, though, that writing for every single insert probably
does not make sense. Writes should be batched into blocks. The
optimal block size would have to be derived empirically.

> What we really want, of course, is "write on every revolution where
> there's something worth writing" --- either we've filled a WAL block
> or there is a commit pending.  But that just gets us back into the
> same swamp of how-do-you-guess-whether-more-commits-will-arrive-soon.
> I don't see how an extra process makes that problem any easier.

The whole point of the extra process handling all the writes is so
that it can write on every revolution, if there is something to
write. It doesn't need to care if more commits will arrive soon.
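To make that concrete, here is a rough sketch of the decision the writer process would make on each pass. The struct, its field names, and the 8192-byte block size are all made up for illustration; none of this is existing PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

#define WAL_BLOCK_SIZE 8192  /* assumed block size, for illustration only */

/* Hypothetical snapshot of the shared WAL buffer state. */
typedef struct {
    size_t written_upto;   /* bytes already handed to the disk */
    size_t inserted_upto;  /* bytes of WAL appended by backends so far */
    size_t commit_upto;    /* end of the latest pending commit record */
} WalState;

/*
 * Number of bytes worth writing on this pass: everything up to the
 * latest pending commit, or up to the last completely filled block,
 * whichever is further. Returns 0 if there is nothing worth writing.
 */
size_t wal_bytes_to_write(const WalState *s)
{
    size_t target = s->written_upto;
    size_t full_blocks;

    assert(s->written_upto <= s->inserted_upto);
    assert(s->commit_upto <= s->inserted_upto);

    /* A commit is pending: write through the end of its record. */
    if (s->commit_upto > target)
        target = s->commit_upto;

    /* One or more WAL blocks have been filled: write those too. */
    full_blocks = (s->inserted_upto / WAL_BLOCK_SIZE) * WAL_BLOCK_SIZE;
    if (full_blocks > target)
        target = full_blocks;

    return target - s->written_upto;
}
```

The writer calls this once per pass and issues a write only when the result is nonzero. There is no guessing about future commits anywhere in it; anything that arrives later simply goes out on the next pass.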

> BTW, it would seem to me that aio_write() buys nothing over plain write()
> in terms of ability to gang writes.  If we issue the write at time T
> and it completes at T+X, we really know nothing about exactly when in
> that interval the data was read out of our WAL buffers.  We cannot
> assume that commit records that were stored into the WAL buffer during
> that interval got written to disk.

Why would we need to make that assumption? The only thing we'd need to
know is that a given write succeeded, meaning that commits stored before
that write was issued are done.
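In code terms, the bookkeeping could look something like the sketch below. The names are hypothetical, and it assumes that each write covers a byte range fixed at the time it was issued and that a successful completion means the range is on disk (for example, because the file was opened with O_DSYNC).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of a backend waiting for its commit to hit disk. */
typedef struct {
    size_t commit_end;  /* WAL offset just past this commit record */
    int    done;        /* nonzero once the commit is known to be written */
} PendingCommit;

/*
 * Called when a write covering WAL bytes [0, flushed_upto) has completed
 * successfully. Marks every commit whose record lies entirely inside that
 * range as done and returns how many were newly released. Commits stored
 * past flushed_upto are left alone; they wait for a later write.
 */
size_t release_commits(PendingCommit *commits, size_t n, size_t flushed_upto)
{
    size_t released = 0;

    assert(commits != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!commits[i].done && commits[i].commit_end <= flushed_upto) {
            commits[i].done = 1;
            released++;
        }
    }
    return released;
}
```

Nothing here depends on when, inside the interval T to T+X, the data was actually read out of the buffer. Only the range chosen at issue time matters.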

The advantage to aio_write in this scenario is when writes cross track
boundaries or when the head is in the wrong spot. If we write
in reasonable blocks with aio_write the write might get to the disk
before the head passes the location for the write.

Consider a scenario where:

    Head is at file offset 10,000.

    Log contains blocks 12,000 - 12,500

    ..time passes..

    Head is now at 12,050

    Commit occurs writing block 12,501

In the aio_write case the write would already have been done for blocks  
12,000 to 12,050 and would be queued up for some additional blocks up to
potentially 12,500. So the write for the commit could occur without an
additional rotation delay. We are talking roughly 4 to 11 milliseconds
(one revolution at 15,000 to 5,400 RPM) of delay for this rotation on a
single disk. I don't know how often this
happens in actual practice but it might occur as often as every other

- Curtis
