> In particular, it would seriously degrade performance if the WAL file
> isn't on its own spindle but has to share bandwidth with
> data file access.

If the OS is stupid I could see this happening. But if there are buffers and some sort of elevator algorithm, the I/O won't happen at bad times.

I agree with you, though, that writing for every single insert probably does not make sense. There should be some blocking of writes. The optimal size would have to be derived empirically.

> What we really want, of course, is "write on every revolution where
> there's something worth writing" --- either we've filled a WAL block
> or there is a commit pending. But that just gets us back into the
> same swamp of how-do-you-guess-whether-more-commits-will-arrive-soon.
> I don't see how an extra process makes that problem any easier.

The whole point of the extra process handling all the writes is so that it can write on every revolution, if there is something to write. It doesn't need to care whether more commits will arrive soon.

> BTW, it would seem to me that aio_write() buys nothing over plain write()
> in terms of ability to gang writes. If we issue the write at time T
> and it completes at T+X, we really know nothing about exactly when in
> that interval the data was read out of our WAL buffers. We cannot
> assume that commit records that were stored into the WAL buffer during
> that interval got written to disk.

Why would we need to make that assumption? The only thing we'd need to know is that a given write succeeded, meaning that commits before that write are done.

The advantage of aio_write in this scenario shows up when writes cross track boundaries or when the head is in the wrong spot. If we write in reasonable blocks with aio_write, the write might get to the disk before the head passes the location for the write.

Consider a scenario where:

Head is at file offset 10,000.
Log contains blocks 12,000 - 12,500
..time passes..
Head is now at 12,050
Commit occurs, writing block 12,501

In the aio_write case, the write would already have been done for blocks 12,000 to 12,050 and would be queued up for some additional blocks, potentially up to 12,500. So the write for the commit could occur without an additional rotation delay. We are talking 85 to 200 milliseconds delay for this rotation on a single disk. I don't know how often this happens in actual practice, but it might occur as often as every other time.

- Curtis
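
P.S. To make the writer-process idea a bit more concrete, here is a toy sketch in Python. It is not PostgreSQL code: the class WalWriter and its methods are made-up names for illustration only. It assumes a single process that gets woken once per disk revolution, and that a successful write covers every commit record that was buffered before the write was issued.

```python
from dataclasses import dataclass, field


@dataclass
class WalWriter:
    """Toy model of one process that owns all WAL writes (hypothetical)."""
    pending: list = field(default_factory=list)  # commit ids buffered, not yet written
    durable: list = field(default_factory=list)  # commit ids covered by a finished write
    writes_issued: int = 0

    def append(self, commit_id):
        # A backend stores its commit record in the shared WAL buffer.
        self.pending.append(commit_id)

    def on_revolution(self):
        # Called once per simulated disk revolution.
        # Write only if there is something to write; the writer never
        # tries to guess whether more commits are about to arrive.
        if not self.pending:
            return []
        batch, self.pending = self.pending, []
        self.writes_issued += 1
        # Model the write as succeeding: every commit that was buffered
        # before the write was issued is now done.
        self.durable.extend(batch)
        return batch


if __name__ == "__main__":
    writer = WalWriter()
    writer.append(1)
    writer.append(2)
    print(writer.on_revolution())  # both commits go out in one write
    print(writer.on_revolution())  # nothing pending, so no write is issued
    writer.append(3)
    print(writer.on_revolution())
    print(writer.writes_issued, writer.durable)
```

Commits that arrive while a write is in flight simply wait in the buffer for the next revolution, so the writer never has to assume anything about what a write in progress picked up.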