I present a number of connected proposals.

1. Earlier, I suggested that the sync rep code would allow us to redesign the way we write WAL, using ideas from group commit. My proposal is that when a backend needs to flush WAL to local disk it will be added to a SHMQUEUE, exactly as happens when we flush WAL to a sync standby. The WALWriter will be woken by latch and then perform the actual work. When it completes, the WALWriter will wake the queue in order, so there is a natural group commit effect. The WAL queue will be protected by a new lock, WALFlushRequestLock, which should be much less heavily contended than the way we do things now.

Notably, this approach means that all waiters get woken quickly, without having to wait for the queue of WALWriteLock requests to drain down, so commit will be marginally quicker. On almost idle systems this will give very nearly the same response time as having each backend write WAL directly. On busy systems this will give optimal efficiency by having the WALWriter work in a very tight loop to perform the I/O, instead of queuing itself to get the WALWriteLock with all the other backends. It will also allow piggybacking of commits even when WALInsertLock is not available.
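To make the queueing in proposal 1 a little more concrete, here is a toy, single-process Python model of the idea. It is only a sketch under stated assumptions, not PostgreSQL code: the names FlushQueue, request_flush, walwriter_cycle and fsync_calls are hypothetical, LSNs are plain integers, and the locking that WALFlushRequestLock would provide and the latch sleep/wake are left out entirely.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlushQueue:
    """Toy model of the proposed WAL flush queue (hypothetical names)."""

    flushed_upto: int = 0                          # highest LSN known durable
    waiters: deque = field(default_factory=deque)  # (backend_id, commit_lsn), FIFO
    fsync_calls: int = 0                           # number of physical flushes

    def request_flush(self, backend_id, commit_lsn):
        """Backend side: join the queue unless the LSN is already durable."""
        if commit_lsn <= self.flushed_upto:
            return False  # nothing to wait for
        self.waiters.append((backend_id, commit_lsn))
        return True       # a real backend would now sleep on its latch

    def walwriter_cycle(self):
        """WALWriter side: one flush covers every request queued so far."""
        if not self.waiters:
            return []
        target = max(lsn for _, lsn in self.waiters)
        self.fsync_calls += 1          # stands in for a single write + fsync
        self.flushed_upto = target
        woken = []
        # Wake waiters in queue order; each is covered by the flush to target.
        while self.waiters and self.waiters[0][1] <= self.flushed_upto:
            woken.append(self.waiters.popleft()[0])
        return woken


if __name__ == "__main__":
    q = FlushQueue()
    q.request_flush("A", 100)
    q.request_flush("B", 140)
    q.request_flush("C", 120)
    print(q.walwriter_cycle())  # all three waiters, in queue order
    print(q.fsync_calls)        # a single flush served them all
```

The point of the model is the ratio: three commit requests are satisfied by one flush, and a later request whose LSN is already covered never queues at all.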
2. A further idea is to use the same queue to reduce contention on accessing the ProcArray and Clog at end of transaction as well. That would not be part of the initial work, but I'd want to bear that possibility in mind at the design stage, at least if there were any choices to make.

3. In addition, we will send the WAL to standby servers as soon as it has been written, not flushed. As part of the chunk header, the WALSender would include the known WAL flush ptr. So we would be sending WAL data to the standby ahead of it being flushed, but then only applying data up to the flush ptr. This means we don't flush WAL fully and then send it; instead we partially overlap those operations. It also gives us the option of saying we don't want to fsync remotely, for additional speed (DRBD 'B' mode).

4. I'm tempted to make backends write their commit records but not flush them, which fits in with the above.

5. And we would finally get rid of the group commit parameters.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services