On Thu, Dec 1, 2011 at 9:58 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> Waiting until the other one completes is how it currently is
> implemented, but is it necessary from a correctness view?  It seems
> like the WALWriteLock only needs to protect the write, and not the
> sync (assuming the sync method allows those to be separate actions),
> and that there could be multiple fsync requests from different
> processes pending at the same time without a correctness problem.

I've wondered about that, too.  At least on Linux, the overhead of a
system call seems to be pretty low - e.g. the ridiculous number of
lseek calls we do on a pgbench -S doesn't seem create much overhead
until the inode mutex starts to become contended; and that problem
should be fixed in Linux 3.2.  But I'm not sure if system calls are
similarly cheap on all platforms, or even if it's true on Linux for
fsync() in particular.

There's another possible approach here, too: instead of waiting to set
hint bits until the commit record hits the disk, we could allow the
hint bits to set immediately on the condition that we don't write it
out until the commit record hits the disk.  Bumping the page LSN would
do that, but I think that might be problematic since setting hint bits
isn't WAL-logged.  If so, we could possibly fix that by storing a
second LSN for the page out of line, e.g. in the buffer descriptor.
That might be even faster than speeding up the WAL flush.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to