> I got some information from Stephen Tweedie on this - please keep him
> "Cc:" as he's not on this list
> 
> ************************************************************************
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> 
> > I was talking to a Linux user yesterday, and he said that performance
> > using the xfs file system is pretty bad.  He believes it has to do with
> > the fact that fsync() on log-based file systems requires more writes.
> 
> 
> Performance doing what?  XFS has known performance problems doing
> unlinks and truncates, but not synchronous IO.  The user should be
> using fdatasync() for databases, btw, not fsync().

This is hugely helpful.  In PostgreSQL 7.1, we do use fdatasync() by
default if it is available on a platform.
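
A minimal sketch of "use fdatasync() if the platform has it, else
fsync()".  This is not PostgreSQL's actual code.  It keys off the POSIX
feature macro _POSIX_SYNCHRONIZED_IO, and the names data_sync() and
write_and_sync() are hypothetical:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Flush a file's data to stable storage.  Prefer fdatasync(), which may
 * skip writing back inode fields such as timestamps.  If the POSIX
 * synchronized-I/O option is not advertised at compile time (macro
 * absent, or 0 meaning "check at run time"), fall back to fsync(),
 * which is always safe, only potentially slower. */
static int data_sync(int fd)
{
#if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
    return fdatasync(fd);
#else
    return fsync(fd);
#endif
}

/* Example caller: write a buffer to a (new or truncated) file and make
 * it durable.  The file size changes here, so even fdatasync() must
 * write back that inode field, since it is needed to read the data. */
static int write_and_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t) len || data_sync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```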


> First, XFS, ext3 and reiserfs are *NOT* log-based filesystems.  They
> are journaling filesystems.  They have a log, but they are not
> log-based because they do not store data permanently in a log
> structure.  Berkeley LFS, Sprite and Spiralog are log-based
> filesystems.

Sorry, I get those mixed up.

> > With a standard BSD/ext2 file system, WAL writes can stay on the same
> > cylinder to perform fsync.  Is that true of log-based file systems?
> 
> Not true on ext2 or BSD.  Write-aheads are _usually_ close to the
> inode, but not always.  For true log-based filesystems, writes are
> always completely sequential, so the issue just goes away.  For
> journaling filesystems, depending on the setup there may be a seek to
> the journal involved, but some journaling filesystems can use a
> separate disk for the journal so no seek is required.
> 
> > I know xfs and reiser are both log based.  Do we need to be concerned
> > about PostgreSQL performance on these file systems?  I use BSD FFS with
> > soft updates here, so it doesn't affect me.
> 
> A database normally preallocates its data files and then performs most
> of its writes using update-in-place.  In such cases, fsync() is almost
> always the wrong thing to be doing --- the data writes have changed
> nothing in the inode except for the timestamps, and there's no need to
> flush the timestamps to disk for every write.  fdatasync() is
> designed for this --- if the only inode change is timestamps,
> fdatasync() will skip the seek to the inode and will only update the
> data.  If any significant inode fields have been changed, then a full
> flush is done.
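
To make the update-in-place case concrete, here is a sketch assuming
the file has already been preallocated to its final size.  The helper
update_in_place() is hypothetical.  It refuses writes that would extend
the file, so the only inode change fdatasync() sees is the timestamps:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite len bytes at offset inside an existing, preallocated file
 * and flush only the data.  Returns 0 on success, -1 on error.  A short
 * write is treated as an error to keep the sketch simple. */
static int update_in_place(int fd, off_t offset, const void *buf, size_t len)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    /* Stay within the current size: extending the file would change
     * st_size and force fdatasync() to write back the inode as well. */
    if (offset < 0 || (off_t) len > st.st_size - offset)
        return -1;
    if (pwrite(fd, buf, len, offset) != (ssize_t) len)
        return -1;
    return fdatasync(fd);
}
```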

We do pre-allocate our log file space in chunks to avoid inode/block
index writes.

> Using fdatasync, most filesystems will incur no seeks for data flush,
> regardless of whether the filesystem is journaling or not.

Thanks.  That is a big help.  I wonder if people reporting performance
problems were using 7.0.3.  We only added fdatasync() in 7.1.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
