Re: [HACKERS] Fwd: Is the fsync() fake on FreeBSD6.1?

2006-09-24 Thread Ron Mayer
Andrew - Supernews wrote:
 
 Whether the underlying device lies about the write completion is another
 matter. All current SCSI disks have WCE enabled by default, which means
 that they will lie about write completion if FUA was not set in the
 request, which FreeBSD never sets. (It's not possible to get correct
 results by having fsync() somehow selectively set FUA, because that would
 leave previously-completed requests in the cache.)
 
 WCE can be disabled on either a temporary or permanent basis by changing
 the appropriate modepage. It's possible that Linux does this automatically,
 or sets FUA on all writes, though that would surprise me considerably;
 however I disclaim any knowledge of Linux internals.


The Linux SATA driver author Jeff Garzik suggests [note 1] that
The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE
 command to be generated has only been present in the most recent [as of
 mid 2005] 2.6.x kernels.  See the write barrier stuff that people
 have been discussing.  Furthermore, read-after-write implies nothing
 at all.  The only way you can be assured that your data has hit
 the platter is
   (1) issuing [FLUSH|SYNC] CACHE, or
   (2) using FUA-style disk commands
 It sounds like your test (or reasoning) is invalid.



Before those mid-2005 2.6.x kernels, fsync on Linux apparently didn't
even try to flush drive caches, even when the drive supported it (and
apparently most drives do, provided the request is actually sent).

[note 1] http://lkml.org/lkml/2005/5/15/82

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Fwd: Is the fsync() fake on FreeBSD6.1?

2006-09-23 Thread Tom Lane
Andrew - Supernews [EMAIL PROTECTED] writes:
 Whether the underlying device lies about the write completion is another
 matter. All current SCSI disks have WCE enabled by default, which means
 that they will lie about write completion if FUA was not set in the
 request, which FreeBSD never sets.

Huh?  The entire point of the SCSI command set is that it's not
necessary to lie about write completion for performance reasons, because
the architecture has always supported the concept of multiple requests
in-flight concurrently.  Has the disk drive industry gotten a whole lot
stupider in the fifteen years since I last wrote a SCSI driver?

regards, tom lane



Re: [HACKERS] Fwd: Is the fsync() fake on FreeBSD6.1?

2006-09-23 Thread Andrew - Supernews
On 2006-09-23, Tom Lane [EMAIL PROTECTED] wrote:
 Andrew - Supernews [EMAIL PROTECTED] writes:
 Whether the underlying device lies about the write completion is another
 matter. All current SCSI disks have WCE enabled by default, which means
 that they will lie about write completion if FUA was not set in the
 request, which FreeBSD never sets.

 Huh?  The entire point of the SCSI command set is that it's not
 necessary to lie about write completion for performance reasons, because
 the architecture has always supported the concept of multiple requests
 in-flight concurrently.

I seem to recall we've had this conversation previously.

 Has the disk drive industry gotten a whole lot
 stupider in the fifteen years since I last wrote a SCSI driver?

Quite possibly, yes.

I certainly would never claim that WCE is a good idea, or that having it
enabled by default is a good idea; I merely report the _fact_ that it is
indeed enabled by default on every SCSI drive that I have recently
encountered (across several different vendors).

On my database machines I am careful to disable it (and check that this
does indeed take effect). I would recommend that others do likewise. The
performance impact of disabling WCE is not serious (other than removing
the unsafe speed gains of course).

Since posting the previous response I've been directed to a document that
seems to imply that Linux drivers now attempt to handle write-order
guarantees by introducing the concept of a write barrier, i.e. a write
request which must complete after all previous writes and before all
subsequent ones.  Achieving this requires different strategies depending
on whether the underlying device allows command-queueing and/or exposes a
useful cache flush command; the implication of this is that for SCSI disks
with WCE, the Linux driver will actually send SYNCHRONIZE CACHE when doing
a write barrier (which could be expensive of course). If (and I have no
idea if this is true) fsync() is implemented by means of such a barrier,
then this implies that an fsync()-heavy workload will perform much worse
on Linux when WCE is enabled than when it is disabled, since in the latter
case the driver will not issue SYNCHRONIZE CACHE and will simply ensure
that the relevant writes are all completed.

It would be interesting to see benchmarks of this.
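
A rough way to get such a number is to time a loop of small appends, each
followed by fsync(). The following is only a minimal sketch in Python; the
function name, block size and iteration count are arbitrary assumptions, and
it measures whatever the device reports, not what has actually reached the
platter.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, iterations=200, block=b"x" * 8192):
    """Append a block and fsync() it repeatedly; return the observed rate."""
    # The scratch file lives in `directory`, so the filesystem (and disk)
    # under test is whichever one holds that directory.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, block)
            os.fsync(fd)  # returns once the device *reports* completion
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return iterations / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    print(f"{fsyncs_per_second('.'):.0f} fsync()s per second")
```

Running it once with WCE enabled and once with it disabled, on the same
otherwise idle disk, would give the comparison asked for here.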

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services



Re: [HACKERS] Fwd: Is the fsync() fake on FreeBSD6.1?

2006-09-22 Thread mark
On Fri, Sep 22, 2006 at 01:52:02PM -0400, Jim Nasby wrote:
 I thought folks might be interested in this... note in particular the  
 comment about linux.
...
 From: Greg 'groggy' Lehey [EMAIL PROTECTED]
 Date: June 26, 2006 11:34:12 PM EDT
 To: leo huang [EMAIL PROTECTED]
 Cc: freebsd-performance@freebsd.org
 Subject: Re: Is the fsync() fake on FreeBSD6.1?
 ...
 My understanding from the last time I looked at the code was that
 fsync does the right thing:
 
  The fsync() system call causes all modified data and attributes of fd to
  be moved to a permanent storage device.  This normally results in all
  in-core modified copies of buffers for the associated file to be written
  to a disk.
 
 This is not the case for Linux, where fsync syncs the entire file
 system.  That could explain some of the performance difference, but
 not all of it.  I suppose it's worth noting that, in general, people
 report much better performance with MySQL on Linux than on FreeBSD.

I see Greg's comment as contradictory. People see better performance with
MySQL on Linux than on FreeBSD, yet fsync() on Linux supposedly syncs the
whole file system?

I don't believe that fsync() on Linux syncs the whole file system
either.  This sounds made up, or a confusion with 'sync'. Perhaps
people @FreeBSD.org are not as familiar with Linux.
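
For reference, the two calls are quite different in scope. A minimal sketch
in Python, whose os.fsync() and os.sync() are thin wrappers around the system
calls of the same names; the helper names are made up for illustration:

```python
import os


def write_durably(path, data):
    """Write data to path, then fsync() only that one file descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than asked for; keep going.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # per-file: flushes this file's dirty buffers
    finally:
        os.close(fd)


def flush_everything():
    """Ask the kernel to write out dirty buffers for all file systems."""
    os.sync()  # system-wide: this is sync(2), not fsync(2)
```

A database commit path would use the first form; the second is what the
sync command does.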

Cheers,
mark





Re: [HACKERS] Fwd: Is the fsync() fake on FreeBSD6.1?

2006-09-22 Thread Tom Lane
[EMAIL PROTECTED] writes:
 I don't believe that fsync() on Linux syncs the whole file system
 either.

Indeed.  I'd disregard this as coming from someone who knows much
less than he thinks.

(The most likely explanation for his results, I expect, is that FreeBSD
is trying to fsync and the disk drive is lying to it, whereas on his
comparison Linux machine the drive is not configured to lie about
write-complete.)
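
A common rule of thumb for spotting a lying drive (not something taken from
the quoted thread, and an assumption about the workload) is that a single
writer that rewrites nearly the same spot and waits for each flush cannot
complete more than about one cycle per platter rotation. A minimal sketch in
Python, with made-up function names and an arbitrary slack factor:

```python
def max_honest_commit_rate(rpm):
    """Upper bound on serial write-and-flush cycles per second.

    Assumes one spinning disk, no battery-backed controller cache, and
    at most one completed flush per rotation.
    """
    return rpm / 60.0


def looks_like_lying(measured_rate, rpm, slack=1.5):
    """True if the measured fsync() rate is implausibly high for the disk."""
    return measured_rate > slack * max_honest_commit_rate(rpm)
```

For a 7200 RPM drive the bound works out to 120 per second, so a machine
reporting a few thousand serial fsync()s per second on such a drive is
almost certainly acknowledging writes from cache.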

regards, tom lane



Re: [HACKERS] Fwd: Is the fsync() fake on FreeBSD6.1?

2006-09-22 Thread AgentM


On Sep 22, 2006, at 15:00 , [EMAIL PROTECTED] wrote:


On Fri, Sep 22, 2006 at 01:52:02PM -0400, Jim Nasby wrote:

I thought folks might be interested in this... note in particular the
comment about linux.

...

From: Greg 'groggy' Lehey [EMAIL PROTECTED]
Date: June 26, 2006 11:34:12 PM EDT
To: leo huang [EMAIL PROTECTED]
Cc: freebsd-performance@freebsd.org
Subject: Re: Is the fsync() fake on FreeBSD6.1?
...
My understanding from the last time I looked at the code was that
fsync does the right thing:

The fsync() system call causes all modified data and attributes of fd to
be moved to a permanent storage device.  This normally results in all
in-core modified copies of buffers for the associated file to be written
to a disk.


This is probably the same issue that the hackers encountered on
Darwin: fsync() flushes the kernel cache, but a further function call
is needed to flush the hard drive's buffers. This meets the standard's
definition of fsync because the data is indeed moved to the device; it
just lands in the device's buffer instead of on non-volatile storage.
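
On Darwin the further call is fcntl() with F_FULLFSYNC. A minimal sketch in
Python, which exposes that constant in its fcntl module only where the
platform defines it; the helper name and the fallback policy are assumptions
for illustration:

```python
import fcntl
import os


def full_fsync(fd):
    """Flush fd through the drive cache where the platform lets us ask.

    Returns the name of the mechanism that was actually used.
    """
    full = getattr(fcntl, "F_FULLFSYNC", None)  # defined on Darwin only
    if full is not None:
        try:
            fcntl.fcntl(fd, full)
            return "F_FULLFSYNC"
        except OSError:
            pass  # not supported on this file system; fall back below
    os.fsync(fd)  # elsewhere the drive cache is outside our control here
    return "fsync"
```

On platforms without F_FULLFSYNC this degrades to a plain fsync(), so
whether the data reaches the platter still depends on the drive's cache
settings.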


-M



Re: [HACKERS] Fwd: Is the fsync() fake on FreeBSD6.1?

2006-09-22 Thread Andrew - Supernews
On 2006-09-22, Jim Nasby [EMAIL PROTECTED] wrote:
 I thought folks might be interested in this... note in particular the  
 comment about linux.

I don't believe that either person in that discussion knows what they are
really talking about.

fsync() on FreeBSD does, as is required, force any modified data for the
file, plus any metadata, plus any modifications to any parent directories,
to the underlying disk device and waits for that device to report the
write as complete.

Whether the underlying device lies about the write completion is another
matter. All current SCSI disks have WCE enabled by default, which means
that they will lie about write completion if FUA was not set in the
request, which FreeBSD never sets. (It's not possible to get correct
results by having fsync() somehow selectively set FUA, because that would
leave previously-completed requests in the cache.)

WCE can be disabled on either a temporary or permanent basis by changing
the appropriate modepage. It's possible that Linux does this automatically,
or sets FUA on all writes, though that would surprise me considerably;
however I disclaim any knowledge of Linux internals.

On FreeBSD, this command will disable WCE permanently on a SCSI drive:

echo 'WCE: 0' | camcontrol modepage daXX -m 8 -P3 -e

(use -P0 to disable it only temporarily, or you can use just the second of
those commands alone to interactively edit the mode page)

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
