Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files

2012-07-04 Thread Bob Friesenhahn

On Tue, 3 Jul 2012, James Litchfield wrote:


> Agreed - msync/munmap is the only guarantee.


I don't see that the munmap definition assures that anything is 
written to disk.  The system is free to buffer the data in RAM as 
long as it likes without writing anything at all.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files

2012-07-04 Thread Nico Williams
On Wed, Jul 4, 2012 at 11:14 AM, Bob Friesenhahn
bfrie...@simple.dallas.tx.us wrote:
> On Tue, 3 Jul 2012, James Litchfield wrote:
>> Agreed - msync/munmap is the only guarantee.
>
> I don't see that the munmap definition assures that anything is written to
> disk.  The system is free to buffer the data in RAM as long as it likes
> without writing anything at all.

Oddly enough the manpages at the Open Group don't make this clear.  So
I think it may well be advisable to use msync(3C) before munmap() on
MAP_SHARED mappings.  However, I think all implementors should, and
probably all do (Linux even documents that it does) have an implied
msync(2) when doing a munmap(2).  It really makes no sense at all to
have munmap(2) not imply msync(3C).

(That's another thing, I don't see where the standard requires that
munmap(2) be synchronous.  I think it'd be nice to have an mmap(2)
option for requesting whether munmap(2) of the same mapping be
synchronous or asynchronous.  Async munmap(2) - no need to mount
cross-calls, instead allowing the mapping to be torn down over time.
Doing a synchronous msync(3C), then a munmap(2) is a recipe for going
real slow, but if munmap(2) does not portably guarantee an implied
msync(3C), then would it be safe to do an async msync(2) then
munmap(2)??)

Nico


Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files

2012-07-04 Thread John Martin

On 07/04/12 16:47, Nico Williams wrote:


>> I don't see that the munmap definition assures that anything is written to
>> disk.  The system is free to buffer the data in RAM as long as it likes
>> without writing anything at all.
>
> Oddly enough the manpages at the Open Group don't make this clear.  So
> I think it may well be advisable to use msync(3C) before munmap() on
> MAP_SHARED mappings.  However, I think all implementors should, and
> probably all do (Linux even documents that it does) have an implied
> msync(2) when doing a munmap(2).  It really makes no sense at all to
> have munmap(2) not imply msync(3C).


This assumes msync() has the behavior you expect.  See:

  http://pubs.opengroup.org/onlinepubs/009695399/functions/msync.html

In particular, the paragraph starting with "For mappings to files,".


Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files

2012-07-04 Thread Stefan Ring
> It really makes no sense at all to
> have munmap(2) not imply msync(3C).

Why not? munmap(2) does basically the equivalent of write(2). In the
case of write, that is: a later read from the same location will see
the written data, unless another write happens in-between. If power
goes down following the write, all bets are off. And translated to
munmap: a subsequent call to mmap(2) that makes the previously
munmap-ped region available will make visible everything stored to the
region prior to the munmap call. If power goes down following the
munmap, all bets are off. In both cases, if you want your data to
persist across power losses, use sync -- fsync or msync.

If only the syncing variants were available, disk accesses would be
significantly slower, and disks would thrash rather audibly all the
time.
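
A minimal sketch of the munmap half of that analogy, assuming a POSIX
system. The helper name visible_after_remap and the path handed to it are
made up for illustration. It stores through a MAP_SHARED mapping, unmaps
without calling msync, maps the file again, and checks that the bytes are
visible. It shows visibility only and says nothing about what would survive
a power loss.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if data stored through a MAP_SHARED
 * mapping is visible through a fresh mapping after munmap() with no
 * msync(), 0 if it is not, and -1 on error. */
int visible_after_remap(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);
    size_t len = strlen(msg) + 1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    /* The file must cover the range that is about to be mapped. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    char *first = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (first == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(first, msg, len);

    /* Deliberately no msync() here: only the mapping is removed. */
    if (munmap(first, len) < 0) {
        close(fd);
        return -1;
    }

    /* A later mapping of the same file range sees the earlier stores. */
    char *second = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (second == MAP_FAILED) {
        close(fd);
        return -1;
    }
    int same = (memcmp(second, msg, len) == 0);

    munmap(second, len);
    close(fd);
    return same;
}
```

Calling visible_after_remap("/tmp/example.bin", "hello") should return 1 on
a working system. For durability the same code would still need an
msync(MS_SYNC) or fsync before relying on the data.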


Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files

2012-07-04 Thread Peter Jeremy
On 2012-Jul-05 06:47:36 +1000, Nico Williams n...@cryptonector.com wrote:
> On Wed, Jul 4, 2012 at 11:14 AM, Bob Friesenhahn
> bfrie...@simple.dallas.tx.us wrote:
>> On Tue, 3 Jul 2012, James Litchfield wrote:
>>> Agreed - msync/munmap is the only guarantee.
>>
>> I don't see that the munmap definition assures that anything is written to
>> disk.  The system is free to buffer the data in RAM as long as it likes
>> without writing anything at all.
>
> Oddly enough the manpages at the Open Group don't make this clear.

They don't specify the behaviour on write(2) or close(2) either.  All
this means is that there is no guarantee that munmap(2) (or write(2)
or close(2)) will immediately flush the data to stable storage.

> So
> I think it may well be advisable to use msync(3C) before munmap() on
> MAP_SHARED mappings.

If you want to be certain that your changes will be flushed to stable
storage by a particular point in your program execution then you must
call msync(MS_SYNC) before munmap(2).
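
A minimal sketch of that ordering, assuming a POSIX system. The helper
name update_mapped_file is hypothetical, and for simplicity the file is
resized to exactly the length being written. The point is only the
sequence: store, msync(MS_SYNC), then munmap.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes to the start of a file through a
 * MAP_SHARED mapping and return only after msync(MS_SYNC) has completed.
 * Returns 0 on success, -1 on any failure. */
int update_mapped_file(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Simplification: set the file size to exactly len bytes. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map, data, len);

    /* Wait here for the modified pages to be written out. */
    int rc = msync(map, len, MS_SYNC);

    /* munmap() is used only to remove the mapping; it is not relied on
     * to flush anything. */
    if (munmap(map, len) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

A caller would treat a return of -1 as "the data may not be on stable
storage" and handle it like a failed fsync.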

> However, I think all implementors should, and
> probably all do (Linux even documents that it does) have an implied
> msync(2) when doing a munmap(2).

There's nothing in the standard requiring this behaviour and it will
adversely impact performance in the general case so I would expect
that implementors _wouldn't_ force msync(2) on munmap(2).  FreeBSD
definitely doesn't.  As for Linux, I keep finding cases where, if a
standard doesn't mandate specific behaviour, Linux will implement (and
document) different behaviour to the way other OSs behave in the same
situation.

> It really makes no sense at all to
> have munmap(2) not imply msync(3C).

Actually, it makes no more sense for munmap(2) to imply msync(2) than
it does for close(2) [which is functionally equivalent] to imply
fsync(2) - ie none at all.

> (That's another thing, I don't see where the standard requires that
> munmap(2) be synchronous.

http://pubs.opengroup.org/onlinepubs/009695399/functions/munmap.html
states "Further references to these pages shall result in the
generation of a SIGSEGV signal to the process."  It's difficult to
see how to implement this behaviour unless munmap(2) is synchronous.

> Async munmap(2) - no need to mount
> cross-calls, instead allowing the mapping to be torn down over time.
> Doing a synchronous msync(3C), then a munmap(2) is a recipe for going
> real slow, but if munmap(2) does not portably guarantee an implied
> msync(3C), then would it be safe to do an async msync(2) then
> munmap(2)??)

I don't understand what you are trying to achieve here.  munmap(2)
should be a relatively cheap operation so there is very little to be
gained by making it asynchronous.  Can you please explain a scenario
where munmap(2) would be slow (other than cases where implementors
have deliberately and unnecessarily made it slow)?  I agree that
msync(MS_SYNC) is slow but if you want a guarantee that your data is
securely written to stable storage then you need to wait for that
stable storage.  msync(MS_ASYNC) should have no impact on a later
munmap(2) and it should always be safe to call msync(MS_ASYNC) before
munmap(2) (in fact, it's a good idea to maximise portability).
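
A rough sketch of the msync(MS_ASYNC)-then-munmap sequence, assuming a
POSIX system. Both function names are hypothetical. The first helper only
requests writeback and does not wait for it, so it gives no durability
guarantee. The second is a small usage example whose read-back check
assumes the mapping and read(2) see the same cached file data, which holds
on systems with a unified page cache.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the system to schedule writeback of the
 * mapped range, then remove the mapping.  Does not wait for the data to
 * reach stable storage.  Returns 0 on success, -1 if either call fails. */
int flush_async_and_unmap(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
    int rc = msync(addr, len, MS_ASYNC);
    if (munmap(addr, len) < 0)
        rc = -1;
    return rc;
}

/* Hypothetical usage: store a short record through a MAP_SHARED mapping,
 * hand it to flush_async_and_unmap(), then check it through read(2).
 * Returns 0 on success, -1 otherwise. */
int demo_async_flush(const char *path)
{
    static const char record[] = "record";
    char back[sizeof record];
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)sizeof record) < 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, sizeof record, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(map, record, sizeof record);

    int rc = flush_async_and_unmap(map, sizeof record);

    /* The file offset is still 0, so this reads the bytes just stored. */
    if (rc == 0 &&
        (read(fd, back, sizeof back) != (ssize_t)sizeof back ||
         memcmp(back, record, sizeof record) != 0))
        rc = -1;

    close(fd);
    return rc;
}
```

If the data must be on stable storage at a known point, MS_ASYNC would be
replaced by MS_SYNC and the caller would pay the wait.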

-- 
Peter Jeremy

