Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
> Actually, a write to memory for a memory mapped file is more similar to
> write(2). If two programs have the same file mapped then the effect on the
> memory they share is instantaneous because it is the same physical memory.
> A mmapped file becomes shared memory as soon as it is mapped at least twice.

True, for some interpretation of "instantaneous". It does not establish a happens-before relationship though, as store-munmap/mmap-load does.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
On Wed, 4 Jul 2012, Stefan Ring wrote:

>> It really makes no sense at all to have munmap(2) not imply msync(3C).
>
> Why not? munmap(2) does basically the equivalent of write(2). In the case of
> write, that is: a later read from the same location will see the written
> data, unless another write happens in-between. If power

Actually, a write to memory for a memory mapped file is more similar to write(2). If two programs have the same file mapped then the effect on the memory they share is instantaneous because it is the same physical memory. A mmapped file becomes shared memory as soon as it is mapped at least twice.

It is pretty common for a system of applications to implement shared memory via memory mapped files, with the mapped memory used for read/write. This is a precursor to POSIX's shm_open(3RT), which produces similar functionality without a known file in the filesystem.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
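The point about a file becoming shared memory once it is mapped twice can be illustrated with a minimal Python sketch. It assumes a Unix-like system, where Python's mmap.mmap() creates a shared read/write mapping by default, and for simplicity it creates both mappings in one process rather than in two programs. The function name shared_mapping_demo is hypothetical.

```python
import mmap
import os
import tempfile


def shared_mapping_demo(payload: bytes = b"hello") -> bytes:
    """Map one file twice; a store through one mapping is read through the other.

    No flush or unmap happens between the store and the load.
    """
    size = mmap.PAGESIZE
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)            # a mapping needs a non-empty file
        a = mmap.mmap(fd, size)           # shared read/write mapping (Unix default)
        b = mmap.mmap(fd, size)           # second mapping of the same file
        try:
            a[:len(payload)] = payload    # store through mapping a
            return bytes(b[:len(payload)])  # load through mapping b
        finally:
            a.close()
            b.close()
    finally:
        os.close(fd)
        os.unlink(path)
```

This only shows visibility between mappings. It says nothing about when, or whether, the data reaches stable storage.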
Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
On Wed, 4 Jul 2012, Nico Williams wrote:

> Oddly enough the manpages at the Open Group don't make this clear. So I
> think it may well be advisable to use msync(3C) before munmap() on
> MAP_SHARED mappings. However, I think all implementors should, and
> probably all do (Linux even documents that it does) have an implied
> msync(2) when doing a munmap(2). It really makes no sense at all to have
> munmap(2) not imply msync(3C).

As long as the system has a way to track which dirty pages map to particular files (Solaris historically does), it should not be necessary to synchronize the mapping to the underlying store simply due to munmap. It may be more efficient not to do that. The same pages may be mapped and unmapped many times by applications. In fact, several applications may memory map the same file so they access the same pages, and it seems wrong to flush to the underlying store simply because one of the applications no longer references the page.

Since mmap() on zfs breaks the traditional coherent memory/filesystem that Solaris enjoyed prior to zfs, it may be that some rules should be different when zfs is involved because of its redundant use of memory (zfs ARC and VM page).

Bob
Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
On 2012-Jul-05 06:47:36 +1000, Nico Williams wrote:
>On Wed, Jul 4, 2012 at 11:14 AM, Bob Friesenhahn wrote:
>> On Tue, 3 Jul 2012, James Litchfield wrote:
>>> Agreed - msync/munmap is the only guarantee.
>>
>> I don't see that the munmap definition assures that anything is written to
>> "disk". The system is free to buffer the data in RAM as long as it likes
>> without writing anything at all.
>
>Oddly enough the manpages at the Open Group don't make this clear.

They don't specify the behaviour on write(2) or close(2) either. All this means is that there is no guarantee that munmap(2) (or write(2) or close(2)) will immediately flush the data to stable storage.

> So
>I think it may well be advisable to use msync(3C) before munmap() on
>MAP_SHARED mappings.

If you want to be certain that your changes will be flushed to stable storage by a particular point in your program execution then you must call msync(MS_SYNC) before munmap(2).

> However, I think all implementors should, and
>probably all do (Linux even documents that it does) have an implied
>msync(2) when doing a munmap(2).

There's nothing in the standard requiring this behaviour and it will adversely impact performance in the general case, so I would expect that implementors _wouldn't_ force msync(2) on munmap(2). FreeBSD definitely doesn't. As for Linux, I keep finding cases where, if a standard doesn't mandate specific behaviour, Linux will implement (and document) different behaviour to the way other OSs behave in the same situation.

> It really makes no sense at all to
>have munmap(2) not imply msync(3C).

Actually, it makes no more sense for munmap(2) to imply msync(2) than it does for close(2) [which is functionally equivalent] to imply fsync(2) - ie none at all.

>(That's another thing, I don't see where the standard requires that
>munmap(2) be synchronous.

http://pubs.opengroup.org/onlinepubs/009695399/functions/munmap.html states "Further references to these pages shall result in the generation of a SIGSEGV signal to the process." It's difficult to see how to implement this behaviour unless munmap(2) is synchronous.

> Async munmap(2) -> no need to mount
>cross-calls, instead allowing the mapping to be torn down over time.
>Doing a synchronous msync(3C), then a munmap(2) is a recipe for going
>real slow, but if munmap(2) does not portably guarantee an implied
>msync(3C), then would it be safe to do an async msync(2) then
>munmap(2)??)

I don't understand what you are trying to achieve here. munmap(2) should be a relatively cheap operation, so there is very little to be gained by making it asynchronous. Can you please explain a scenario where munmap(2) would be slow (other than cases where implementors have deliberately and unnecessarily made it slow)?

I agree that msync(MS_SYNC) is slow, but if you want a guarantee that your data is securely written to stable storage then you need to wait for that stable storage.

msync(MS_ASYNC) should have no impact on a later munmap(2) and it should always be safe to call msync(MS_ASYNC) before munmap(2) (in fact, it's a good idea, to maximise portability).

-- 
Peter Jeremy
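The "flush explicitly before unmapping" advice can be sketched in Python. This is a minimal sketch under stated assumptions: the helper name durable_update is hypothetical, the target file already exists and is non-empty, and mmap.flush() is used as the synchronous flush step (on Unix builds of CPython it is, as far as I know, implemented with msync(2) and MS_SYNC). Python's mmap module does not expose MS_ASYNC, so that variant is not shown.

```python
import mmap
import os
import tempfile


def durable_update(path: str, offset: int, data: bytes) -> None:
    """Store data into a shared mapping of path, then flush before unmapping."""
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, 0) as m:       # length 0 maps the whole file
            m[offset:offset + len(data)] = data
            m.flush()                     # explicit flush of the whole mapping
        # leaving the with-block closes the mapping (the munmap step)
    finally:
        os.close(fd)
```

The ordering is the point: the flush happens while the mapping still exists, so the program does not rely on the unmap to push anything to storage.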
Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
> It really makes no sense at all to
> have munmap(2) not imply msync(3C).

Why not? munmap(2) does basically the equivalent of write(2). In the case of write, that is: a later read from the same location will see the written data, unless another write happens in-between. If power goes down following the write, all bets are off.

And translated to munmap: a subsequent call to mmap(2) that makes the previously munmap-ped region available will make visible everything stored to the region prior to the munmap call. If power goes down following the munmap, all bets are off.

In both cases, if you want your data to persist across power losses, use sync -- fsync or msync.

If only the syncing variants were available, disk accesses would be significantly slower, and disks would thrash rather audibly all the time.
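The store, munmap, mmap, load sequence described above can be written as a short Python sketch. It assumes a Unix-like system and deliberately performs no flush between the store and the unmap; the function name store_unmap_remap_load is hypothetical.

```python
import mmap
import os
import tempfile


def store_unmap_remap_load(payload: bytes = b"data") -> bytes:
    """Store into a mapping, unmap it without flushing, map again, and load."""
    size = mmap.PAGESIZE
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)            # a mapping needs a non-empty file
        first = mmap.mmap(fd, size)
        first[:len(payload)] = payload    # store
        first.close()                     # munmap, with no flush beforehand
        second = mmap.mmap(fd, size)      # mmap the same region again
        try:
            return bytes(second[:len(payload)])  # load sees the earlier store
        finally:
            second.close()
    finally:
        os.close(fd)
        os.unlink(path)
```

The load returning the stored bytes demonstrates visibility only. Persistence across a power loss would still need an explicit sync.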
Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
On 07/04/12 16:47, Nico Williams wrote:
>> I don't see that the munmap definition assures that anything is written to
>> "disk". The system is free to buffer the data in RAM as long as it likes
>> without writing anything at all.
>
> Oddly enough the manpages at the Open Group don't make this clear. So I
> think it may well be advisable to use msync(3C) before munmap() on
> MAP_SHARED mappings. However, I think all implementors should, and
> probably all do (Linux even documents that it does) have an implied
> msync(2) when doing a munmap(2). It really makes no sense at all to have
> munmap(2) not imply msync(3C).

This assumes msync() has the behavior you expect. See:

http://pubs.opengroup.org/onlinepubs/009695399/functions/msync.html

In particular, the paragraph starting with "For mappings to files, ...".
Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
On Wed, Jul 4, 2012 at 11:14 AM, Bob Friesenhahn wrote:
> On Tue, 3 Jul 2012, James Litchfield wrote:
>> Agreed - msync/munmap is the only guarantee.
>
> I don't see that the munmap definition assures that anything is written to
> "disk". The system is free to buffer the data in RAM as long as it likes
> without writing anything at all.

Oddly enough the manpages at the Open Group don't make this clear. So I think it may well be advisable to use msync(3C) before munmap() on MAP_SHARED mappings. However, I think all implementors should, and probably all do (Linux even documents that it does) have an implied msync(2) when doing a munmap(2). It really makes no sense at all to have munmap(2) not imply msync(3C).

(That's another thing, I don't see where the standard requires that munmap(2) be synchronous. I think it'd be nice to have an mmap(2) option for requesting whether munmap(2) of the same mapping be synchronous or asynchronous. Async munmap(2) -> no need to mount cross-calls, instead allowing the mapping to be torn down over time. Doing a synchronous msync(3C), then a munmap(2) is a recipe for going real slow, but if munmap(2) does not portably guarantee an implied msync(3C), then would it be safe to do an async msync(2) then munmap(2)??)

Nico
--
Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
On Tue, 3 Jul 2012, James Litchfield wrote:

> Agreed - msync/munmap is the only guarantee.

I don't see that the munmap definition assures that anything is written to "disk". The system is free to buffer the data in RAM as long as it likes without writing anything at all.

Bob
Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
Agreed - msync/munmap is the only guarantee.

On 07/03/12 08:47 AM, Nico Williams wrote:
> On Tue, Jul 3, 2012 at 9:48 AM, James Litchfield wrote:
>> On 07/02/12 15:00, Nico Williams wrote:
>>> You can't count on any writes to mmap(2)ed files hitting disk until
>>> you msync(2) with MS_SYNC. The system should want to wait as long as
>>> possible before committing any mmap(2)ed file writes to disk.
>>> Conversely you can't expect that no writes will hit disk until you
>>> msync(2) or munmap(2).
>>
>> Driven by fsflush which will scan memory (in chunks) looking for dirty,
>> unlocked, non-kernel pages to flush to disk.
>
> Right, but one just cannot count on that -- it's not part of the API
> specification.
>
> Nico
> --
Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
On Tue, Jul 3, 2012 at 9:48 AM, James Litchfield wrote:
> On 07/02/12 15:00, Nico Williams wrote:
>> You can't count on any writes to mmap(2)ed files hitting disk until
>> you msync(2) with MS_SYNC. The system should want to wait as long as
>> possible before committing any mmap(2)ed file writes to disk.
>> Conversely you can't expect that no writes will hit disk until you
>> msync(2) or munmap(2).
>
> Driven by fsflush which will scan memory (in chunks) looking for dirty,
> unlocked, non-kernel pages to flush to disk.

Right, but one just cannot count on that -- it's not part of the API specification.

Nico
--
Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
inline

On 07/02/12 15:00, Nico Williams wrote:
> On Mon, Jul 2, 2012 at 3:32 PM, Bob Friesenhahn wrote:
>> On Mon, 2 Jul 2012, Iwan Aucamp wrote:
>>> I'm interested in some more detail on how ZFS intent log behaves for
>>> updates done via a memory mapped file - i.e. will the ZIL log updates done
>>> to an mmap'd file or not?
>>
>> I would not expect these writes to go into the intent log unless msync(2) is
>> used on the mapping with the MS_SYNC option.
>
> You can't count on any writes to mmap(2)ed files hitting disk until
> you msync(2) with MS_SYNC. The system should want to wait as long as
> possible before committing any mmap(2)ed file writes to disk.
> Conversely you can't expect that no writes will hit disk until you
> msync(2) or munmap(2).

Driven by fsflush which will scan memory (in chunks) looking for dirty, unlocked, non-kernel pages to flush to disk.

> Nico
> --
Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
On Mon, Jul 2, 2012 at 3:32 PM, Bob Friesenhahn wrote:
> On Mon, 2 Jul 2012, Iwan Aucamp wrote:
>> I'm interested in some more detail on how ZFS intent log behaves for
>> updates done via a memory mapped file - i.e. will the ZIL log updates done
>> to an mmap'd file or not?
>
> I would not expect these writes to go into the intent log unless msync(2) is
> used on the mapping with the MS_SYNC option.

You can't count on any writes to mmap(2)ed files hitting disk until you msync(2) with MS_SYNC. The system should want to wait as long as possible before committing any mmap(2)ed file writes to disk. Conversely you can't expect that no writes will hit disk until you msync(2) or munmap(2).

Nico
--
Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files
On Mon, 2 Jul 2012, Iwan Aucamp wrote:

> I'm interested in some more detail on how ZFS intent log behaves for
> updates done via a memory mapped file - i.e. will the ZIL log updates done
> to an mmap'd file or not?

I would not expect these writes to go into the intent log unless msync(2) is used on the mapping with the MS_SYNC option.

Bob
[zfs-discuss] Interaction between ZFS intent log and mmap'd files
I'm interested in some more detail on how ZFS intent log behaves for updates done via a memory mapped file - i.e. will the ZIL log updates done to an mmap'd file or not?