Thanks, Darren.
I've taken the liberty of reordering my follow-up for clarity. Just to
be clear, I'm not critiquing ZFS, just trying to learn it by pushing at
some of the corner cases of filesystem-system interactions. I'm trying
to figure out the implications of various ZFS features when used in
various ways.
Does that mean that paging out dirty mmap pages goes to new places and
requires metadata updates as well?
Yes.
Indeed, given that ZFS always writes to new places (except for the
uberblock, as you noted), that does make snapshots easier and accounts
for there being no explicit code to "set COW on disk blocks", or such.
From a ufs/vxfs background, the thought of allocating additional disk
storage to page out to a memory mapped file or of changing metadata to
point to new blocks and hence needing to write out metadata as part of
paging out modified file data seems foreign to me. If the metadata
writes can be bundled into the same transaction, perhaps there's no more
serialization latency on a pageout... ? Is this also another case where
one might get ENOSPC when one doesn't on other filesystems (paging out
to an existing MMF in a full ZFS pool)?
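To make the ENOSPC question concrete, here is a minimal sketch of a
probe one could run against a file in a nearly full pool. It dirties an
existing file through a shared mapping and forces writeback with
msync(MS_SYNC), which is the closest user-visible handle on pageout that
I know of. The helper name probe_mmap_writeback is hypothetical, and the
sketch assumes len does not exceed the current file size. It only
reports what the system calls return; it does not assume any particular
ZFS behavior.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Dirty the first len bytes of an existing file through a shared
 * mapping, then force writeback with msync(MS_SYNC).
 * Returns 0 on success, or the errno of the failing call (for example
 * ENOSPC if the filesystem cannot allocate blocks for the writeback).
 * Assumption: len is no larger than the file, otherwise touching the
 * mapping past end-of-file may raise SIGBUS.
 */
int probe_mmap_writeback(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return errno;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        int err = errno;
        close(fd);
        return err;
    }

    memset(p, 0x5a, len);              /* dirty the mapped pages */

    int err = 0;
    if (msync(p, len, MS_SYNC) != 0)   /* synchronous writeback */
        err = errno;

    munmap(p, len);
    close(fd);
    return err;
}
```

Running this against a preallocated file on ufs and on a full ZFS pool
and comparing the return values would show whether the two differ.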
From the comments in the source tour about the ZIL, I did note the
statement that file contents do not go through the ZIL unless needed for
O_DSYNC or fsync() semantics, so I wasn't sure how else they might be
different.
How do snapshots interact with open files or files with pages in the
OpenSolaris page cache?
I don't believe they do. Are you thinking of something in particular?
I am generally interested in understanding file consistency and cache
coherency. I'd like to know what exactly is being snapshotted and what
is consistent within a snapshot.
I subsequently saw an earlier thread on "ZFS consistency guarantee"
(http://www.opensolaris.org/jive/thread.jspa;?messageID=124809) where
you and others pointed out that application state is not consistent at a
snapshot unless the application has been quiesced or otherwise brought
to a consistent state. Even then, I'm curious about the interaction
with the OpenSolaris page cache...
As is generally known and is explained well by Roch Bourbonnais in
http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine, NFS places an extra
requirement for committing writes to stable storage upon file close.
For local filesystems, a close() will complete without all modified file
data being written to disk yet. Does all such file data get into a
snapshot, or only data that has happened to be pushed out to disk by the
time of the snapshot? (e.g. local open(), write(), close(), snapshot).
It does look to me from the comment and call to zil_suspend from within
dmu_objset_snapshot_one that any changes that have made it to the
filesystem will get flushed out and included in the snapshot. This
should apply to any metadata operations that have completed such as
link, unlink, etc. But if the VM system is caching file contents after
a close (or at least nobody has pushed it out yet), is there any way to
guarantee that such data makes it into the snapshot?
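Whatever the answer is for cached data, the conservative thing an
application can do on its own is to call fsync() before close(), so the
data is on stable storage before anyone asks for a snapshot. A minimal
sketch of that sequence follows; the helper name write_fsync_close is
hypothetical, and the sketch does not settle whether data that was not
synced would also be captured.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * open(), write(), fsync(), close(): push file data to stable storage
 * before returning, so it does not depend on later cache writeback.
 * Returns 0 on success, or an errno value on failure.
 */
int write_fsync_close(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return errno;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted, retry */
            int err = errno;
            close(fd);
            return err;
        }
        p += n;                    /* handle short writes */
        len -= (size_t)n;
    }

    if (fsync(fd) != 0) {          /* flush data before close */
        int err = errno;
        close(fd);
        return err;
    }

    return close(fd) == 0 ? 0 : errno;
}
```

The comparison I am curious about is this sequence versus the same one
without the fsync() call, each followed immediately by a snapshot.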
In my earlier question I was also thinking about other cases, such as
an MMF where the application has written to a page with or without
msync(), or an open file after write() but with no fsync(). Depending
upon how data of closed files is handled, those may be moot.
Eric
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss