> Can anyone tell me why pools created with zpool are also ZFS file systems
> (and mounted) which can be used for storing files? It would have been
> more transparent if the pool would not allow the storage of files.
Grab a cup of coffee and get comfortable. Ready?

Oh, what a can of worms this was.

During development, when we first added support for nested datasets, the idea was that only leaf datasets would contain data; the interior nodes would just be containers to group things and provide inheritance.

That worked pretty well, but it had some annoying properties. For example, consider creating a pool with a container for home directories, with a dataset per user within the home container. Here's what you'd type in the original model:

    zpool create tank <devices>
    zfs create tank/home

Oops! I've already made my first mistake. I didn't specify that tank/home should be a container, so I can't create datasets within it. I should have specified the -c option to make a container:

    zfs create -c tank/home
    zfs set mountpoint=/export/home tank/home

OK, so now I'm ready to start populating /export/home with users:

    zfs create tank/home/ann    --> creates /export/home/ann
    zfs create tank/home/bob    --> creates /export/home/bob

All well and good. Question: what happens if you type this?

    touch /export/home/foo

Stop. Try to guess before reading any further.

Well? Recall that tank/home is a container, so /export/home is not a ZFS filesystem (at least, not until ZFS root) -- it's just a directory in UFS, so foo is created in your UFS root filesystem! Probably not what you had in mind when you carved off the /export/home namespace for ZFS. We got several complaints about this behavior in beta.

OK, so what options did we have?

(1) Live with it. But people really disliked the 'falling through to UFS' behavior. It also made zpool import/export vulnerable to failure, because the namespace wasn't fully self-contained.

(2) Make /export/home a special kind of mountpoint, redolent of the automounter, such that the only thing it allows you to do is create ZFS mountpoints within it. Possible, but strange, and the more you think about the semantic edge cases, the less attractive it is.
(3) Eliminate the distinction between containers and leaf datasets. This is what we actually did.

An implicit side-effect of option (3) is that the topmost dataset (tank) is not just a container, but a real filesystem. Again we had choices:

(A) Make it a special case, so that "tank" has container-like semantics.

(B) Make it like every other ZFS filesystem. This is what we did.

Admittedly, it seems weird that the zpool command creates a filesystem. This violates the otherwise clean separation between pools and datasets. On the other hand, one thing that always irritated me about the old code is that you had to type *two* commands before you had a usable filesystem. For a first-time user, that's annoying. It means you have to absorb more ZFS-speak to get started.

One other irritation is that putting data in interior-node datasets makes the accounting more complex. With the container model it was obvious: the space in tank/home was simply the sum of all its children, because tank/home *itself* consumed no space. With the everything's-a-filesystem model there are two distinct numbers to report: the space consumed by the dataset and its descendants, and the space consumed by the dataset itself.

Still, all things considered, this approach solved many more problems than it created. It's less error-prone; it's more self-contained; it's more flexible (you can always create children of an existing dataset); it eliminates an entire abstraction (dataset containers) from the administrative model; and it provides the holy grail (mine, anyway) of creating a mirrored filesystem with a single command.

Jeff

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
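[Editor's note: as an illustrative sketch of the model described above -- everything's a filesystem, so one command yields a usable mirrored pool, and any dataset can have children. Device names and the pool name are hypothetical, and the commands require root on a system with ZFS, so treat this as a sketch rather than output-verified instructions. Note that later versions of ZFS broke the second accounting number out into its own `usedbydataset` property.]

```shell
# One command creates a mirrored pool, and "tank" is immediately
# a real, mounted filesystem -- no second command needed.
zpool create tank mirror c1t0d0 c1t1d0

# Any existing dataset can have children, including the top-level one;
# no container flag, and no gap in the namespace to fall through.
zfs create tank/home
zfs create tank/home/ann

# The two distinct accounting numbers mentioned above: space consumed
# by a dataset plus all its descendants ("used") versus by the dataset
# alone ("usedbydataset").
zfs list -r -o name,used,usedbydataset tank
```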