On 2/20/06, Jerry Jelinek <Gerald.Jelinek at sun.com> wrote:
> Mike,
> Mike Gerdts wrote:
> >> There is already an RFE open for this:
> >>
> >> 6383119 RFE: add support for using zfs clones when cloning a zone
> >
> >
> > Sigh... I so wish that RFE's (with relevant status information) were
> > accessible outside of Sun.
> I thought the bugs and RFEs were all accessible on opensolaris?
> I was able to pull up 6383119 on the opensolaris bug screen but
> I am not sure how easy it is to search for bugs/rfes.  When I
> put in "zones" & "zfs" under Solaris/Utilities, I got almost
> 1300 hits so that isn't very helpful although 6383119 was the first
> one on the list.

Ahh... my search was "zoneadm AND clone AND zfs" that yielded no
results.  Had it been "zone" instead of "zoneadm" I would have found
it.  (It looks like it is important to use AND rather than & to narrow

> I think we'll have a note posted about those discussions so everyone can see
> what kind of ideas we have internally and maybe get ideas for new projects
> that they want to start working on themselves, if they are interested.  I haven't
> actually sent out the formal announcement about the zfs cloning to the alias
> yet because we are still having some discussions about what the final design
> should be (what options to offer, how it will actually behave, etc.).  I
> should be able to send that out later in the week though.  I do have most
> of the first cut at the code changes done, modulo some of these minor tweaks
> like options and things, so it is really far along.  We have been
> talking about integrating zfs clones since we started the initial cloning
> changes last November which is why it is almost done.

I look forward to this.

> > FWIW, with S10 GA and update 1 I have been doing an (unsupported) version of
> > cloning where I create an SVM soft partition for each zone, then use dd to
> > copy the data.  This has brought zone creation down from about 5 minutes
> > with really fast disks to less than one with most any disk.
> Yes, that is a good technique too.  The only worry is the manual editing of
> the zones index file that you have to do.  That is guaranteed to break in the
> near future.  We are working on a supported technique for that using the
> attach/detach -F option.  I emailed that proposal out to zones-discuss
> a while back.  The code is in final review and should be integrating any
> day now.

Yeah, I kinda expected that it would break at some point in time.  The
hope was that the ability to clone using zfs or some other improvement
would help out when that breakage happened.

> We are really interested in you (and others) getting involved with new ideas
> and features for zones and solaris in general.  I sure hope this hasn't put
> you off in any way.  We would be really interested to hear what things you are
> doing and what other ideas you have for new features for zones.  It would be
> great to see those posted on zones-discuss so we can all see what needs to be
> done and who wants to work on it.

Before I get into what is good for my business, I feel I need a bit of
a disclaimer.  Code that I contribute to OpenSolaris is based upon
personal interest and is a separate pursuit from the work that I
perform for my employer.  My employer does not endorse or support my
development activity with OpenSolaris.  All equipment used to develop
OpenSolaris is my own.

Now, here is what makes sense for my business.  I would be happy to
provide you with an LSDC contract number to tip the scales in favor of
developing these features.

1. Currently the creation of the first zone on a machine (inheriting
/usr and /opt) takes longer than a jumpstart that installs a flar of
SUNWCXall.  This, of course, is only counting the actual install time,
not the POST and OS boot.  It would be helpful to be able to have
flars of standard zones that can be deployed as template or standard
zones.  These would likely contain the /etc/zones/<zone/template
name>.xml file.  Of course, there needs to be patch and package
compatibility between the OS that is running and the zone flar that is
applied.  This could be tricky, but I am confident it is not
insurmountable.  zonecfg and/or zoneadm should be enhanced to support

2. If copying a zone happens using a cpio mechanism, it should use the
same options as flash archives.  This would allow me to exclude
particular directories or files.  This is especially important to be
able to exclude any sparse files (e.g. /var/adm/lastlog), tmp files,
etc.  Bug 4480319

3. If cpio is used to clone zones, cpio should be enhanced to preserve
sparse files.  In my business, there are UIDs clustered at 0 - 64k,
100 million, 212 million, 500 million, and 900 million.  This causes
lastlog to increase in disk usage from negligible to gigabytes. 
Similar things would likely happen with ufs quotas files.  This
problem currently affects live upgrade, flash archives, and the nevada
b33 implementation of "zoneadm clone".  At one point I filed an RFE to
use GNU cpio with --sparse but it has not made it very far.

4.  "zoneadm boot" should accept options other than "-s".  In
particular, I should be able to boot to other milestones, including
arbitrary milestones that are site defined.  Networking should not be
up if booted to milestone/none.  (See #8 below).

5.  There should be a sysidcfg mechanism (maybe there is, but poorly
documented) that allows me to use multiple name services.  It is quite
likely that DNS is the most commonly used name service for hosts and the
least common name service for anything else.  As it is now, I have to
choose between using a manual sysidcfg process or using an unsupported
automated process.

6.  By default, zones should inherit sysidcfg information from the
global zone or the template/source zone if being cloned.  This implies
that there needs to be a way to sys-almost-unconfig to clear out
things like ssh keys that really should be different between zones. 
However, I am fairly confident that there is no need to perform a
manual sysidcfg in most zones.

7.  There should be a way to specify commands that are run prior to
booting a zone and as part of the zone boot process.  For example, if
a zone is configured on multiple hosts that are attached to shared
storage, you may need to import a pool, diskset, diskgroup and mount
file systems prior to initiating the boot.  As the boot is happening,
there may be other tasks that are important to perform.  For example,
if the global zone and local zones are on different subnets, the
default route for the local zones' subnets needs to be brought up after
the local zone IP is up but before the local zone starts to try to
access network services like NFS and name services.

For one "zone farm" that we have, I re-wrote the zone startup process
in the global zone to do things like ping the addresses used by zones
before booting to help protect against the same zones running on
multiple machines.  It also does things like setting default routes if

8.  It should be possible to boot a zone with networking disabled. 
This would allow for things like staging a zone that is currently
running on another machine.  Certain operations (e.g. JASS, importing
services) need to happen as part of the zone initialization process
and should be able to be done when the same zone (really, a different
zone with the same hostname and IP running on a different machine) is
up on another machine.

As it is, if I am "migrating" a zone between machines, I have to
either experience many minutes of downtime or use zonecfg to remove
all networks, boot and perform initialization, shutdown, and use
zonecfg to replace the networks.  If I use my workaround, I can
routinely migrate zones between machines with less than 1 minute of
application downtime (assuming app starts reasonably fast).
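
A rough sketch of the kind of pre-boot check described under item 7
(ping the zone's addresses, boot only if nothing answers).  This is not
the actual wrapper: the function name and both hooks (is_reachable, run)
are hypothetical, and a ping check only helps protect against a
duplicate zone, it does not prove there is none.

```python
def boot_if_unclaimed(zone, addresses, is_reachable, run):
    """Boot `zone` only if none of its addresses already answers.

    is_reachable(addr) -> bool and run(argv) are supplied by the caller
    (hypothetical hooks, e.g. a ping wrapper and subprocess.check_call).
    Returns the addresses that answered; an empty list means we booted.
    """
    conflicts = [addr for addr in addresses if is_reachable(addr)]
    if not conflicts:
        # Nothing answered, so no other host appears to run this zone.
        run(["zoneadm", "-z", zone, "boot"])
    return conflicts
```

Keeping the probe and the command runner as parameters makes it easy to
dry-run the logic without touching a real zone.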

9.  Rapid cloning of zones is very good stuff, as is automatic zfs
filesystem creation and deletion.  Assuming that ZFS makes it into
Solaris 10 (update 2?  Please?  As a supported "extra value" package
for early adopters?) these features should be backported to Solaris
10.  In my environment this is a feature that would accelerate the
adoption of ZFS.
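
To illustrate the sparse-file point in item 3: a copy that seeks over
all-zero blocks instead of writing them lets the filesystem keep the
holes, so a file like lastlog does not balloon.  This is only a sketch
of the general technique, not how cpio or GNU cpio --sparse is
implemented; the function name and block size are assumptions, and
whether holes are actually preserved depends on the target filesystem.

```python
import os


def copy_preserving_holes(src, dst, block_size=64 * 1024):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    zero_block = bytes(block_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            if chunk == zero_block[:len(chunk)]:
                # Skip forward; the filesystem may leave a hole here.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # A trailing hole only exists once the final length is set.
        fout.truncate(fin.tell())
```

The copy is byte-for-byte identical to the source; only the on-disk
allocation differs.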

> Thanks a lot for following up on this and working with us on zones,
> Jerry

My pleasure.  I really enjoy digging into this stuff.  As you are
meeting this week, if you have any questions, feel free to get in
touch with me.  I'll send you a private email with my work contact
info.  I would be happy to join in for a bit by phone if that would be

>         This fast-track enhances the Solaris Zones [1] subsystem
>         to address an existing RFE[2] to provide integrated support
>         for using ZFS clones when cloning zones.
>         Patch binding is requested for these sub-commands and the stability
>         of the interfaces is "evolving".  However, support for placing
>         zone roots on a ZFS filesystem will not be provided in a patch
>         release until bug 6356600 is fixed [3].
>         PSARC/2005/711 introduced support for cloning zones by copying
>         the data from a source zonepath to a target zonepath.  The case
>         extends that support to use ZFS clones whenever possible in place
>         of actually copying the data.

Excellent idea.  I especially like the automatic portion.  Does this
imply anything about "zoneadm install" as well?

>         The zonepath cloning technique will be automatic.  We will determine
>         what kind of filesystem the source zonepath and target zonepath are on.
>         If both are UFS or one is ZFS and the other UFS, the code will use the
>         existing cpio copy technique.  If both are ZFS we will take a snapshot

Please consider bumping up the priority to fix bug 4480319.  This adds
the third major Solaris utility that is severely impacted by this bug
in my environment.

>         of the source zonepath and then use a ZFS clone to set up the target
>         zonepath.  The destination zone zonepath value will be used to name
>         the ZFS clone.  This zonepath must also be in the same zpool or we will
>         automatically fall back to using cpio to copy the data.  In fact, this
>         will be the behavior in all cases if use of a ZFS clone fails.  The
>         name of the snapshot(s) will be SUNWzoneX (where X is a unique id to
>         distinguish between multiple snapshots).
>         You might want to clone a source zone many times and we don't
>         want to take a snapshot each time.  PSARC/2005/711 introduced
>         an optional "-m method" parameter to the "zoneadm clone" subcommand
>         with the intention that this would be extended to support ZFS
>         clones.  The only current value for -m is "-m copy".  We will
>         extend the -m parameter to allow a "snapshot" value.  The following
>         argument would then be the name of an existing snapshot of
>         a zonepath filesystem.  This snapshot could be one we have taken
>         from an earlier clone or one that the sysadmin took themselves.

I like the idea of one snapshot.
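
As I read the proposal, the selection logic comes out roughly as in the
sketch below.  It is a model of the described behavior, not the zoneadm
code: the Zonepath type, the function names and the "zfs-clone" /
"cpio-copy" labels are made up for illustration, and treating a failed
clone as an OSError is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Zonepath:
    fstype: str                 # "zfs" or "ufs"
    pool: Optional[str] = None  # zpool name, only meaningful for ZFS


def choose_clone_method(source, target, method=None, snapshot=None):
    """Return "zfs-clone" or "cpio-copy" for a `zoneadm clone` request."""
    if method == "copy":
        return "cpio-copy"
    if method == "snapshot":
        if not snapshot:
            raise ValueError("-m snapshot needs the name of an existing snapshot")
        return "zfs-clone"
    if method is not None:
        raise ValueError("unknown method: %r" % (method,))
    # No -m given: clone only if both zonepaths are ZFS in the same pool.
    same_pool = source.pool is not None and source.pool == target.pool
    if source.fstype == "zfs" and target.fstype == "zfs" and same_pool:
        return "zfs-clone"
    return "cpio-copy"


def clone_zonepath(plan, zfs_clone, cpio_copy):
    """Run the chosen method; fall back to cpio if the ZFS clone fails."""
    if plan == "zfs-clone":
        try:
            zfs_clone()
            return "zfs-clone"
        except OSError:
            pass  # fall through to the copy path
    cpio_copy()
    return "cpio-copy"
```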

>         For example:
>                 # zoneadm -z new-zone clone -m snapshot foo/acct@SUNWzone1
>         This case introduces a slight change in the behavior described in
>         PSARC/2005/711.  PSARC/2005/711 stated that the default behavior
>         for zone cloning was to "copy".  This case modifies that to make
>         the default behavior "do the right thing".  Now, if you specify
>         "-m copy" the zonepath will be copied, not ZFS cloned, even if
>         the source could be cloned.  If you don't specify a -m
>         option, the code will "do the right thing".  Since the code for
>         PSARC/2005/711 is not released yet, outside of Nevada and OpenSolaris,
>         this minor change in behavior should not be a factor.
>         Cloning a zone by using ZFS clones depends upon the source zonepath
>         being its own ZFS filesystem.  This is actually behavior that we
>         want to encourage since ZFS filesystems are cheap and offer other
>         benefits.  The "zoneadm install" subcommand will be extended to
>         automatically create a ZFS filesystem for the zonepath, if possible,
>         when you initially install a zone.  If the zonepath is not on a ZFS
>         filesystem then of course we won't do that.
>         It is possible that the sys-admin might not want to automatically
>         create a ZFS filesystem when installing a zone.  We will add an
>         optional -z parameter to "zoneadm install" which means that a ZFS
>         filesystem should not be created.  If the zonepath is not on
>         a ZFS filesystem in the first place, then this parameter will have
>         no effect.
>         When a zone is uninstalled, if the zonepath is a ZFS filesystem,
>         instead of using "rm -rf" to remove the zone root, we will destroy
>         the ZFS filesystem.
>         zoneadm subcommands
>                 clone [-m copy | snapshot]      Evolving
>                 install [-z]                    Evolving
>         The ZFS snapshot namespace
>         XXX/XXX@SUNWzoneX                       Stable
>         libzfs                                  Consolidation Private
>         (this is already imported by zones to support ZFS datasets within zones)
> 1. PSARC 2002/174 Virtualization and Namespace Isolation in Solaris
> 2. RFE: add support for using zfs clones when cloning a zone Bugid 6383119
>         http://monaco.sfbay/detail.jsf?cr=6383119
> 3. 6356600 - zfs datasets can't be used to provide space for zone roots
>         http://monaco.sfbay/detail.jsf?cr=6356600
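
For reference, the install and uninstall rules in the proposal reduce to
two small decisions.  A sketch only: the function names and the returned
labels are invented here, and "plain-directory" stands for whatever
install does today when no ZFS filesystem is created.

```python
def install_action(parent_is_zfs, no_zfs_flag=False):
    """What `zoneadm install` would do for the zonepath.

    parent_is_zfs: the zonepath would live on a ZFS filesystem.
    no_zfs_flag:   the proposed -z option was given.
    """
    if parent_is_zfs and not no_zfs_flag:
        return "create-zfs-filesystem"
    # Not on ZFS (where -z has no effect), or the admin opted out.
    return "plain-directory"


def uninstall_action(zonepath_is_zfs_filesystem):
    """What `zoneadm uninstall` would do to remove the zone root."""
    if zonepath_is_zfs_filesystem:
        return "zfs-destroy"
    return "rm-rf"
```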

Mike Gerdts
