On Thu, May 21, 2009 at 3:55 AM, Edward Pilatowicz
<edward.pilatow...@sun.com> wrote:
> hey all,
>
> i've created a proposal for my vision of how zones hosted on shared
> storage should work.  if anyone is interested in this functionality then
> please give my proposal a read and let me know what you think.  (fyi,
> i'm leaving on vacation next week so if i don't reply to comments right
> away please don't take offence, i'll get to it when i get back.  ;)
>
> ed

I'm very happy to see this.  Comments appear below.

> " please ensure that the vim modeline option is not disabled
> vim:textwidth=72
>
> -------------------------------------------------------------------------------
> Zones on shared storage (v1.0)
>
[snip]
> ----------
> C.1.i Zonecfg(1m)
>
> The zonecfg(1m) command will be enhanced with the following two new
> resources and associated properties:
>
>       rootzpool                               resource
>               src                             resource property
>               install-size                    resource property
>               zpool-preserve                  resource property
>               dataset                         resource property
>
>       zpool                                   resource
>               src                             resource property
>               install-size                    resource property
>               zpool-preserve                  resource property
>               name                            resource property
>
> The new resource and properties will be defined as follows:
>
> "rootzpool"
>     - Description: Identifies a shared storage object (and its
>       associated parameters) which will be used to contain the root
>       zfs filesystem for a zone.
>
> "zpool"
>     - Description: Identifies a shared storage object (and its
>       associated parameters) which will be made available to the
>       zone as a delegated zfs dataset.

That is to say "put your OS stuff in rootzpool, put everything else in
zpool" - right?

>
> "src"
>     - Status: Required.
>     - Format: Storage object uri (so-uri).  (See definition below.)
>     - Description: Identifies the storage object associated with this
>       resource.
>
> "install-size"
>     - Status: Optional.
>     - Format: Integer.  Defaults to bytes, but can be flagged as
>       gigabytes, kilobytes, or megabytes, with a g, k, or m suffix,
>       respectively.
>     - Description: If the specified storage object doesn't exist at zone
>       install time it will be created with this specific size.  This
>       property has no effect for storage objects which already exist and
>       have a pre-defined size.
>
> "zpool-preserve"
>     - Status: Optional.
>     - Format: Boolean.  Defaults to false.
>     - Description: When doing an install, if this property is set to
>       true and a zpool already exists on the specified storage
>       object, that zpool will be used.  When doing a destroy,
>       if this property is set to true, the root zpool will not be
>       destroyed.
>
> "dataset"
>     - Status: Optional
>     - Format: zfs filesystem name component (can't contain a '/')
>     - Description: Name of a dataset within the root zpool to delegate
>       to the zone.
>
> "name"
>     - Status: Required
>     - Format: zfs filesystem name component (can't contain a '/')
>     - Description: Used as part of the name for a zpool which will be
>       delegated to the zone.
>
> Zonecfg(1m) "verify" will verify the syntax of any "rootzpool" resource
> group (and its properties), but it will NOT verify the accessibility of
>       any storage specified by an so-uri.  (This is because accessing the
> storage specified by an so-uri could require configuration changes to
> other subsystems.)
>
>
> ----------
> C.1.ii Storage object uri (so-uri) format
>
> The storage object uri (so-uri) syntax[03] will conform to the standard
> uri format defined in RFC 3986 [04].  The nfs URI scheme is defined in
> RFC 2224 [05].  The so-uri syntax can be summarised as follows:
>
> File storage objects:
>
>     path:///<file-absolute>
>     nfs://<host>[:port]/<file-absolute>
>
> Vdisk storage objects:
>
>     vpath:///<file-absolute>
>     vnfs://<host>[:port]/<file-absolute>
>
> Device storage objects:
>
>     fc:///<wwn>[@<lun>]
>     iscsi:///alias=<alias>[@<lun>]
>     iscsi:///target=<target>[@<lun>]
>     iscsi://<host>[:port]/[tpgt=<tpgt>/]target=<target>[@<lun>]
>
> File storage objects point to plain files on local, nfs, or cifs
> filesystems.  These files are used to contain zpools which store zone
> datasets.  These are the simplest types of storage objects.  Once
> created, they have a fixed size, can't be grown, and don't support
> advanced features like snapshotting, etc.  Some example file so-uris
> are:
>
> path:///export/xvm/vm1.disk
>       - a local file
> path:///net/heaped.sfbay/export/xvm/1.disk
>       - an nfs file accessible via autofs
> nfs://heaped.sfbay/export/xvm/1.disk
>       - the same file specified directly via an nfs so-uri
>
> Vdisk storage objects are similar to file storage objects in that they
> can live on local, nfs, or cifs filesystems, but each has its own
> special data format and varying feature set, with support for things
> like snapshotting, etc.  Some common vdisk formats are VDI, VMDK, and
> VHD.  Some example vdisk so-uris are:
>
> vpath:///export/xvm/vm1.vmdk
>       - a local vdisk image
> vpath:///net/heaped.sfbay/export/xvm/1.vmdk
>       - an nfs vdisk image accessible via autofs
> vnfs://heaped.sfbay/export/xvm/1.vmdk
>       - the same vdisk image specified directly via an nfs so-uri
>
> Device storage objects specify block storage devices in a host
> independent fashion.  When configuring FC or iscsi storage on different
> hosts, the storage configuration normally lives outside of zonecfg, and
> the configured storage may have varying /dev/dsk/cXtXdX* names.  The
> so-uri syntax provides a way to specify storage in a host independent
> fashion, and during zone management operations, the zones framework can
> map this storage to a host specific device path.  Some example device
> so-uris are:
>
> fc:///20000014c3474...@0
>       - lun 0 of a fc disk with the specified wwn
> iscsi:///alias=oracle zone r...@0
>       - lun 0 of an iscsi disk with the specified alias.
> iscsi:///target=iqn.1986-03.com.sun:02:38abfd16-78c5-c58e-e629-ea77a33c6740
>       - lun 0 of an iscsi disk with the specified target id.

What if there is already a layer of abstraction that provides a
consistent namespace?  For example,
/dev/vx/dsk/zone1dg/rootvol would refer to a block device named rootvol
in the disk group zone1dg.  That may reside on a single disk or span
many disks and will have the same name regardless of which host the disk
group is imported on.  Since this VxVM volume may span many disks, it
would be inappropriate to refer to a single LUN that makes up that disk
group.

Perhaps the following is appropriate for such situations.

dev:///dev/vx/dsk/zone1dg/rootvol


> ----------
> C.1.iii Zoneadm(1m) install
>
> When a zone is installed via the zoneadm(1m) "install" subcommand, the
> zones subsystem will first verify that any required so-uris exist and
> are accessible.
>
> If an so-uri points to a plain file, nfs file, or vdisk, and the object
> does not exist, the object will be created with the install-size that
> was specified via zonecfg(1m).  If the so-uri does not exist and an
> install-size was not specified via zonecfg(1m) an error will be
> generated and the install will fail.
>
> If an so-uri points to an explicit nfs server, the zones framework will
> need to mount the nfs filesystem containing the storage object.  The nfs
> server share containing the specified object will be auto-mounted at:
>
>       /var/zones/nfsmount/<zonename>/<host>/<nfs-share-name>

Just for clarity, I think you mean:

- "will be mounted at".  I think "auto-mounted" conjures up the idea
  that there is integration with autofs.
- <host> is the NFS server
- <nfs-share-name> is the path on the NFS server.  Is this the exact
  same thing as <path-absolute> in the URI specification?  Is this the
  file that is mounted or the directory above the file?

My storage administrators give me grief if I create too many NFS mounts
(but I am not sure I've heard a convincing reason).  As I envision the
NFS server layout, I think I would see something like:

vol
  zones
    zone1
      rootzpool
      zpool
    zone2
      rootzpool
      zpool
    zone3
      rootzpool
      zpool

It seems as though if these three zones are all running on the same box,
the box will have at least the following mounts:

/var/zones/nfsmount/zone1/nfsserver/vol/zones/zone1
/var/zones/nfsmount/zone2/nfsserver/vol/zones/zone2
/var/zones/nfsmount/zone3/nfsserver/vol/zones/zone3

But maybe as many as:

/var/zones/nfsmount/zone1/nfsserver/vol/zones/zone1/rootzpool
/var/zones/nfsmount/zone1/nfsserver/vol/zones/zone1/zpool
/var/zones/nfsmount/zone2/nfsserver/vol/zones/zone2/rootzpool
/var/zones/nfsmount/zone2/nfsserver/vol/zones/zone2/zpool
/var/zones/nfsmount/zone3/nfsserver/vol/zones/zone3/rootzpool
/var/zones/nfsmount/zone3/nfsserver/vol/zones/zone3/zpool

With a slightly different arrangement this could be reduced to one.
Change

>       /var/zones/nfsmount/<zonename>/<host>/<nfs-share-name>

To:

        /var/zones/nfsmount/<host>/<nfs-share-name>/<zonename>/<file>

I can see that this would complicate things a bit because it would be
hard to figure out how far up the path is the right place for the mount.

Perhaps if this is what I want, I would be better off adding a
global zone vfstab entry to mount nfsserver:/vol/zones somewhere and use
the path:/// uri instead.
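
For example (illustrative vfstab entry and matching so-uri; the server
name and paths are made up):

   # /etc/vfstab in the global zone
   nfsserver:/vol/zones  -  /zonestorage  nfs  -  yes  rw,hard

   # then in zonecfg
   set src=path:///zonestorage/zone1/rootzpool.disk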

Thoughts?

> If an so-uri points to a fibre channel lun, the zones subsystem will
> verify that the specified wwn corresponds to a global zone accessible
> fibre channel disk device.
>
> If an so-uri points to an iSCSI target or alias, the zones subsystem
> will verify that the iSCSI device is accessible on the local system.  If
> an so-uri points to a static iSCSI target and that target is not
> already accessible on the local host, then the zones subsystem will
> enable static discovery for the local iSCSI initiator and attempt to
> apply the specified static iSCSI configuration.  If the iSCSI target
> device is not accessible then the install will fail.
>
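
Presumably for the static discovery case that amounts to something
like the following iscsiadm(1m) sequence (the portal address here is
made up):

   # iscsiadm modify discovery --static enable
   # iscsiadm add static-config \
       iqn.1986-03.com.sun:02:38abfd16-78c5-c58e-e629-ea77a33c6740,192.168.1.10:3260
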
> Once a zones install has verified that any required so-uri exists and is
> accessible, the zones subsystem will need to initialise the so-uri.  In
> the case of a path or nfs path, this will involve creating a zpool
> within the specified file.  In the case of a vdisk, fibre channel lun,
> or iSCSI lun, this will involve creating an EFI/GPT partition on the
> device which uses the entire disk, then a zpool will be created within
> this partition.  For data protection purposes, if a storage object
> contains any pre-existing partitions, zpools, or ufs filesystems, the
> install will fail will fail with an appropriate error message.  To

s/will fail will fail/will fail/

> continue the installation and overwrite any pre-existing data, the user
> will be able to specify a new '-f' option to zoneadm(1m) install.  (This
> option mimics the '-f' option used by zpool(1m) create.)
>
> If zpool-preserve is set to true, then before initialising any target
> storage objects, the zones subsystem will attempt to import a
> pre-existing zpool from those objects.  This will allow users to
> pre-create a zpool with custom creation time options, for use with
> zones.  To successfully import a pre-created zpool for a zone install,
> that zpool must not be attached.  (Ie, any pre-created zpool must be
> exported from the system where it was created before a zone can be
> installed on it.)  Once the zpool is imported the install process will
> check for the existence of a /ROOT filesystem within the zpool.  If this
> filesystem exists the install will fail with an appropriate error
> message.  To continue the installation the user will need to specify the
> '-f' option to zoneadm(1m) install, which will cause the zones framework
> to delete the pre-existing /ROOT filesystem within the zpool.

Is this because the zone root will be installed <zonepath>/ROOT/<bename>
rather than <zonepath>/root?
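
On the zpool-preserve path, I assume the intent is that an admin can do
something like the following ahead of time (pool and device names are
made up; presumably the install renames the pool to <zonename>_rpool
when it imports it):

   # zpool create -O compression=on prepool c4t0d0
   # zpool export prepool
   # zoneadm -z zone1 install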

> The newly created or imported root zpool will be named after the zone to
> which it is associated, with the assigned name being "<zonename>_rpool".
> This zpool will then be mounted at the zone's rootpath and then the
> install process will continue normally[07].

This seems odd... why not have the root zpool mounted at zonepath rather
than zoneroot?  This way (e.g.) SUNWdetached.xml would follow the zone
during migrations.

> XXX: use altroot at zpool creation or just manually mount zpool?
>
> If the user has specified a "zpool" resource, then the zones framework
> will configure, initialize, and/or import it in a similar manner to a
> zpool specified by the "rootzpool" resource.  The key differences are
> that the name of the newly created or imported zpool will be
> "<zonename>_<name>".  The specified zpool will also have the zfs "zoned"
> property set to "on", hence it will not be mounted anywhere in the
> global zone.
>
> XXX: do we need "zpool import -O file-system-property=" to set the
>      zoned property upon import.
>
> Once a zone configured with a so-uri is in the installed state, the
> zones framework needs a mechanism to mark that storage as in use to
> prevent it from being accessed by multiple hosts simultaneously.  The
> most likely situation where this could happen is via a zoneadm(1m)
> attach on a remote host.  The easiest way to achieve this is to keep the
> zpools associated with the storage imported and mounted at all times,
> and leverage the existing zpool support for detecting and preventing
> multi-host access.
>
> So whenever a global zone boots and the zones smf service runs, it will
> attempt to configure and import any shared storage objects associated
> with installed zones.  It will then continue to behave as it does today
> and boot any installed zones that have the autoboot property set.  If
> any shared storage objects fail to configure or import, then:
>
> - the zones associated with the failed storage will be transitioned
>   to the "uninstalled" state.

Is "uninstalled" a real state?  Perhaps "configured" is more
appropriate, as this allows a transition to "installed" via "zoneadm
attach".

> - an error message will be emitted to the zones smf log file.
> - after booting any remaining installed zones that have autoboot set
>   to true, the zones smf service will enter the "maintenance" state,
>   thereby prompting the administrator to look at the zones smf log
>   file.
>
> After fixing any problems with shared storage accessibility, the
> admin should be able to simply re-attach the zone to the system.
>
> Currently the zones smf service is dependent upon multi-user-server, so
> all networking services required for access to shared storage should be
> properly configured well before we try to import any shared storage
> associated with zones.

May I propose a fix to the zones SMF service as part of this?  The
current integration with the global zone's SMF is rather weak in
reporting the real status of zones and allowing the use of SMF for
controlling the zones service.  In particular:

- If a zone fails to start, the state of svc:/system/zones:default does
  not reflect a maintenance or degraded state.
- If an admin wishes to start a zone the same way that the system would
  do it, "svcadm restart" and similar have the side effect of rebooting
  all zones on the system.
- There is no way to establish dependencies between zones or between a
  zone and something that needs to happen in the global zone.
- There isn't a good way to give certain individuals within the global
  zone the ability to start/stop specific zones via RBAC or
  authorizations.

I propose that:

- zonecfg creates a new service instance svc:/system/zones:zonename
  when the zone is configured.  Its initial state is disabled.  If the
  service already exists, sanity checking may be performed, but it
  should not whack things like dependencies and authorizations.
- After zoneadm installs a zone, the general/enabled property of
  svc:/system/zones:zonename is set to match the zonecfg autoboot
  property.
- "zoneadm boot" is the equivalent of
  "svcadm enable -t svc:/system/zones:zonename"
- A new command "zoneadm shutdown" is the equivalent of
  "svcadm disable -t svc:/system/zones:zonename"
- "zoneadm halt" is the equivalent of "svcadm mark maintenance
  svc:/system/zones:zonename" followed by the traditional ungraceful
  teardown of the zone.
- Modification of the autoboot property with zonecfg (so long as the
  zone has been installed/attached) triggers the corresponding
  general/enabled property change in SMF.  This should set the property
  general/enabled without causing an immediate state change.
- zoneadm uninstall and zoneadm detach set the service to not autostart.
- zonecfg delete also deletes the service.
- A new property is added to zonecfg to disable SMF integration for a
  particular zone.  This will be important for people who have already
  worked around this problem (including ISVs providing clustering
  products) and don't want SMF getting in the way of their already
  working solutions.
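
To make that concrete, day-to-day operation would look something like
this (hypothetical instance names, following the scheme above):

   # svcs "svc:/system/zones:*"
   # svcadm enable -t svc:/system/zones:zone1     (i.e. zoneadm -z zone1 boot)
   # svcadm disable -t svc:/system/zones:zone1    (i.e. zoneadm -z zone1 shutdown)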

> On system shutdown, the zones system will NOT export zpools contained
> within storage objects used by the zone.  Zpools contained within
> storage objects assigned to installed zones will only be exported
> during zone detach.  More details about the behaviour of zone detach
> are provided below.
>
>
> ----------
> C.1.iv Zoneadm(1m) attach
>
[snip]
>
> ----------
> C.1.v Zoneadm(1m) boot
>
[snip]
>
> ----------
> C.1.vi Zoneadm(1m) detach
>
[snip]
>
> ----------
> C.1.vii Zoneadm(1m) uninstall
>
[snip]
>
> ----------
> C.1.viii Zoneadm(1m) clone
>
> Normally when cloning a zone which lives on a zfs filesystem the zones
> framework will take a zfs(1m) snapshot of the source zone and then do a
> zfs(1m) clone operation to create a filesystem for the new zone which is
> being instantiated.  This works well when all the zones on a given
> system live on local storage in a single zfs filesystem, but this model
> doesn't work well for zones with encapsulated roots.  First, with
> encapsulated roots each zone has its own zpool, and zfs(1m) does not
> support cloning across zpools.  Second, zfs(1m) snapshotting/cloning
> within the source zpool and then mounting the resultant filesystem onto
> the target zone's zoneroot would introduce dependencies between zones,
> complicating things like zone migration.
>
> Hence, for cloning operations, if the source zone has an encapsulated
> root, zoneadm(1m) will not use zfs(1m) snapshot/clone.  Currently
> zoneadm(1m) will fall back to the use of find+cpio to clone zones if it
> is unable to use zfs(1m) snapshot/clone.  We could just fall back to
> this default behaviour for encapsulated root zones, but find+cpio are
> not error free and can have problems with large files.  So we propose to
> update zoneadm(1m) clone to detect when both the source and target zones
> are using separate zfs filesystems, and in that case attempt to use zfs
> send/recv before falling back to find+cpio.
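
For reference, I assume the send/recv path boils down to something
like this under the hood (made-up pool and snapshot names):

   # zfs snapshot -r zone1_rpool@clone
   # zfs send -R zone1_rpool@clone | zfs receive -F zone2_rpool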

Can a provision be added for running an external command to produce the
clone?  I envision this being used to make a call to a storage device to
tell the storage device to create a clone of the storage.  (This implies
that the super-secret tool to re-write the GUID would need to become
available.)

The alternative seems to be to have everyone invent their own mechanism
with the same external commands and zoneadm attach.

> Today, the zoneadm(1m) clone operation ignores any additional storage
> (specified via the "fs", "device", or "dataset" resources) that may be
> associated with the zone.  Similarly, the clone operation will ignore
> additional storage associated with any "zpool" resources.
>
> Since zoneadm(1m) clone will be enhanced to support cloning between
> encapsulated root zones and un-encapsulated root zones, zoneadm(1m)
> clone will be documented as the recommended migration mechanism for
> users who wish to migrate existing zones from one format to another.
>
>
> ----------
> C.2 Storage object uid/gid handling
>
> One issue faced by all VTs that support shared storage is dealing with
> file access permissions of storage objects accessible via NFS.  This
> issue doesn't affect device based shared storage, or local files and
> vdisks, since these types of storage are always accessible, regardless
> of the uid of the accessing process (as long as the accessing process
> has the necessary privileges).  But when accessing files and vdisks via
> NFS, the accessing process cannot use privileges to circumvent
> restrictive file access permissions.  This issue is also complicated by
> the fact that by default most NFS servers will map all accesses by a
> remote root user to a different uid, usually "nobody" (a process known
> as "root squashing").
>
> In order to avoid root squashing, or requiring users to setup special
> configurations on their NFS servers, whenever the zones framework
> attempts to create a storage object file or vdisk, it will temporarily
> change its uid and gid to the "xvm" user and group, and then create the
> file with 0600 access permissions.
>
> Additionally, whenever the zones framework attempts to access a storage
> object file or vdisk it will temporarily switch its uid and gid to match
> the owner and group of the file/vdisk, ensure that the file is readable
> and writeable by its owner (updating the file/vdisk permissions if
> necessary), and finally setup the file/vdisk for access via a zpool
> import or lofiadm -a.  This will allow the zones framework to
> access storage object files/vdisks that were created by any user,
> regardless of their ownership, simplifying file ownership and management
> issues for administrators.

This implies that the xvm user is getting some additional privileges.
What are those privileges?

> ----------
> C.3 Taskq enhancements
>
> The integration of Duckhorn[08] greatly simplifies the management of cpu
> resources assigned to a zone.  This management is partially implemented
> through the use of dynamic resource pools, where zones and their
> associated cpu resources can both be bound to a pool.
>
> Internally, zfs has worker threads associated with each zpool.  These
> are kernel taskq threads which can run on any cpu which has not been
> explicitly allocated to a cpu set/partition/pool.
>
> So today, for any zones living on zfs filesystems and running in a
> dedicated cpu pool, any zfs disk processing associated with that zone is
> not done by the cpus bound to that zone's pool.  Essentially all the
> zone's zfs processing is done for "free" by the global zone.
>
> With the introduction of zpools encapsulated within storage objects,
> which are themselves associated with specific zones, it would be
> desirable to have the zpool worker threads bound to the cpus currently
> allocated to the zone.  Currently, zfs uses taskq threads for each
> zpool, so one way of doing this would be to introduce a mechanism that
> allows for the binding of taskqs to pools.
>
> Hence we propose the following new interfaces:
>       zfs_poolbind(char *, poolid_t);
>       taskq_poolbind(taskq_t, poolid_t);
>
> When a zone, which is bound to a pool, is booted, the zones framework
> will call zfs_poolbind() for each zpool associated with an encapsulated
> storage object bound to the zone being booted.
>
> Zfs will in turn use the new taskq pool binding interfaces to bind all
> its taskqs to the specified pools.  This mapping is transient and zfs
> will not record or persist this binding in any way.
>
> The taskq implementation will be enhanced to allow for binding worker
> threads to a specific pool.  If taskq threads are created for a taskq
> which is bound to a specific pool, those new threads will also inherit
> the same pool binding.  The taskq to pool binding will remain in effect
> until the taskq is explicitly rebound or the pool to which it is bound
> is destroyed.

Any thoughts of doing something similar for dedicated NICs?  From
dladm(1M):

     cpus

         Bind the processing of packets for a given data link  to
         a  processor  or a set of processors. The value can be a
         comma-separated list of one or more  processor  ids.  If
         the  list  consists of more than one processor, the pro-
         cessing will spread out to all the  processors.  Connec-
         tion  to  processor affinity and packet ordering for any
         individual connection will be maintained.

That is, the enhancement is already there, it's just a matter of making
use of it.
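
For example, something along these lines (made-up link name and cpu
ids) could be issued when a zone with a dedicated NIC is booted into
its pool:

   # dladm set-linkprop -p cpus=4,5,6,7 zone1_nic0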

> ----------
> C.4 Zfs enhancements
>
> In addition to the zfs_poolbind() interface proposed above, the
> zpool(1m) "import" command will need to be enhanced.  Currently the
> zpool(1m) import by default scans all storage devices on the system
> looking for pools to import.  The caller can also use the '-d' option to
> specify a directory within which the zpool(1m) command will scan for
> zpools that may be imported.  This scanning involves sampling many
> objects.  When dealing with zpools encapsulated in storage objects, this
> scanning is unnecessary since we already know the path to the objects
> which contain the zpool.  Hence, the '-d' option will be enhanced to
> allow for the specification of a file or device.  The user will also be
> able to specify this option multiple times, in case the zpool spans
> multiple objects.
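
With that enhancement, importing an encapsulated root pool would
presumably look something like this (made-up paths):

   # zpool import \
       -d /var/zones/nfsmount/zone1/nfsserver/vol/zones/zone1/root.disk \
       zone1_rpool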
>
>
> ----------
> C.5 Lofi and lofiadm(1m) enhancements
>
> Currently, there is no way for a global zone to access the contents of a
> vdisk.  Vdisk support was first introduced in VirtualBox.  xVM then
> adopted the VirtualBox code for vdisk support.  With both technologies,
> the only way to access the contents of a vdisk is to export it to a VM.
>
> To allow zones to use vdisk devices we propose to leverage the code
> introduced by xVM by incorporating it into lofi.  This will allow any
> solaris system to access the contents of vdisk devices.  The interface
> changes to lofi to allow for this are fairly straightforward.
>
> A new '-l' option will be added to the lofiadm(1m) "-a" device creation
> mode.  The '-l' option will indicate to lofi that the new device should
> have a label associated with it.  Normally lofi devices are named
> /dev/lofi/<I> and /dev/rlofi/<I>, where <I> is the lofi device number.
> When a disk device has a label associated with it, it exports many
> device nodes with different names.  Therefore lofi will need to be
> enhanced to support these new device names, with multiple nodes
> per device.  These new names will be:
>
>       /dev/lofi/dsk<I>/p<j>           - block device partitions
>       /dev/lofi/dsk<I>/s<j>           - block device slices
>       /dev/rlofi/dsk<I>/p<j>          - char device partitions
>       /dev/rlofi/dsk<I>/s<j>          - char device slices

One of the big weaknesses with lofi is that you can't count on the
device name being the same between boots.  Could -l take an argument
to be used instead of "dsk<I>"?  That is:

   lofiadm -a -l coolgames /media/coolgames.iso

Creates:

   /dev/lofi/coolgames/p<j>
   /dev/lofi/coolgames/s<j>
   /dev/rlofi/coolgames/p<j>
   /dev/rlofi/coolgames/s<j>

For those cases where legacy behavior is desired, an optional %d can be
used to create the names you suggest above.

   lofiadm -a -l dsk%d /nfs/server/zone/stuff

[snip]

> ----------
> C.6 Performance considerations
>
> As previously mentioned, this proposal primarily simplifies the process
> of configuring zones on shared storage.  In most cases these proposed
> configurations can be created today, but no one has actually verified
> that these configurations perform acceptably.  Hence, in conjunction
> with providing functionality to simplify the setup of these configs,
> we also need to quantify their performance to make sure that
> none of the configurations suffer from gross performance problems.
>
> The most straightforward configurations, with the least potential for
> poor performance, are ones using local devices, fibre channel luns, and
> iSCSI luns.  These configurations should perform identically to the
> configurations where the global zone uses these objects to host zfs
> filesystems without zones.  Additionally, the performance of these
> configurations will mostly be dependent upon the hardware associated
> with the storage devices.  Hence the performance of these configurations
> is for the most part uninteresting and performance analysis of these
> configurations can be skipped.
>
> Looking at the performance of storage objects which are local files or
> nfs files is more interesting.  In these cases the zpool that hosts the
> zone will be accessing its storage via the zpool vdev_file vdev_ops_t
> interface.  Currently, this interface doesn't receive as much use and
> performance testing as some of the other zpool vdev_ops_t interfaces.
> Hence it will be worthwhile to measure the performance of a zpool backed by
> a file within another zfs filesystem.  Likewise we will want to measure
> the performance of a zpool backed by a file on an NFS filesystem.
> Finally, we should compare these two performance points to a zone which
> is not encapsulated within a zpool, but is instead installed directly on
> a local zfs filesystem.  (These comparisons are not really that
> interesting when dealing with block device based storage objects.)

Reminder for when I am testing: is this a case where forcedirectio will
make a lot of sense?  That is, zfs is already buffering, don't make NFS
do it too.
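
If so, for testing presumably it is just a matter of mounting the
backing share with something like (made-up server and path):

   # mount -F nfs -o forcedirectio nfsserver:/vol/zones /zonestorage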

> Currently, while it is very common to deploy large numbers of zfs
> filesystems, systems with large numbers of zpools are not very common.
> The solution proposed in this project will likely result in an increase
> of zpools on systems hosting zones.  Hence, we should evaluate the
> impact of an increasing number of zpools on performance scalability.
> This could be done by comparing the io performance drop-off of an
> increasing number of zones hosted in multiple zfs filesystems in a
> single zpool vs. zones hosted in separate zpools.
>
> Finally, it will be important to do performance measurements for vdisk
> configurations.  These configurations are similar to the local file or
> nfs configurations, but they will be utilising the vdev_disk backend and
> they will have an additional layer of indirection through lofi.
>
> XXX: impact of multiple zpools on arc and l2 arc?  talk to mark maybee.
>
>
> ----------
> C.7 Phased delivery
>
> Customers have been asking for a simple mechanism to allow hosting of
> zones on NFS since the introduction of zones.  Hence we'd like to get
> this functionality into the hands of customers as quickly as possible.
> Also, the approach taken by this proposal to supporting zones on shared
> storage is different from what was originally anticipated, hence we'd
> like to get practical experience with this approach at customer sites
> asap to determine if there are situations where this approach may not
> meet their requirements.  To accelerate the delivery of the previously
> proposed features, we plan to deliver them in three phases:

Sounds quite reasonable.

[snip]
>
> -------------------------------------------------------------------------------


-- 
Mike Gerdts
http://mgerdts.blogspot.com/