Hey Darren,
On Thu, 2008-09-18 at 13:36 +0100, Darren J Moffat wrote:
> I believe from conversations with Tim part of the spec is missing.
Yep - thanks. For the record, I'll attach the draft one pager I have
here. I think it answers the questions about the zfssnap role. The
package name will be SUNWzfs-auto-snapshot for the core SMF service
component (which is in a separate package from the GUI components).
thanks!
tim
-------------- next part --------------
Template Version: @(#)onepager.txt 1.31 07/08/08 SMI
This information is Copyright 2008 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
ZFS Automatic Snapshots
1.2. Name of Document Author/Supplier:
Tim Foster
1.3. Date of This Document:
02/04/2008 (minor edits 09/09/2008)
1.4. Name of Major Document Customer(s)/Consumer(s):
1.4.1. The Community you expect to review your project:
ZFS OpenSolaris Community
[editor's note - I'm not sure what was expected for 1.4.1 above]
1.4.2. The ARC(s) you expect to review your project: PSARC
1.5. Email Aliases:
1.5.2. Responsible Engineer: tim.foster at sun.com
1.5.4. Interest List: zfs-discuss at opensolaris.org
2. Project Summary
2.1. Project Description:
This project delivers an SMF service which allows the admin to perform
regular, periodic snapshots of user/administrator-specified ZFS
filesystems. It is loosely coupled with the ZFS codebase, using only
the ZFS CLI, cron and SMF to perform its functionality.
2.2. Risks and Assumptions:
The current prototype has been implemented entirely in Korn shell -
performance/scalability testing has not yet been carried out to
determine whether this implementation is fast enough. If much tighter
integration into the ZFS codebase is required, then this project will
need additional resources.
This project is not officially Sun funded - the engineer is doing this
in his spare time. This could be mitigated by additional resources if
a significant amount of additional engineering is recommended by the
ARC and those resources become available.
3. Business Summary
3.1. Problem Area:
This adds one more feature to the capabilities ZFS brings to Solaris,
integrating ZFS more tightly with the operating system and providing
a feature that some expect ZFS to have already.
3.2. Market/Requester:
No specific person has asked for this feature, but it appears to be a
general feature of many NAS boxes. The idea for such a system in ZFS
came from a discussion on the zfs-discuss at opensolaris.org mailing
list:
http://www.opensolaris.org/jive/thread.jspa?messageID=37190
3.3. Business Justification:
Not providing scheduled periodic ZFS snapshots on Solaris out of the
box means that there's one more thing a system administrator needs to
write and debug scripts for before putting a Solaris system into
production to best exercise the features ZFS can provide.
Having a common facility in Solaris that does this would prevent
duplication of effort at user sites, increase the speed to deploy a
Solaris system, and make life easier for support staff
when users either request this feature, or try to troubleshoot a
user's homemade solution.
3.4. Competitive Analysis:
Many other NAS products and operating systems that support snapshots
already do this. These include:
http://www.emc.com/products/software/snapview2.jsp
http://www.microsoft.com/windows/products/windowsvista/features/details/shadowcopy.mspx
http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=relnotes&fname=/usr/relnotes/nasmgr
http://www.netapp.com/ftp/snapshot-brochure.pdf
http://www.real-storage.com/nas-snapshots.html
http://people.freebsd.org/~rse/snapshot/
3.5. Opportunity Window/Exposure:
We're playing catchup.
3.6. How will you know when you are done?:
The major features have already been implemented in Korn shell, but we
could use some performance/stress testing on large systems. This code
is primarily aimed at small systems.
Community feedback can be used to determine if we've implemented enough
of the functionality for this to be useful.
[ editor's note: yes, that's pretty vague. I don't know
what specific metrics I could use here - any suggestions ? ]
4. Technical Description:
4.1. Details:
The service works by having one service instance per group of
filesystems, each instance denoting a separate schedule of periodic
snapshots.
The SMF method script is responsible for adding/removing the snapshot
cron job, which corresponds to enabling and disabling the service.
The method script is also called directly from cron according to the
crontab entries - in which case it is responsible for taking the
snapshot.
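As a sketch of the above (the method-script path and instance FMRI
shown are illustrative placeholders, not the final delivered names),
an hourly instance's crontab entry under the zfssnap role might look
like:

```shell
# Illustrative crontab entry: cron invokes the SMF method script
# directly, passing the instance FMRI so the script knows which
# snapshot schedule fired. Path and FMRI are placeholders.
0 * * * * /lib/svc/method/zfs-auto-snapshot svc:/system/filesystem/zfs/auto-snapshot:hourly
```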
Filesystems are grouped together either by listing their names, space
separated, in an SMF instance property, or dynamically, by having the
method script search for an instance-specific ZFS user property across
all ZFS filesystems. With ZFS Delegated Administration (PSARC
2006/465), users can specify this property on their own filesystems,
and need not reconfigure the SMF service.
The service can also be responsible for destroying older snapshots
taken by the service, allowing the administrator to retain a fixed
number of past snapshots. The service can perform a backup command at
each invocation of the cron job - the admin specifies what command to
run at the end of a pipe that starts with
"zfs send <filesystem>@<snapshot>", with the option of sending an
incremental stream from the previous periodic snapshot.
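For example (the hostname, dataset and snapshot names below are
hypothetical placeholders), an admin-supplied backup command might
consume an incremental stream like this:

```shell
# Hypothetical: pipe an incremental send-stream between two periodic
# snapshots to a remote host. Dataset, snapshot names and hostname
# are placeholders, not names the service mandates.
zfs send -i tank/home@zfs-auto-snap.daily-2008-09-17 \
    tank/home@zfs-auto-snap.daily-2008-09-18 | \
    ssh backuphost "cat > /backup/tank-home.daily.zsend"
```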
What does this offer that a simple "zfs snapshot <filesystem>@snap"
entry in crontab doesn't? Using SMF allows the administrator to
easily see when snapshots fail for some reason, allows them to easily
enable/disable snapshots for groupings of filesystems and adds
additional features, like performing backups of their filesystems.
In the default configuration, we have daily, weekly, hourly, monthly
and yearly snapshots - each managed under a different SMF instance.
The administrator could add instances to take more frequent snapshots
for some filesystems, less frequent snapshots for other filesystems -
and have the service manage the complexity of dealing with cron for
them.
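As a sketch of how an administrator would pick schedules (instance
FMRIs here are illustrative, not the final delivered names):

```shell
# Illustrative: enable only the daily and weekly schedules, leaving
# the other delivered instances disabled. FMRIs are placeholders.
svcadm enable svc:/system/filesystem/zfs/auto-snapshot:daily
svcadm enable svc:/system/filesystem/zfs/auto-snapshot:weekly
svcs -a | grep auto-snapshot   # check which instances are online
```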
This has been a personal project up till now, with code (licensed under
CDDL) and implementation posted on the engineer's blog. The README
documentation for the project is at:
http://blogs.sun.com/timf/resource/README.zfs-auto-snapshot.txt
The "SEE ALSO" section of the README has a list of links showing
the various stages of the project to date. To summarize, the project
has evolved over 10 versions since May 2006 to the present date. Users
have been running the code, and providing feedback, which has been
integrated into each subsequent version.
Two known bugs are worth calling out here:
One is to do with our reliance on cron. To properly allow the
administrator to have snapshots taken every 3 days, we'd need to
rewrite the crontab entry each month, since the days in a month aren't
evenly divisible by 3. At the moment, the crontab day field would look
like:
1,4,7,10,13,16,19,22,25,28,31
After taking the snapshot on the 31st, our next snapshot should be
taken on the 3rd of the following month - but as implemented, it'll be
taken on the 1st instead. Other time periods are similarly affected.
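The buggy day field can be reproduced with a simple stride, which is
exactly where the problem comes from - the stride restarts at day 1
every month, regardless of where the previous month's schedule left
off:

```shell
# Generate the day-of-month field for an every-3-days schedule by
# striding from day 1; this reproduces the buggy field shown above.
seq 1 3 31 | paste -sd, -
```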
The other bug is
6474294 Need to be able to better control who can read files in a
snapshot.
This service doesn't change the implications of that bug, but having
automatic snapshots could result in more people running into the
situation.
4.2. Bug/RFE Number(s):
TBD
4.3. In Scope:
Everything discussed in this one-pager is in scope.
4.4. Out of Scope:
While this service does provide a means for a snapshot stream to be
stored remotely (the "backup" option allows for a ZFS send-stream
to get piped to an administrator-specified command) it doesn't provide
the equivalent "restore" command. This is not a general purpose backup
tool (i.e. it does not fix 5004379). This is also not a general purpose
remote replication facility (5036182) although some users have already
started using it as a "poor man's cluster".
[ editor's note - with that in mind, could this service ultimately end
up confusing people who are expecting the above? ]
4.5. Interfaces:
The interface will be the SMF service, allowing users to create
instances of the service to perform work. Setting and getting SMF
properties and refreshing the instance are also part of the interface.
For certain operations, we allow the user to set ZFS filesystem
properties (PSARC 2006/504); these are:
zfs set com.sun:auto-snapshot=true|false <filesystem>
zfs set com.sun:auto-snapshot:<label>=true|false <filesystem>
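As a sketch (the dataset names below are hypothetical), a user with
delegated administration might opt their own filesystem in or out per
schedule label:

```shell
# Hypothetical datasets: take automatic snapshots of tank/home/tim
# under every schedule, but exclude it from the "monthly" label.
zfs set com.sun:auto-snapshot=true tank/home/tim
zfs set com.sun:auto-snapshot:monthly=false tank/home/tim
zfs get -r com.sun:auto-snapshot tank/home   # verify the settings
```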
We suspect the stability level will be Evolving, but would like advice.
Over the course of the prototype development, we've added, but never
removed, several service properties - a 0.1 manifest will work
correctly with a 0.10 version of the service.
4.6. Doc Impact:
The ZFS Administration Guide could be modified to reference this
service.
4.7. Admin/Config Impact:
Adding this SMF service will introduce no change to the way Solaris
is currently installed or administered. Out of the box, the included
service instances can be installed as "disabled". The administrator
would need to enable each service they wanted to use, then mark
filesystems for inclusion under each snapshot schedule set by the
now-enabled instances.
4.8. HA Impact:
// What new requirements does this proposal place on the High
// Availability or Clustering aspects of the component?
[ editor's note: I'm not sure of the answer here - I assume HA
clusters already have some form of SMF synchronisation to
ensure that failover-nodes have the same SMF configuration
applied automatically, should the running node change its
SMF configuration ? ]
4.9. I18N/L10N Impact:
Additional translation of the ZFS Administration Guide could be
required.
4.10. Packaging & Delivery:
One additional package, which delivers the default instance,
the included instances and the method script. No impact during
Install/Upgrade.
4.11. Security Impact:
If periodic snapshots are taken of sensitive data, then 6474294
may be worth visiting prior to integration, however this service
only highlights that problem - it exists without the service as
well.
RBAC is used here, along with the creation of a new "zfssnap" role
under which the cron jobs run. This role is granted the
"ZFS File System Management" profile.
[ editor's note - this role probably needs a dedicated uid < 100 ]
4.12. Dependencies:
Cron, SMF and ZFS. The service works with ZFS from s10u2 onwards -
later ZFS versions include faster recursive snapshots, which the
method script detects and uses if available.
5. Reference Documents:
The following bugids have been mentioned in this one-pager, under
sections 4.4 and 4.11.
5004379 want comprehensive backup strategy
5036182 want remote replication (intent-log based)
6474294 Need to be able to better control who can read files in
a snapshot.
6. Resources and Schedule:
6.1. Projected Availability:
TBD
6.2. Cost of Effort:
// Order of magnitude people and time for the *whole* project, not
// just the development engineering part.
// You may wish to split the estimate between feature
// implementation, implementing administrative interfaces, unit
// tests, documentation, support training material, i18n, etc.
6.4. Product Approval Committee requested information:
6.4.1. Consolidation or Component Name: ON
6.4.7. Target RTI Date/Release:
TBD
// List target release & build and/or date.
// RTI = Request to Integrate - when does *this* project
// expect to be ready to integrate its changes back into
// the master source tree? We are not asking when the
// component wants to ship, but instead, when the
// gatekeeper/PM needs to expect your changes to show up.
// examples: S8u7_1, S9_45, Aug 2002...
6.4.8. Target Code Design Review Date: TBD
6.5. ARC review type: Standard
6.6. ARC Exposure: open
6.6.1. Rationale: Part of OpenSolaris
7. Prototype Availability:
7.1. Prototype Availability:
An evolving prototype has been available since May 2006.
7.2. Prototype Cost:
$0