We had a meeting this morning to further discuss this issue, but unfortunately a number of key people were not able to make it. SMF and Zones people: Please take a look at the following and see if you can shoot any holes in it.
---

There is an important assumption that today's automated patching tools have made in their attempts to solve this problem: that system startup and shutdown are sequential and that it is possible to intercept them at a point where the system is in a state equivalent to "single user mode". This assumption is false.

It is part of the design of SMF that if the system is booting to full operation, services are started when their dependencies are satisfied, whether or not any particular milestone has been reached. A service that is started only when the system is headed for full operation may well come up at the very beginning of startup, if it has no dependencies. Similarly, that service may well persist until the very end during shutdown, since (again) it has no dependencies.

This means that the *only* way on an SMF system to get the system into the desired quiescent state is to set the system to a limited milestone like milestone/single-user.

Milestone/single-user is not suitable for use as that milestone, because an unknowable set of services are starting in parallel to reach that milestone. It is not possible to place an automated patching service in that set and ensure that it executes after all of the other services have reached a quiescent state.

Thus, it seems necessary to introduce a new patching milestone. That milestone should be dependent on a new patching service (or services) that does the actual work. The patching service must depend on milestone/single-user, so that single-user services are known to be running and quiescent.

Other constraints:

Patches may use SMF operations, and some SMF operations are blocked during execution of rc*.d scripts.

We tell users to install these patches "in single user mode". This means that "boot -s" or equivalent operations must yield an environment where patchadd can apply these patches.

Patching requires the file system mounts implemented by system/filesystem/local.
There is a legacy expectation that "single user mode" does not mount all file systems. This brings with it the possibility that a user might expect that file systems will not be mounted.

Now I think we've set the stage for a concrete set of proposals.

Common elements:

- Define a new milestone, say "milestone/patching".
- Create a new service (or, more likely, two, one for UCE and one for UCS). Set this service to depend on milestone/single-user and system/filesystem/local, and be depended upon by milestone/patching.
- Have the application queue up patches to be installed, and then boot the system to milestone/patching.
- Once the patches have been applied, the patching service should either allow the system to come up (by changing its target milestone) or reboot it, depending on the requirements of the patches.

Proposal 1:

- Move system/filesystem/local under milestone/single-user.

Upside: Really simple (other than common elements).
Downside: Violates the user assumption that only key file systems are mounted.

Proposal 2:

- Make the new patching service depend on system/filesystem/local.
- Modify patchadd to temporarily enable system/filesystem/local before patching, and disable it afterward, *if* it is offline. Note that when the patching service runs, system/filesystem/local will be online and so patchadd will not need to manipulate it.

Upside: Relatively simple and modular.
Downside: Patchadd mounting and unmounting file systems may be unexpected.

Proposal 3:

- Modify patchadd to execute system/filesystem/local's start method before patching, and its stop method after, if it is offline.

Upside: Avoiding SMF manipulations in patchadd avoids the possibility of SMF-related deadlocks.
Downside: Patchadd has unholy knowledge of system/filesystem/local, or requires unpleasant examination of s/f/l's SMF data. Patchadd mounting and unmounting file systems may be unexpected.
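To make the dependency arrangement in the common elements concrete, here is a rough Python model of it. It is only a sketch: the service names "application/patching" and "application/unrelated-daemon" are made up for illustration, the real dependencies of milestone/single-user and system/filesystem/local are left out, and nothing here talks to SMF. It just shows that booting to the limited milestone pulls in only that milestone's dependency closure, and that the patching service comes after both of its dependencies.

```python
from graphlib import TopologicalSorter

# service -> set of services it depends on.
# "application/patching" and "application/unrelated-daemon" are hypothetical
# names; the other three names come from the proposal text. The real
# dependencies of single-user and filesystem/local are deliberately omitted.
DEPENDS_ON = {
    "milestone/single-user": set(),
    "system/filesystem/local": set(),
    "application/patching": {"milestone/single-user", "system/filesystem/local"},
    "milestone/patching": {"application/patching"},
    "application/unrelated-daemon": set(),
}


def closure(target, deps):
    """Return the target plus everything it transitively depends on."""
    seen = set()
    stack = [target]
    while stack:
        svc = stack.pop()
        if svc not in seen:
            seen.add(svc)
            stack.extend(deps.get(svc, ()))
    return seen


def start_order(target, deps):
    """Return one valid start order for a boot to `target`, dependencies first."""
    needed = closure(target, deps)
    subgraph = {svc: deps.get(svc, set()) & needed for svc in needed}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    for svc in start_order("milestone/patching", DEPENDS_ON):
        print(svc)
```

The dependency-free "application/unrelated-daemon" is the kind of service that could come up at any time on a boot to "all"; in this model it never appears in the start order for milestone/patching.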
Common issues:

- How do we avoid starting the patching service when the system is coming up to "all" (and so the system may not be quiescent when the service runs)? One possibility might be to have the service disabled when there are no patches queued, but that might lead to problems if the user explicitly selects "all" in some way. Another possibility might be to have the service inspect the current target milestone, and do nothing if it is not "milestone/patching".
- How does the application boot the system into milestone/patching? Using a temporary milestone ("reboot -- -m milestone/patching") requires that the application initiates the reboot, rather than allowing the user to reboot at leisure. Using a permanent milestone setting requires that the application stash away the original target milestone to be restored after the patches are applied - an opportunity for strange behavior if the user intervenes.

(Note that in both of these cases, if the user requests a boot to milestone/single-user, patching will not be done. This is probably best viewed as a feature, since it allows one to avoid patching when it causes problems.)

Recommendations:

- Adopt proposal 2 above - have patchadd temporarily enable system/filesystem/local if it is offline.
- Have the application stash the current default milestone into an SMF property, and set the default milestone to "milestone/patching".
- When the service starts, have it inspect the current milestone. If the current milestone is milestone/patching, have it apply the patches and restore the previous default milestone.
- Optionally, if the current milestone is *not* milestone/patching when the service starts, have it inspect the queue of patches to be installed. If the queue is non-empty, this indicates that the user has manually overridden the application's attempt to set the system up for automated patching. Again, stash the default milestone and set the default milestone to milestone/patching.
This again sets the system up for automated patching on the *next* boot.

Any other proposals? Can anybody shoot any holes in this one?

Futures:

For extra credit, somebody can try to come up with a similar scheme that runs during system shutdown, to avoid the need for an "extra" reboot. Exactly how to get a service method to run at the right point during such a shutdown is unclear to me, but I won't say it's impossible.
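For anyone who wants to poke at the recommended flow, here is a minimal Python sketch of the state handling: the application stashes the default milestone and arms the next boot, and the service's start method either patches and restores, re-arms, or does nothing. Everything in it is an assumption for illustration: the `SystemState` fields stand in for the SMF property and the patch queue, the patch name is invented, and nothing calls patchadd or SMF. The guard in `arm()` against stashing "milestone/patching" itself is my own addition, not part of the recommendations.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PATCHING = "milestone/patching"


@dataclass
class SystemState:
    """Hypothetical stand-in for the SMF properties and the patch queue."""
    default_milestone: str                  # milestone booted to by default
    current_milestone: str                  # milestone this boot is heading for
    saved_milestone: Optional[str] = None   # stashed original default
    queue: List[str] = field(default_factory=list)
    applied: List[str] = field(default_factory=list)


def arm(state):
    """Stash the default milestone and point the next boot at patching."""
    # Assumption of this sketch: never stash "milestone/patching" itself,
    # or the original default would be lost when re-arming.
    if state.default_milestone != PATCHING:
        state.saved_milestone = state.default_milestone
        state.default_milestone = PATCHING


def queue_patches(state, patches):
    """Application side: queue patches, then arm the next boot."""
    state.queue.extend(patches)
    arm(state)


def service_start(state):
    """Patching service start method; returns what it decided to do."""
    if state.current_milestone == PATCHING:
        # Quiescent boot: apply everything, then restore the old default.
        state.applied.extend(state.queue)
        state.queue.clear()
        if state.saved_milestone is not None:
            state.default_milestone = state.saved_milestone
            state.saved_milestone = None
        return "patched"
    if state.queue:
        # The user overrode the patching boot; set up the next one.
        arm(state)
        return "rearmed"
    return "idle"
```

A boot to "all" with patches still queued returns "rearmed" and leaves the default milestone at milestone/patching; the following boot to milestone/patching returns "patched" and puts the default back to "all".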