> Quoth Michael Shapiro on Fri, Oct 27, 2006 at 04:13:14PM -0700:
> > I need to hear more details on what specific things they expect to work and
> > not work. In general, it sounds to me like you're describing the same type
> > of thing I discussed with the ipfilter and routing people, which is the
> > tension between service fault boundary and the idea of a single command
> > to enable/disable a feature. I won't recap the principles here (refer to
> > the earlier thread), but the bottom line is that we never established
> > any rule or idea that one can arbitrarily type "svcadm enable X" or
> > disable X for arbitrary X and that is the same as turning on or off
> > some feature _as the administrator understands the top level concept_.
>
> That's easy for us to say, but doesn't it set the administrator up for
> failure? Aren't you suggesting that the workflow for disabling
> a service that an administrator sees running via svcs should be
>
>   1. Run svcs -x on the FMRI to get the manpage.
>
>   2. Read the manpage to see whether svcadm should be used on the
>      service or not.
>
>   3. Use svcadm if appropriate, or otherwise use the feature-specific
>      tool as indicated by the manpage.
>
> This is the antithesis of approachability, and administrators aren't
> going to do it. We should throw them a bone by telling them when it's
> a bad idea to use svcadm ASAP, like by failing the svcadm.

Flip it around: you're arguing that (a) I want to disable something,
(b) I somehow know the FMRI for it, and (c) I expect to then go disable
that FMRI. But how did the administrator know (b) in the first place?
It must have been either by steps 1 and 2 above, or by looking at some
piece of documentation. The only other way is to do

	svcs -a | grep <some random token>

which is a sure-fire way to get into trouble -- we can't possibly support,
nor do we support, people running around disabling things they don't
understand.

(This was also the same before SMF. Example: I think I want to disable
NFS. I ps -e | grep nfs and see nfs.lockd and nfs.statd and I kill them.
Except NFS v3 isn't really disabled -- I've just broken part of it.)

The bottom line is that no admin is born with knowledge of FMRIs.
They must be discovered by reading some piece of documentation, either
on-line or off-line, before using the svc* tools in the first place. In
cases of larger features, we hopefully don't need that because you use the
more appropriate higher-level tool for that feature, or some higher-level
management framework does the underlying manipulations.

> Install, upgrade, and higher-level software should use the same
> interface that share(1M) uses, if they know what to do better than
> share(1M) does. For development or debugging, the protection can be
> turned off, since it's set by the developer anyway.

I agree -- but you haven't specified the semantics of locking etc. when
these s/w things execute svccfg commands during install or upgrade, or what
developers are supposed to do. And I think making any of these contexts
have even more brittle interactions is a bad design direction.

> > Fundamentally from the point of view of the service developer, handling
> > independent disable as in (c) doesn't require doing anything different,
> > because you *already* have to handle (b), so that code needs to be there.
>
> That's right.
> The point of this proposal is how it appears to the
> administrator. Does the system let him do something unsupported and
> then put a bunch of services into maintenance? Or silently undo what he
> has done? Or does the system say, "sorry, use the appropriate interface"?
>
> > General points aside, we can best discuss NFS by seeing a list of specific
> > behaviors we're trying to support or discussing the failure boundaries.
>
> I would rather add features to the framework in a generic fashion. I'm
> only using NFS here as a concrete example; I could just have easily used
> SVM, routing, or wpad.
>
> David

But it's not a concrete example: you haven't explained what specific
problem in NFS we're trying to solve, as I asked. If NFS services have
the proper failure boundaries, then they need to handle components being
gone in some sane way. Ergo they should handle those things being disabled
with no extra work. So why is locking needed in the first place? Either
because (a) the factoring is wrong or (b) the handling of individual
failures is wrong, and you're trying to work around that instead of
fixing NFS.

If you claim NFS can't be fixed, then explain what *specific circumstance*
is going to cause some interaction with the administrator that is confusing,
and why the existing infrastructure isn't able to address that. That is,
give me a list of what commands I type to get into this situation, what
error messages I see, etc. I'm unconvinced because you've offered no data.

-Mike

-- 
Mike Shapiro, Solaris Kernel Development. blogs.sun.com/mws/