Liane Praza wrote:
> Peter Memishian wrote:
>> > This root cause presents one solution: Modify svc.startd to create
>> > restarter property groups on all services before starting any of them.
>> > This would provide the same SCF environment to early services as to late
>> > ones. That is, service instances only lack restarter property groups
>> > when they're not configured properly or something else is wrong.
>> > I don't think this would cause a significant performance problem because
>> > creating restarter property groups shouldn't require disk I/O and should
>> > be relatively fast. It's a nontrivial change to a crucial component,
>> > though, and certainly won't be done within the next few days. For now,
>> > Clearview could use the shell script workaround Tony and Liane found.
>> >
>> > An alternative solution, however, is to modify svcadm enable -ts to stop
>> > interpreting the restarter property group's absence as an error and
>> > instead just wait for it to appear. Since it should be interpreted as
>> > an error after boot, but we don't have a good way to tell when that's
>> > the case, Tony and I think that for this situation svcadm should inform
>> > the user that the restarter property group is missing, that that's
>> > usually an error, and that it is waiting in case it isn't. This would
>> > be a change in behavior for a relatively rare case, and could in theory
>> > be problematic if a program is invoking svcadm enable -ts without
>> > passing the message on to the user. This fix should be pretty simple
>> > and might be doable with or before the Clearview putback. I don't like
>> > the wishy-washy semantics, though.
>> >
>> > So it seems that our choice is between an easy svcadm special change and
>> > an ugly Clearview workaround plus a more complicated startd fix later.
>> > I like the startd fix better in the long-term, but I'm not sure it's
>> > a slam-dunk. Please opine.
>>
>> I agree about the wishy-washy semantics of the second solution. As far as
>> expedient solutions go, I'd rather go with the shell script workaround in
>> net-physical, as its blast radius is contained and its sins are obvious.
>
> I agree. Let's scrutinize it, though, to make sure it doesn't cause
> boot to hang indefinitely if something goes wrong.
>
>> Longer term, would the startd solution still be necessary once the changes
>> have been made to make manifest-import run earlier?
>
> Theoretically, the problem could still exist. However, UV (and problems
> of a similar category) won't run into it because it won't need the hack
> to make sure its dependencies are running on the first boot after upgrade.
>
> I think a low/medium priority bug is in order for the startd change. As
> always, I wish UV wasn't in the situation of having to do this at all,
> and would prefer to continue giving priority to the early-import effort
> so that future super-early-boot service changes can avoid having to do
> any work at all in this scenario.
>

I've filed 6653687 (svcadm's wait_fmri_enabled failed prematurely because
"restarter" pg isn't yet available) to keep track of this issue.

-tony