SMF Experts,

First, apologies for the length of this email.  Hopefully you'll find it
straightforward. 

As part of final testing for Clearview UV, we've been chasing an upgrade
problem which we now believe stems from SMF dependency handling.

Specifically, we upgraded via packaging from a current Nevada system to a
system running UV.  After the upgrade completed, we checked the
repository[1] and verified that the new network/datalink-management
service had been imported[2] *prior* to rebooting the upgraded system:

    # svccfg
    svc:> repository /a/etc/svc/repository.db
    svc:> select network/datalink-management
    svc:/network/datalink-management> listprop
    general                       framework
    general/entity_stability      astring  Unstable
    general/single_instance       boolean  true
    dependents                    framework
    dependents/device-system      fmri     svc:/system/device/local
    dependents/install-discovery  fmri     svc:/system/install-discovery
->  dependents/network-physical   fmri     svc:/network/physical
    ...

Note in particular that svc:/network/physical is listed as a dependent.
Likewise, looking at the network/physical service shows that it depends
on network/datalink-management:

    svc:/network/datalink-management> select network/physical
    svc:/network/physical> listprop
    loopback                           dependency
    loopback/entities                  fmri     svc:/network/loopback
    loopback/grouping                  astring  require_all
    loopback/restart_on                astring  none
    loopback/type                      astring  service
    tnctl_network-physical             dependency
    tnctl_network-physical/entities    fmri     svc:/network/tnctl
    tnctl_network-physical/external    boolean  true
    tnctl_network-physical/grouping    astring  optional_all
    tnctl_network-physical/restart_on  astring  none
    tnctl_network-physical/type        astring  service
->  network-physical                   dependency
->  network-physical/entities          fmri     svc:/network/datalink-management
    network-physical/external          boolean  true
->  network-physical/grouping          astring  require_all
    network-physical/restart_on        astring  none
    network-physical/type              astring  service
    general                            framework
    general/entity_stability           astring  Unstable
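The same inspection can be repeated non-interactively, which helps when checking several upgrade runs. The sketch below assumes svccfg reads subcommands from standard input when stdin is not a terminal. The function name `check_alt_repo` and the `SVCCFG`/`ALT_REPO` variables are hypothetical; `ALT_REPO` defaults to the alternate-root repository used above.

```shell
#!/bin/sh
# Hypothetical helper: dump the datalink-management and network/physical
# properties from the upgraded (not yet booted) alternate-root repository.
# SVCCFG and ALT_REPO may be overridden, e.g. to point at another root.
SVCCFG=${SVCCFG:-/usr/sbin/svccfg}
ALT_REPO=${ALT_REPO:-/a/etc/svc/repository.db}

check_alt_repo() {
    # Feed the same subcommands used interactively above on stdin
    # (assumption: svccfg accepts commands this way when stdin is not a tty).
    "$SVCCFG" <<EOF
repository $ALT_REPO
select network/datalink-management
listprop
select network/physical
listprop
EOF
}
```

Piping its output through `grep network-physical` is then a quick way to confirm that both halves of the dependency made it into the imported repository.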

Also prior to rebooting, we added some debug messages to the net-physical
startup script to record what svcs -d and svcs -D reported as the
dependencies between network/physical and network/datalink-management.
When we rebooted, those debug messages revealed that, while net-physical
was running, the datalink-management dependency was missing[3]:

    svcs -d network/physical
    STATE          STIME    FMRI
    online          2:02:30 svc:/network/tnctl:default
    online          2:02:30 svc:/network/tnctl:default
    online          2:02:30 svc:/network/loopback:default
    online          2:02:30 svc:/network/loopback:default

Likewise, the debug messages indicated that the datalink-management
service had no dependents:

    svcs -D network/datalink-management
    STATE          STIME    FMRI
        
... and "svcs -a" showed datalink-management in an unidentified state:

    STATE          STIME    FMRI
    ...
    uninitialized   2:02:38 svc:/network/talk:default
    uninitialized   2:02:38 svc:/network/slp:default
    uninitialized   2:02:38 svc:/network/telnet:defaul
->  -              svc:/network/datalink-management:default
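The instrumentation described above needs only a few lines. A minimal sketch follows; the function name `smf_debug_deps`, the `SVCS` override, and the log path in the usage comment are hypothetical rather than taken from the actual net-physical script.

```shell
#!/bin/sh
# Hypothetical debug helper for a method script such as net-physical.
# SVCS defaults to the real svcs binary but may be overridden (e.g. in tests).
SVCS=${SVCS:-/usr/bin/svcs}

# smf_debug_deps: print each command followed by its output (stderr included)
# so the caller can send everything to one place with a single redirection.
smf_debug_deps() {
    echo "svcs -d network/physical"
    "$SVCS" -d network/physical 2>&1
    echo "svcs -D network/datalink-management"
    "$SVCS" -D network/datalink-management 2>&1
    echo "svcs -a"
    "$SVCS" -a 2>&1
}

# Example call early in the start method (log path is hypothetical):
#   smf_debug_deps >> /var/tmp/net-physical-debug.log
```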

Once boot completed and we logged in, everything appeared correct,
though the STIME values make it clear that something is amiss:

    # svcs -d network/physical
    STATE          STIME    FMRI
    online          2:02:30 svc:/network/tnctl:default
    online          2:02:30 svc:/network/tnctl:default
    online          2:02:30 svc:/network/loopback:default
    online          2:02:30 svc:/network/loopback:default
    online          2:02:41 svc:/network/datalink-management:default
    online          2:02:41 svc:/network/datalink-management:default

    # svcs -D network/datalink-management
    STATE          STIME    FMRI
    disabled        2:02:28 svc:/network/physical:nwam
    online          2:03:05 svc:/system/device/local:default
    online          2:04:48 svc:/network/physical:default

I'm not familiar with the internals of SMF, but it seems as if the
dependency information early in boot is being extracted from a snapshot --
perhaps until manifest-import?  If so, this is certainly a problem for us,
as we need to ensure that network/datalink-management is online prior to
running a certain part of the net-physical script (hence the dependency).
One workaround that occurred to me was to keep the dependency, but also
explicitly do a "svcadm enable -ts network/datalink-management" from the
net-physical script to ensure it's online by the time we need it.

In any case, we are eager to hear your thoughts on all of this -- and
especially on the workaround.

Thanks!

[1] To be sure we were examining the right repository, we modified a
    property in the network/datalink-management service description;
    after rebooting the system, the changed value was still present.

[2] Recall that we always import this service (rather than using
    manifest-import) since the service needs to run early in boot,
    before manifest-import runs.

[3] I presume the duplicate dependency entries are harmless and are
    a separate bug.

--
meem
