It seems that you could either:
1) improve your rc script so that it checks Apache's other dependencies before starting it, or
2) run Apache under SMF, which checks those dependencies for you.
My 2c.
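
For option 1, something along these lines might work as a guard in the rc script. This is only a rough sketch: wait_writable is a made-up helper name, the 60-second limit is an arbitrary guess, and the apachectl location is assumed from the /usr/local/apache2 prefix in the httpd error quoted below. It simply polls until the log directory can be written to, so httpd is not started while the filesystem is still read-only.

```shell
#!/bin/sh
# Hypothetical guard for /etc/rc3.d/S98apache; wait_writable is not a
# standard Solaris or Apache utility, just an illustrative helper.

# Poll until a file can be created in directory $1; give up after $2 tries
# (one try per second). Returns 0 on success, 1 on timeout.
wait_writable() {
    dir=$1
    tries=$2
    probe="$dir/.rc_probe.$$"
    while [ "$tries" -gt 0 ]; do
        # Subshell so a failed redirection cannot abort the calling script.
        if ( : > "$probe" ) 2>/dev/null; then
            rm -f "$probe"
            return 0
        fi
        tries=`expr $tries - 1`
        sleep 1
    done
    return 1
}

# Possible use in the "start" branch (paths and timeout are assumptions):
#   if wait_writable /usr/local/apache2/logs 60; then
#       /usr/local/apache2/bin/apachectl start
#   else
#       echo "S98apache: log directory still not writable" >&2
#       exit 1
#   fi
```

Exiting non-zero on timeout at least makes the failure visible in the milestone log, instead of the "exited with return code 0" seen there now.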

On 12/1/2011 1:33 PM, Derek McEachern wrote:
Thanks Mike.

The more I look at this, the more I think it is load related. svcs -x only shows that the LP print server is not running, which I don't think has any impact on what I'm seeing.

As for who not reporting what I would expect, I tracked that down to someone installing the GNU tools in /usr/local/bin and then setting the default path to reference those before /bin/ :-(

/bin/who -r shows the zone is at run level 3.

Looking at /var/svc/log/milestone-multi-user-server:default.log I can see that some of the other services have most likely not completed before it tries to run the rc scripts. It appears that the /usr filesystem hasn't yet been mounted read/write and the appstart script is logging an error that indicates rpc services are not completely running.

Executing legacy init script "/etc/rc3.d/S98apache".
(30)Read-only file system: httpd: could not open error log file /usr/local/apache2/logs/error_log.
Unable to open logs
Legacy init script "/etc/rc3.d/S98apache" exited with return code 0.
Executing legacy init script "/etc/rc3.d/S99appstart".
ERROR: Unable to contact any server
Legacy init script "/etc/rc3.d/S99appstart" exited with return code 0.
[ Dec 1 09:17:13 Method "start" exited with status 0 ]

We have a process in place that only starts 3 zones at a time, so we are not doing all 40 at once, but it could be that with this hardware even 3 at a time is too much and we may need to drop to 2.


On Thu, Dec 1, 2011 at 12:07 PM, Mike Gerdts wrote:

    On Thu 01 Dec 2011 at 10:39AM, Derek McEachern wrote:
    > Have a peculiar problem that I haven't seen before.
    > When starting a system that has about 35 - 40 zones on it occasionally we
    > see that one of the zones doesn't come up properly. You can log into the
    > zone but none of the /etc/rc3.d scripts have been run.
    > /var/adm/messages is completely empty and when running who -r to see the
    > run level it doesn't report anything.

    Take a look at the output of svcs -x.  Most likely you have a service
    that svc:/milestone/multi-user-server:default depends on (directly or
    indirectly) that has timed out and as such is in maintenance.  Because
    the dependency is not satisfied, this milestone doesn't come up so the
    rc3 scripts are not run.

    My guess is the timeout is because so many zones are starting at once
    that the disks are being thrashed.  The resulting I/O backlog slows down
    the startup of services, which leads to timeouts, which lead to some
    services failing to even try to start.

    A google search and a 5 second read suggests that this link may be of
    help to adjust the timeout of services that require a longer timeout:

    Mike Gerdts
    Solaris Core OS / Zones

zones-discuss mailing list

Hung-Sheng Tsao Ph D.
Founder & Principal
HopBit GridComputing LLC
cell: 9734950840
