I agree. Our script could certainly be improved with logic to check for
these failures and handle them, which we will probably end up doing.
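
A minimal sketch of such a check, assuming the start case of a legacy rc script like S98apache. The helper names (wait_for, dir_writable), the LOGDIR variable, and the 120-second timeout are hypothetical and not taken from the actual scripts. It waits until the log directory accepts writes before launching httpd, and exits non-zero on timeout instead of reporting return code 0:

```shell
#!/bin/sh
# Hypothetical helpers for a legacy rc script; all names are illustrative.

# wait_for TIMEOUT CMD...: rerun CMD about once a second until it succeeds
# or TIMEOUT seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        # expr rather than $(( )) so this also runs under an old Bourne shell
        elapsed=`expr "$elapsed" + 1`
    done
    return 0
}

# dir_writable DIR: succeeds only if a file can actually be created in DIR,
# which fails while the filesystem is still mounted read-only.
dir_writable() {
    probe="$1/.rcprobe.$$"
    touch "$probe" 2>/dev/null && rm -f "$probe"
}

# Possible use in the start) case (LOGDIR and 120 are assumptions):
#   wait_for 120 dir_writable "$LOGDIR" || {
#       echo "log directory still read-only, not starting httpd" >&2
#       exit 1
#   }
```

The same wait_for helper could wrap whatever probe the appstart script needs before it tries to contact its servers.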
On Thu, Dec 1, 2011 at 2:47 PM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:
> It seems that you could:
> 1) improve your rc script to check Apache's other dependencies, or
> 2) run Apache under SMF, which checks those dependencies for you.
> My 2c.
> On 12/1/2011 1:33 PM, Derek McEachern wrote:
> Thanks Mike.
> The more I look at this, the more I think it is load related. svcs -x only
> shows that the LP print server is not running, which I don't think has any
> impact on what I'm seeing.
> As for who not reporting what I would expect, I tracked that down to
> someone installing the GNU tools in /usr/local/bin and then setting the
> default path to reference those before /bin :-(
> /bin/who -r shows the zone is at run level 3.
> Looking at /var/svc/log/milestone-multi-user-server:default.log, I can
> see that some of the other services have most likely not completed before
> it tries to run the rc scripts. It appears that the /usr filesystem hasn't
> yet been mounted read/write, and the appstart script is logging an error
> that indicates the RPC services are not completely running.
> Executing legacy init script "/etc/rc3.d/S98apache".
> (30)Read-only file system: httpd: could not open error log file
> Unable to open logs
> Legacy init script "/etc/rc3.d/S98apache" exited with return code 0.
> Executing legacy init script "/etc/rc3.d/S99appstart".
> ERROR: Unable to contact any server
> Legacy init script "/etc/rc3.d/S99appstart" exited with return code 0.
> [ Dec 1 09:17:13 Method "start" exited with status 0 ]
> We have a process in place that only starts 3 zones at a time, so we
> are not doing all 40 at once, but it could be that with this hardware even
> 3 at a time is too much and we may need to drop to 2.
> On Thu, Dec 1, 2011 at 12:07 PM, Mike Gerdts <mike.ger...@oracle.com> wrote:
>> On Thu 01 Dec 2011 at 10:39AM, Derek McEachern wrote:
>> > Have a peculiar problem that I haven't seen before.
>> > When starting a system that has about 35 - 40 zones on it, we occasionally
>> > see that one of the zones doesn't come up properly. You can log into the
>> > zone, but none of the /etc/rc3.d scripts have been run.
>> > /var/adm/messages is completely empty, and running who -r to see the
>> > run level doesn't report anything.
>> Take a look at the output of svcs -x. Most likely you have a service
>> that svc:/milestone/multi-user-server:default depends on (directly or
>> indirectly) that has timed out and as such is in maintenance. Because
>> the dependency is not satisfied, this milestone doesn't come up so the
>> rc3 scripts are not run.
>> My guess is the timeout is because so many zones are starting at once
>> that the disks are being thrashed. The resulting I/O backlog slows down
>> the startup of services, which leads to timeouts, which lead to some
>> services failing to even try to start.
>> A google search and a 5 second read suggests that this link may be of
>> help to adjust the timeout of services that require a longer timeout:
>> Mike Gerdts
>> Solaris Core OS / Zones
> zones-discuss mailing list
> zones-disc...@opensolaris.org
> Hung-Sheng Tsao Ph D.
> Founder & Principal
> HopBit GridComputing LLC
> cell: 9734950840
> http://laotsao.wordpress.com/
> http://laotsao.blogspot.com/