I agree. Our script could certainly be improved with logic to check for
these failures and handle them, which we will probably end up doing.
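
A rough sketch of what such a check might look like, assuming (hypothetically) that the only thing S98apache needs to wait for is a writable Apache log directory. The function names, the 120-second limit, and the apachectl path are illustrative assumptions and are not taken from the script discussed in this thread; only the log directory comes from the error message quoted below.

```shell
#!/bin/sh
# Hypothetical helpers for a legacy rc script such as /etc/rc3.d/S98apache.
# Names, defaults, and the apachectl path are assumptions for illustration.

# Return 0 as soon as a file can be created in directory $1,
# or 1 if that is still not possible after roughly $2 seconds (default 60).
wait_for_writable() {
    dir=$1
    tries=${2:-60}
    probe="$dir/.rc_probe.$$"
    while [ "$tries" -gt 0 ]; do
        # The subshell keeps a failed redirection from printing an error.
        if ( : > "$probe" ) 2>/dev/null; then
            rm -f "$probe"
            return 0
        fi
        sleep 1
        tries=`expr "$tries" - 1`
    done
    return 1
}

# Start httpd only once its log directory is writable; otherwise fail loudly.
start_apache() {
    logdir=/usr/local/apache2/logs          # path from the logged error
    if wait_for_writable "$logdir" 120; then
        /usr/local/apache2/bin/apachectl start   # assumed install location
    else
        echo "S98apache: $logdir is still not writable, giving up" >&2
        return 1
    fi
}

# Legacy rc scripts are invoked with "start" at boot.
case "${1:-}" in
start)
    start_apache || exit 1
    ;;
esac
```

Exiting non-zero on failure also changes what the milestone log records for the script, instead of the "return code 0" shown for the failed start below.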

Derek

On Thu, Dec 1, 2011 at 2:47 PM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." <
laot...@gmail.com> wrote:

>  It seems that you could either
> 1) improve your rc script to check the other dependencies for apache,
> or
> 2) use SMF for apache, which checks the other dependencies.
> My 2c.
>
>
>
> On 12/1/2011 1:33 PM, Derek McEachern wrote:
>
> Thanks Mike.
>
>  The more I look at this, the more I think it is load related. svcs -x only
> shows that the LP print server is not running which I don't think has any
> impact on what I'm seeing.
>
>  As for who not reporting what I would expect, I tracked that down to
> someone installing the GNU tools in /usr/local/bin and then setting the
> default path to reference those before /bin/ :-(
>
>  /bin/who -r shows the zone is at run level 3.
>
>  Looking at /var/svc/log/milestone-multi-user-server:default.log I can
> see that some of the other services have most likely not completed before
> it tries to run the rc scripts. It appears that the /usr filesystem hasn't
> yet been mounted read/write and the appstart script is logging an error
> that indicates rpc services are not completely running.
>
>  Executing legacy init script "/etc/rc3.d/S98apache".
> (30)Read-only file system: httpd: could not open error log file
> /usr/local/apache2/logs/error_log.
> Unable to open logs
> Legacy init script "/etc/rc3.d/S98apache" exited with return code 0.
> Executing legacy init script "/etc/rc3.d/S99appstart".
> ERROR: Unable to contact any server
> Legacy init script "/etc/rc3.d/S99appstart" exited with return code 0.
> [ Dec 1 09:17:13 Method "start" exited with status 0 ]
>
>  We have a process in place that only starts 3 zones at one time, so we
> are not doing all 40 at once, but it could be that with this hardware even
> trying 3 at a time is too much and we may need to drop to 2.
>
>  Derek
>
>  On Thu, Dec 1, 2011 at 12:07 PM, Mike Gerdts <mike.ger...@oracle.com> wrote:
>
>> On Thu 01 Dec 2011 at 10:39AM, Derek McEachern wrote:
>> > Have a peculiar problem that I haven't seen before.
>> >
>> > When starting a system that has about 35 - 40 zones on it occasionally we
>> > see that one of the zones doesn't come up properly. You can log into the
>> > zone but none of the /etc/rc3.d scripts have been run.
>> >
>> > /var/adm/messages is completely empty and when running who -r to see the
>> > run level it doesn't report anything.
>>
>>  Take a look at the output of svcs -x.  Most likely you have a service
>> that svc:/milestone/multi-user-server:default depends on (directly or
>> indirectly) that has timed out and as such is in maintenance.  Because
>> the dependency is not satisfied, this milestone doesn't come up so the
>> rc3 scripts are not run.
>>
>> My guess is the timeout is because so many zones are starting at once
>> that the disks are being thrashed.  The resulting I/O backlog slows down
>> the startup of services, which leads to timeouts, which lead to some
>> services failing to even try to start.
>>
>> A Google search and a 5 second read suggest that this link may be of
>> help to adjust the timeout of services that require a longer timeout:
>>
>> http://www.runningunix.com/2009/01/changing-timeouts-on-smf-services/
>>
>> --
>> Mike Gerdts
>> Solaris Core OS / Zones
>> http://blogs.oracle.com/zoneszone/
>>
>
>
>
> _______________________________________________
> zones-discuss mailing list
> zones-disc...@opensolaris.org
>
>
> --
> Hung-Sheng Tsao Ph D.
> Founder & Principal
> HopBit GridComputing LLC
> cell: 9734950840
> http://laotsao.wordpress.com/
> http://laotsao.blogspot.com/
