On Wed, May 18, 2011 at 07:49:52PM -0700, Edward Pilatowicz wrote:
> On Wed, May 18, 2011 at 02:46:19PM +1200, Tim Foster wrote:
> > On Mon, 2011-05-16 at 17:59 -0700, Edward Pilatowicz wrote:
> > > On Fri, May 13, 2011 at 02:18:02PM +1200, Tim Foster wrote:
> > > > http://cr.opensolaris.org/~timf/sysrepo-refactor-webrev/
>
> > Setting an unsupported publisher in the GZ and dropping to maintenance
> > also results in the zones-proxyd service stopping, which causes the
> > zones-proxy-client to try to restart in all zones, which eventually
> > drops to maintenance too:
> >
> > [ May 17 15:16:06 Executing start method
> > ("/usr/lib/zones/zoneproxy-client -s localhost:1008"). ]
> > Timed out trying to reach proxy
> > [ May 17 15:19:06 Method "start" exited with status 95. ]
> >
> > Fixing the problem by disabling that unsupported publisher in the GZ,
> > then doing an 'svcadm clear system-repository' will start the
> > zones-proxyd service again.
> >
> > If we fix the problem before any of the zones have had their
> > zones-proxy-clients drop to maintenance (that is, while they're still
> > re-running their start method script, which has a long, 300 second
> > timeout) the clients can then connect to the zones-proxyd and
> > everything's rosy again.
> >
> > If not, despite fixing the problem in the GZ, we need to clear the
> > maintenance state of each zones-proxy-client (one per zone) manually.  I
> > think setting an infinite timeout would be better for this client
> > service, but would welcome comments.
> >
>
> so the cascade effects are pretty unfortunate.  perhaps we can have a
> follow-up fix where pkg set-publisher checks if the sysrepo is running,
> and if so refuses to configure unsupported file repositories?
>

so, i just looked at zoneproxyd.xml and noticed:

---8<---
        <dependency
                  name='sysrepo'
                  type='service'
                  grouping='require_all'
                  restart_on='restart'>
                  <service_fmri value='svc:/application/pkg/system-repository' />
        </dependency>
---8<---

why are we using restart_on='restart'?  why not just change this to
restart_on='none'?  if the sysrepo service is broken or not responding,
clients will be unable to connect to it so they should receive pretty
quick notification via tcp connection failures.  that way the proxy
service can continue to run uninterrupted.
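
as a rough sketch, the changed dependency would look something like the
fragment below.  this is untested, and it assumes nothing else in the
manifest relies on restarts of the sysrepo being propagated to the proxy;
only the restart_on value differs from what's in zoneproxyd.xml today:

---8<---
        <!-- sketch only: still require the sysrepo to be online before
             starting, but don't stop or restart the proxy when the
             sysrepo restarts or fails -->
        <dependency
                  name='sysrepo'
                  type='service'
                  grouping='require_all'
                  restart_on='none'>
                  <service_fmri value='svc:/application/pkg/system-repository' />
        </dependency>
---8<---

with grouping='require_all' left alone, the proxy would still wait for the
sysrepo to come online before it first starts.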

ed
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
