Issue #16117 has been updated by Dominic Cleal.
I've re-read the description and see that it actually went wrong during the
service refresh event (not the start), which would have performed an `svcadm
restart pe-mcollective`. The initial start must have succeeded, because since
#10807 a failed start would have returned a failure.
The svcs output shows the service shutting down ("online*"), and then 60
seconds later it fails. This is probably an error in the stop method of the
pe-mcollective service, most likely because a process was still running that
didn't die (hence the timeout). This causes SMF to force-kill the remaining
processes and drop the service into maintenance.
We could certainly improve the provider here to detect a failed restart event
by polling for the restart to have completed. There's no svcadm flag for
restart, as there is for a start/stop, to wait synchronously.
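As a rough sketch of what that polling could look like (in Ruby, the provider's language) — the `wait_for_restart` helper and its use of `svcs -H -o state` are my assumptions for illustration, not the actual provider code. The one grounded detail is that svcs marks transitional states with a trailing asterisk (e.g. "online*" above), so the loop waits until the reported state has no asterisk:

```ruby
# Sketch only: wait until an `svcadm restart` has settled.
# Assumes `svcs -H -o state <fmri>` output; not actual provider code.

# A state like "online*" means the instance is still transitioning.
def transitional?(state)
  state.end_with?("*")
end

# Hypothetical helper: poll until the service leaves a transitional
# state, or give up after `timeout` seconds.
def wait_for_restart(fmri, timeout: 60, interval: 1)
  deadline = Time.now + timeout
  loop do
    state = `svcs -H -o state #{fmri}`.strip
    return state unless transitional?(state)
    raise "timed out waiting for #{fmri} to settle" if Time.now > deadline
    sleep interval
  end
end
```

The provider could then treat a final state of "maintenance" as a failed restart instead of silently reporting success.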
On the second run, Puppet cleared the state and started the service back up.
It's questionable that the provider clears a state that needs admin action,
but at least it recovered.
Nigel Kersten wrote:
> I do feel like I've heard similar sentiments from other Solaris Puppet users.
> Is it perhaps common for specific smf services to be faulty in this manner?
It does happen often when dependencies aren't specified or there's an error in
the service methods (stop/start), but I suppose that's a feature of SMF.
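For context, SMF dependencies are declared in the service's XML manifest. A minimal fragment declaring that a service requires the network milestone before starting might look like this — the FMRI and grouping values here are illustrative, not taken from the pe-mcollective manifest:

```
<!-- Hypothetical SMF manifest fragment: require the network milestone
     before this service is started. -->
<dependency name="network" grouping="require_all" restart_on="error" type="service">
  <service_fmri value="svc:/milestone/network:default"/>
</dependency>
```

Without such a declaration, SMF may try to run the start method before its prerequisites are up, which is one way services end up in maintenance.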
----------------------------------------
Bug #16117: smf service provider is not correctly idempotent
https://projects.puppetlabs.com/issues/16117#change-69850
Author: Nigel Kersten
Status: Unreviewed
Priority: Normal
Assignee:
Category:
Target version:
Affected Puppet version:
Keywords: solaris smf
Branch:
(I don't know Solaris. I'm capturing data from another source that makes us
think this)
After the initial run of the MCollective configuration that is applied from the
'default' group in a PE install, MCollective is put into maintenance mode.
<pre>
info: /Stage[main]/Pe_mcollective::Posix/File[puppet-dashboard-public.pem]:
Scheduling refresh of Service[mcollective]
notice: /Stage[main]/Pe_mcollective::Posix/Service[mcollective]/ensure: ensure
changed 'stopped' to 'running'
notice: /Stage[main]/Pe_mcollective::Posix/Service[mcollective]: Triggered
'refresh' from 30 events
notice: Finished catalog run in 5.42 seconds
bash-3.00# svcs pe-mcollective
online* 12:33:32 svc:/network/pe-mcollective:default
bash-3.00# svcs -x
svc:/network/pe-mcollective:default (Mcollective Daemon)
State: maintenance since Thu Apr 12 12:34:33 2012
Reason: Method failed.
See: http://sun.com/msg/SMF-8000-8Q
See: pe-mcollective(1)
See: /var/svc/log/network-pe-mcollective:default.log
Impact: This service is not running.
</pre>
Running puppet again seems to resolve the issue...
<pre>
bash-3.00# /opt/puppet/bin/puppet agent -t
info: Retrieving plugin
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/facter_dot_d.rb
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/puppet_vardir.rb
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/root_home.rb
info: Caching catalog for sol-proto.vm
info: Applying configuration version '1334259649'
notice: /Stage[main]/Pe_mcollective::Posix/Service[mcollective]/ensure: ensure
changed 'maintenance' to 'running'
notice: Finished catalog run in 2.41 seconds
bash-3.00# /opt/puppet/bin/puppet agent -t
info: Retrieving plugin
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/facter_dot_d.rb
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/puppet_vardir.rb
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/root_home.rb
info: Caching catalog for sol-proto.vm
info: Applying configuration version '1334259649'
notice: Finished catalog run in 2.07 seconds
bash-3.00# svcs pe-mcollective
online 12:40:53 svc:/network/pe-mcollective:default
bash-3.00# svcs -x
bash-3.00#
peadmin@lucid-alpha:/root$ mco ping
lucid-alpha.vm time=84.67 ms
sol-proto.vm time=120.91 ms
</pre>
note: lucid-alpha is a master/console/agent, both at PE 2.5.1
Haus described this as:
<blockquote>
Sure. Solaris services have a few states, and the current smf (solaris service)
provider handles going from one state to another, but as I understand it, to go
from maintenance to running requires two state changes, one from maintenance =>
cleared and then from cleared => running. Currently the smf provider will clear
maintenance mode in one run and start the service in the next run.
The provider could do some switching and polling to do something like: "Oh,
we're going from maintenance to running. So first let's clear the maintenance
mode and wait for the service to be ready to start. Poll, poll, poll. Oh, the
service is ready, now let's start it." I don't know enough about smf or svcadm
to know if it is sync or async, and/or if there is the potential for a clear
to fail.
</blockquote>
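The two-step transition Haus describes can be sketched as a pure planning function: given the current SMF state and a desired state of running, return the svcadm commands to issue in order. The function name and shape are my invention for illustration, not the provider's actual API:

```ruby
# Sketch: plan the svcadm commands needed to bring an instance to
# "running" from its current state. Hypothetical helper, not the
# actual smf provider code.
def smf_transition_plan(current_state, fmri)
  case current_state
  when "maintenance"
    # Two steps, as described above: clear the maintenance state
    # first, then (after polling for the clear to take effect)
    # enable the instance.
    ["svcadm clear #{fmri}", "svcadm enable #{fmri}"]
  when "disabled"
    ["svcadm enable #{fmri}"]
  when "online"
    []
  else
    raise ArgumentError, "unhandled state: #{current_state}"
  end
end
```

Issuing both commands with polling in between, within a single Puppet run, would avoid the current behaviour of clearing maintenance on one run and starting the service only on the next.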