Issue #16117 has been updated by Dominic Cleal.

I've re-read the description and see that it actually went wrong during the 
service refresh event (not the start), which would have performed an `svcadm 
restart pe-mcollective`.  The initial start must have succeeded, because since 
#10807 a failed start would have returned an error.

The svcs output shows the service is shutting down ("online*") and then 60 
seconds later it fails.  This is probably an error in the stop method of the 
pe-mcollective service, most likely because there's still a process running 
that didn't die (hence the timeout).  This will cause SMF to force-kill the 
remaining processes and drop the service into maintenance.

We could certainly improve the provider here to detect a failed restart event 
by polling until the restart has finished.  Unlike start/stop, there's no 
`svcadm` flag for restart to wait synchronously.
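A rough sketch of that polling idea, with the `svcs` query injected as a 
callable so the loop itself is visible; the method name, timeout, and interval 
here are illustrative assumptions, not the provider's actual API:

```ruby
# Hypothetical polling helper (not Puppet's actual smf provider code).
# fetch_state stands in for running `svcs -H -o state <fmri>`; SMF reports
# transitional states with a trailing '*' (e.g. "online*").
def wait_for_restart(fetch_state, timeout: 60, interval: 1)
  deadline = Time.now + timeout
  while Time.now < deadline
    state = fetch_state.call.strip
    # A state without the '*' has settled (online, maintenance, disabled, ...)
    return state unless state.end_with?('*')
    sleep interval
  end
  raise "service did not settle within #{timeout}s"
end
```

The caller could then treat a settled state of `maintenance` as a failed 
restart instead of silently moving on.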

On the second run, Puppet cleared the state and started it back up.  I think 
it's questionable that the provider clears a state that needs admin action, 
but at least it recovered.

Nigel Kersten wrote:
> I do feel like I've heard similar sentiments from other Solaris Puppet users. 
> Is it perhaps common for specific smf services to be faulty in this manner?

It does happen often when dependencies aren't specified or there's an error in 
the service methods (stop/start), but I guess that's a feature of SMF.
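For context, SMF dependencies are declared in the service's manifest; an 
illustrative fragment (not pe-mcollective's actual manifest) of the kind of 
declaration that is often missing:

```xml
<!-- Illustrative SMF manifest fragment: require the network milestone
     to be online before this service's start method runs. -->
<dependency name="network" grouping="require_all" restart_on="error"
            type="service">
  <service_fmri value="svc:/milestone/network:default"/>
</dependency>
```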
----------------------------------------
Bug #16117: smf service provider is not correctly idempotent
https://projects.puppetlabs.com/issues/16117#change-69850

Author: Nigel Kersten
Status: Unreviewed
Priority: Normal
Assignee: 
Category: 
Target version: 
Affected Puppet version: 
Keywords: solaris smf
Branch: 


(I don't know Solaris; I'm capturing data from another source that leads us to 
think this.)

After the initial run of the MCollective configuration applied from the 
'default' group in a PE install, MCollective is put into maintenance mode.

<pre>

info: /Stage[main]/Pe_mcollective::Posix/File[puppet-dashboard-public.pem]: 
Scheduling refresh of Service[mcollective]
notice: /Stage[main]/Pe_mcollective::Posix/Service[mcollective]/ensure: ensure 
changed 'stopped' to 'running'
notice: /Stage[main]/Pe_mcollective::Posix/Service[mcollective]: Triggered 
'refresh' from 30 events
notice: Finished catalog run in 5.42 seconds

bash-3.00# svcs pe-mcollective             
online*        12:33:32 svc:/network/pe-mcollective:default

bash-3.00# svcs -x
svc:/network/pe-mcollective:default (Mcollective Daemon)
 State: maintenance since Thu Apr 12 12:34:33 2012
Reason: Method failed.
   See: http://sun.com/msg/SMF-8000-8Q
   See: pe-mcollective(1)
   See: /var/svc/log/network-pe-mcollective:default.log
Impact: This service is not running.

</pre>

Running puppet again seems to resolve the issue...

<pre>

bash-3.00# /opt/puppet/bin/puppet agent -t
info: Retrieving plugin
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/facter_dot_d.rb
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/puppet_vardir.rb
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/root_home.rb
info: Caching catalog for sol-proto.vm
info: Applying configuration version '1334259649'
notice: /Stage[main]/Pe_mcollective::Posix/Service[mcollective]/ensure: ensure 
changed 'maintenance' to 'running'
notice: Finished catalog run in 2.41 seconds

bash-3.00# /opt/puppet/bin/puppet agent -t
info: Retrieving plugin
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/facter_dot_d.rb
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/puppet_vardir.rb
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/root_home.rb
info: Caching catalog for sol-proto.vm
info: Applying configuration version '1334259649'
notice: Finished catalog run in 2.07 seconds

bash-3.00# svcs pe-mcollective             
online         12:40:53 svc:/network/pe-mcollective:default
bash-3.00# svcs -x                       
bash-3.00#

peadmin@lucid-alpha:/root$ mco ping
lucid-alpha.vm                           time=84.67 ms
sol-proto.vm                             time=120.91 ms

</pre>

note: lucid-alpha is a master/console/agent, both at PE 2.5.1


Haus described this as:

<blockquote>
Sure. Solaris services have a few states, and the current smf (solaris service) 
provider handles going from one state to another, but as I understand it, to go 
from maintenance to running requires two state changes, one from maintenance => 
cleared and then from cleared => running. Currently the smf provider will clear 
maintenance mode in one run and start the service in the next run.

The provider could do some switching and polling, along the lines of: we're 
going from maintenance to running, so first clear the maintenance mode and 
wait for the service to be ready to start (poll, poll, poll); once the service 
is ready, start it. I don't know enough about smf or svcadm to know whether it 
is sync or async, and/or whether there is the potential for a clear to fail.
</blockquote>
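The two-step recovery Haus describes could be sketched like this, with the 
svcadm invocations stubbed behind a callable so the sequencing is clear; the 
method name and structure are assumptions for illustration, not the real 
provider:

```ruby
# Hypothetical sketch of maintenance -> cleared -> running in one pass
# (not Puppet's actual smf provider). `run` stands in for executing the
# given svcadm command line.
def ensure_running(fmri, current_state, run)
  if current_state == 'maintenance'
    # Step 1: clear the maintenance state. A real implementation would
    # poll `svcs` here, since it's unclear whether clear is synchronous.
    run.call("svcadm clear #{fmri}")
  end
  # Step 2: start the service; enable's `-s` flag waits until it is online.
  run.call("svcadm enable -s #{fmri}")
end
```

Note that `svcadm enable -s` does wait synchronously, which is part of why the 
restart case (which has no such flag) is the harder one.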

