Issue #16117 has been updated by Justin Stoller.
I am almost certain this isn't a Puppet problem but a PE SMF issue, here are
the logs Rahul requested:
<pre>
bash-3.00# less /var/svc/log/network-pe-mcollective:default.log
[ Nov 13 14:10:17 Disabled. ]
[ Nov 13 14:10:17 Rereading configuration. ]
[ Nov 13 14:18:01 Enabled. ]
[ Nov 13 14:18:01 Executing start method ("/opt/puppet/sbin/mcollectived
--pid=/var/run/pe-mcollective.pid
--config="/etc/puppetlabs/mcollective/server.cfg") ]
[ Nov 13 14:18:01 Method "start" exited with status 0 ]
[ Nov 13 14:18:01 Stopping because service restarting. ]
[ Nov 13 14:18:01 Executing stop method (:kill) ]
[ Nov 13 14:19:02 Method or service exit timed out. Killing contract 101 ]
[ Nov 13 14:22:21 Leaving maintenance because clear requested. ]
[ Nov 13 14:22:21 Enabled. ]
[ Nov 13 14:22:21 Executing start method ("/opt/puppet/sbin/mcollectived
--pid=/var/run/pe-mcollective.pid
--config="/etc/puppetlabs/mcollective/server.cfg") ]
[ Nov 13 14:22:21 Method "start" exited with status 0 ]
</pre>
<pre>
bash-3.00# svcs -d svc:/network/pe-mcollective:default
STATE STIME FMRI
online 13:42:25 svc:/network/loopback:default
online 13:42:35 svc:/network/physical:default
online 13:42:36 svc:/system/filesystem/local:default
</pre>
This is the log for starting puppetagent:default on Solaris 10:
<pre>
bash-3.00# less /var/svc/log/network-puppetagent\:default.log
[ Nov 13 14:10:22 Disabled. ]
[ Nov 13 14:10:22 Rereading configuration. ]
[ Nov 13 14:10:32 Rereading configuration. ]
[ Nov 13 14:10:32 Enabled. ]
[ Nov 13 14:10:32 Executing start method ("/opt/puppet/bin/puppet agent") ]
[ Nov 13 14:10:33 Method "start" exited with status 0 ]
</pre>
pe-mcollective rereads its configuration and goes into `online*`, then the
service tries to restart and hangs, dropping into maintenance mode before it is
eventually started properly.
puppetagent rereads its configuration without entering the `online*` state and
is cleanly started again.
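For reference, the transitional `online*` state (the trailing `*` marks a state transition in progress) can be detected by parsing `svcs -H -o state,fmri` output. A minimal sketch in Python, with sample output hard-coded for illustration rather than shelled out to the real `svcs`:

```python
# Sketch: classify SMF service states from `svcs -H -o state,fmri` output.
# A trailing "*" on the state column means the service is mid-transition.

def parse_svcs(output):
    """Return {fmri: (state, transitioning)} from `svcs -H -o state,fmri` text."""
    services = {}
    for line in output.strip().splitlines():
        state, fmri = line.split(None, 1)
        transitioning = state.endswith("*")
        services[fmri] = (state.rstrip("*"), transitioning)
    return services

sample = """\
online   svc:/network/loopback:default
online*  svc:/network/pe-mcollective:default
"""

for fmri, (state, moving) in parse_svcs(sample).items():
    print(fmri, state, "in transition" if moving else "settled")
```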
These are the meaningful parts of the service manifests:
<code>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!-- Original mcollective manifest: Michael Stahnke - puppetlabs.com -->
<service_bundle type="manifest" name="pe-mcollective">
<service name="network/pe-mcollective" type="service" version="1">
<create_default_instance enabled="false"/>
<single_instance/>
<dependency name="config-file" grouping="require_all" restart_on="none"
type="path">
<service_fmri value="file:///etc/puppetlabs/mcollective/server.cfg"/>
</dependency>
<dependency name="loopback" grouping="require_all" restart_on="error"
type="service">
<service_fmri value="svc:/network/loopback:default"/>
</dependency>
<dependency name="physical" grouping="require_all" restart_on="error"
type="service">
<service_fmri value="svc:/network/physical:default"/>
</dependency>
<dependency name="fs-local" grouping="require_all" restart_on="none"
type="service">
<service_fmri value="svc:/system/filesystem/local"/>
</dependency>
<exec_method type="method" name="start" exec="/opt/puppet/sbin/mcollectived
--pid=/var/run/pe-mcollective.pid
--config="/etc/puppetlabs/mcollective/server.cfg" timeout_seconds="60"/>
<exec_method type="method" name="stop" exec=":kill" timeout_seconds="60"/>
<stability value="Evolving"/>
<template>
<common_name>
<loctext xml:lang="C">Mcollective Daemon</loctext>
</common_name>
<documentation>
<manpage title="pe-mcollective" section="1"/>
<doc_link name="puppetlabs.com"
uri="http://www.puppetlabs.com/mcollective/introduction/"/>
</documentation>
...snip...
/var/svc/manifest/network/pe-mcollective.xml
</code>
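The dependency stanzas above can be extracted mechanically. A sketch using Python's standard `xml.etree`, where the inline XML is a trimmed stand-in for the real `/var/svc/manifest/network/pe-mcollective.xml`:

```python
# Sketch: list SMF dependencies from a manifest fragment with xml.etree.
# The inline XML is a trimmed stand-in for the real manifest file.
import xml.etree.ElementTree as ET

manifest = """\
<service_bundle type="manifest" name="pe-mcollective">
  <service name="network/pe-mcollective" type="service" version="1">
    <dependency name="config-file" grouping="require_all" restart_on="none" type="path">
      <service_fmri value="file:///etc/puppetlabs/mcollective/server.cfg"/>
    </dependency>
    <dependency name="fs-local" grouping="require_all" restart_on="none" type="service">
      <service_fmri value="svc:/system/filesystem/local"/>
    </dependency>
  </service>
</service_bundle>
"""

def dependencies(xml_text):
    """Return (name, grouping, fmri) for each <dependency> element."""
    root = ET.fromstring(xml_text)
    deps = []
    for dep in root.iter("dependency"):
        fmri = dep.find("service_fmri").get("value")
        deps.append((dep.get("name"), dep.get("grouping"), fmri))
    return deps

for name, grouping, fmri in dependencies(manifest):
    print(name, grouping, fmri)
```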
<code>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!-- Original puppet manifest: Luke Kanies - puppetlabs.com -->
<service_bundle type="manifest" name="puppetagent">
<service name="network/puppetagent" type="service" version="1">
<create_default_instance enabled="false"/>
<single_instance/>
<dependency name="config-file" grouping="require_all" restart_on="none"
type="path">
<service_fmri value="file:///etc/puppetlabs/puppet/puppet.conf"/>
</dependency>
<dependency name="loopback" grouping="require_all" restart_on="error"
type="service">
<service_fmri value="svc:/network/loopback:default"/>
</dependency>
<dependency name="physical" grouping="require_all" restart_on="error"
type="service">
<service_fmri value="svc:/network/physical:default"/>
</dependency>
<dependency name="fs-local" grouping="require_all" restart_on="none"
type="service">
<service_fmri value="svc:/system/filesystem/local"/>
</dependency>
<exec_method type="method" name="start" exec="/opt/puppet/bin/puppet agent"
timeout_seconds="60"/>
<exec_method type="method" name="stop" exec=":kill" timeout_seconds="60"/>
<stability value="Evolving"/>
...snip...
</code>
----------------------------------------
Bug #16117: smf service provider is not correctly idempotent
https://projects.puppetlabs.com/issues/16117#change-76453
Author: Nigel Kersten
Status: Needs More Information
Priority: Normal
Assignee: Rahul Gopinath
Category:
Target version:
Affected Puppet version:
Keywords: solaris smf
Branch:
(I don't know Solaris. I'm capturing data from another source that makes us
think this.)
After the initial run of the MCollective configuration that is applied from the
'default' group in a PE install, MCollective is put into maintenance mode.
<pre>
info: /Stage[main]/Pe_mcollective::Posix/File[puppet-dashboard-public.pem]:
Scheduling refresh of Service[mcollective]
notice: /Stage[main]/Pe_mcollective::Posix/Service[mcollective]/ensure: ensure
changed 'stopped' to 'running'
notice: /Stage[main]/Pe_mcollective::Posix/Service[mcollective]: Triggered
'refresh' from 30 events
notice: Finished catalog run in 5.42 seconds
bash-3.00# svcs pe-mcollective
online* 12:33:32 svc:/network/pe-mcollective:default
bash-3.00# svcs -x
svc:/network/pe-mcollective:default (Mcollective Daemon)
State: maintenance since Thu Apr 12 12:34:33 2012
Reason: Method failed.
See: http://sun.com/msg/SMF-8000-8Q
See: pe-mcollective(1)
See: /var/svc/log/network-pe-mcollective:default.log
Impact: This service is not running.
</pre>
Running puppet again seems to resolve the issue...
<pre>
bash-3.00# /opt/puppet/bin/puppet agent -t
info: Retrieving plugin
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/facter_dot_d.rb
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/puppet_vardir.rb
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/root_home.rb
info: Caching catalog for sol-proto.vm
info: Applying configuration version '1334259649'
notice: /Stage[main]/Pe_mcollective::Posix/Service[mcollective]/ensure: ensure
changed 'maintenance' to 'running'
notice: Finished catalog run in 2.41 seconds
bash-3.00# /opt/puppet/bin/puppet agent -t
info: Retrieving plugin
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/facter_dot_d.rb
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/puppet_vardir.rb
info: Loading facts in /var/opt/lib/pe-puppet/lib/facter/root_home.rb
info: Caching catalog for sol-proto.vm
info: Applying configuration version '1334259649'
notice: Finished catalog run in 2.07 seconds
bash-3.00# svcs pe-mcollective
online 12:40:53 svc:/network/pe-mcollective:default
bash-3.00# svcs -x
bash-3.00#
peadmin@lucid-alpha:/root$ mco ping
lucid-alpha.vm time=84.67 ms
sol-proto.vm time=120.91 ms
</pre>
note: lucid-alpha is a master/console/agent, both at PE 2.5.1
Haus described this as:
<blockquote>
Sure. Solaris services have a few states, and the current smf (solaris service)
provider handles going from one state to another, but as I understand it, to go
from maintenance to running requires two state changes, one from maintenance =>
cleared and then from cleared => running. Currently the smf provider will clear
maintenance mode in one run and start the service in the next run.
The provider could do some switching and polling to do something like: "Oh,
we're going from maintenance to running, so first let's clear the maintenance
mode and wait for the service to be ready to start. Poll, poll, poll. Oh, the
service is ready, now let's start it." I don't know enough about smf or svcadm
to know if it is sync or async, and/or if there is the potential for a clear to
fail.
</blockquote>
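The clear-then-start sequence described above could be sketched as pure transition logic. The subcommand lists below mirror real `svcadm` subcommands, but the function itself is illustrative, not Puppet's actual smf provider code:

```python
# Sketch of the clear-then-start transition logic described above. The
# subcommand names (clear, enable) are real svcadm subcommands; the function
# is illustrative, not Puppet's actual smf provider.

def commands_to_start(current_state):
    """Return the ordered svcadm subcommands needed to reach 'online'."""
    if current_state == "maintenance":
        # Two state changes are required: maintenance -> cleared, then
        # cleared -> online. A real provider would poll `svcs -H -o state`
        # between the two until the service has actually left maintenance.
        return [["svcadm", "clear"], ["svcadm", "enable"]]
    if current_state == "disabled":
        return [["svcadm", "enable"]]
    if current_state == "online":
        return []  # already running; nothing to do
    # Other states (offline, degraded, legacy_run) need their own handling.
    raise ValueError("unhandled SMF state: %s" % current_state)

for state in ("maintenance", "disabled", "online"):
    print(state, "->", commands_to_start(state))
```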
--
You received this message because you are subscribed to the Google Groups
"Puppet Bugs" group.