I have been unable to get Senlin health policies to work at all. I'm confused 
about what I might be doing wrong, though I suspect it is related to 
credentials. Any help would be much appreciated.


What I have tried is the following (using a Heat template, though I get the 
same error when running senlin commands directly from the command line).

  *   Create a cluster with a single node in it (initial size), and a health 
policy attached to it using NODE_STATUS_POLLING.
  *   Verify that the cluster exists, that a node has been created with a VM, 
and that the health policy exists and has been linked to the cluster.
  *   Nuke the VM (nova delete) to try and trigger healing.

I would expect the Senlin health policy to detect that the VM has gone and 
heal the cluster. However, that does not happen. If I run "senlin node-check", 
the node state changes to ERROR and the cluster state changes to WARNING (so 
Senlin can tell that the cluster is in a bad way). However, the health policy 
does not do what I would expect, which is to replace the Senlin node.
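
To make the expectation concrete, here is a minimal sketch of the 
poll-check-recover cycle I had in mind. It only illustrates the expected 
behaviour and is not Senlin's implementation: Node, check_node and 
poll_and_heal are hypothetical names, and the "cloud" is just a Python set of 
VM IDs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a Senlin node backed by one VM."""
    name: str
    vm_id: str
    status: str = "ACTIVE"


def check_node(node, existing_vms):
    """Mark the node ERROR if its VM is gone, ACTIVE otherwise."""
    node.status = "ACTIVE" if node.vm_id in existing_vms else "ERROR"
    return node.status


def poll_and_heal(nodes, existing_vms):
    """One polling pass: check every node and recreate the VM of any in ERROR."""
    recreated = []
    for node in nodes:
        if check_node(node, existing_vms) == "ERROR":
            # RECREATE (assumed behaviour): boot a replacement VM and
            # point the node at it.
            node.vm_id = node.vm_id + "-new"
            existing_vms.add(node.vm_id)
            node.status = "ACTIVE"
            recreated.append(node.name)
    return recreated


vms = {"vm-1"}
cluster = [Node("node-1", "vm-1")]
vms.discard("vm-1")  # stands in for the "nova delete" step
healed = poll_and_heal(cluster, vms)
print(healed, cluster[0].status)
```

In my setup the check step works when run by hand, but the periodic pass never 
reaches the recreate step.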

I'm seeing some odd log extracts that make me think that the issue is that the 
health policy does not have access to the right credentials in order to issue 
the polling requests. I have found 
http://docs.openstack.org/developer/senlin/developer/authorization.html but 
cannot see quite how it relates.

I'm using stable/mitaka and devstack on a single Ubuntu server. The Heat 
template and an extract from the logs are below.

Can anybody suggest what I might be doing wrong or point me at some 
documentation that explains how healing / authentication in Senlin should / 
does work?

Thanks, Peter White


Heat template

heat_template_version: 2016-04-08

description: Simple template to test healing

resources:
  profile:
    type: OS::Senlin::Profile
    properties:
      type: os.nova.server-1.0
      properties:
        image: cirros-0.3.4-x86_64-uec
        flavor: m1.tiny

  cluster1:
    type: OS::Senlin::Cluster
    properties:
      name: cluster1
      profile: {get_resource: profile}
      desired_capacity: 1
      min_size: 1

  heal_policy:
    type: OS::Senlin::Policy
    properties:
      type: senlin.policy.health-1.0
      bindings:
        - cluster: {get_resource: cluster1}
      properties:
        detection:
          type: NODE_STATUS_POLLING
          options:
            interval: 60
        recovery:
          actions:
            - RECREATE
          #fencing: # Not sure what this does, but didn't seem to make any difference.
          #  - COMPUTE


Senlin log extract

2016-07-20 11:20:04.379 DEBUG oslo_messaging._drivers.amqpdriver [-] received 
reply msg_id: 3b7f3c9c28074c7eb14af8e50ba10a42 from (pid=21537) __call__ 
/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:302
2016-07-20 11:20:04.379 ERROR oslo.service.loopingcall [-] Fixed interval 
looping call 'senlin.engine.health_manager.HealthManager._poll_cluster' failed
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall Traceback (most recent 
call last):
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 136, 
in _run_loop
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     result = 
func(*self.args, **self.kw)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/opt/stack/senlin/senlin/engine/health_manager.py", line 110, in _poll_cluster
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     
self.rpc_client.cluster_check(self.ctx, cluster_id)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/opt/stack/senlin/senlin/rpc/client.py", line 217, in cluster_check
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     params=params))
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/opt/stack/senlin/senlin/rpc/client.py", line 50, in call
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return 
client.call(ctxt, method, **kwargs)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 
413, in call
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return 
self.prepare().call(ctxt, method, **kwargs)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 
158, in call
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     retry=self.retry)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, 
in _send
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     timeout=timeout, 
retry=retry)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", 
line 470, in send
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     retry=retry)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", 
line 461, in _send
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     raise result
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall ValueError: Field `user' 
cannot be None
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall Traceback (most recent 
call last):
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
138, in _dispatch_and_reply
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     incoming.message))
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
185, in _dispatch
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return 
self._do_dispatch(endpoint, method, ctxt, args)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
127, in _do_dispatch
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     result = func(ctxt, 
**new_args)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/opt/stack/senlin/senlin/engine/service.py", line 68, in wrapped
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return func(self, 
ctx, *args, **kwargs)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/opt/stack/senlin/senlin/engine/service.py", line 1328, in cluster_check
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     
consts.CLUSTER_CHECK, **params)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/opt/stack/senlin/senlin/engine/actions/base.py", line 282, in create
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return 
obj.store(context)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/opt/stack/senlin/senlin/engine/actions/base.py", line 187, in store
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     action = 
ao.Action.create(context, values)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/opt/stack/senlin/senlin/objects/action.py", line 52, in create
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return 
cls._from_db_object(context, cls(context), obj)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/opt/stack/senlin/senlin/objects/base.py", line 43, in _from_db_object
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     obj[field] = 
db_obj[field]
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 
727, in __setitem__
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     setattr(self, name, 
value)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 
72, in setter
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     field_value = 
field.coerce(self, name, value)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 
190, in coerce
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return 
self._null(obj, attr)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File 
"/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 
168, in _null
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     raise 
ValueError(_("Field `%s' cannot be None") % attr)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall ValueError: Field `user' 
cannot be None
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.462 INFO senlin.engine.event 
[req-6ce9b961-acdf-4523-a04b-ef98d7752f85 None None] cluster1 [6f720478] 
CLUSTER_ATTACH_POLICY - SUCCEEDED: Policy attached.
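
Reading the last frames of the traceback, it looks as though an action object 
is being built with `user` set to None, and the "None None" in the final INFO 
line hints that the request context carried no user or project. The sketch 
below imitates that failure with a simplified stand-in for a non-nullable 
field. NonNullableField, ActionRecord and make_action are hypothetical names; 
this is not the oslo.versionedobjects or Senlin code itself.

```python
class NonNullableField:
    """Simplified stand-in for a field that rejects None on assignment."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        if value is None:
            # Same message text as in the log extract.
            raise ValueError("Field `%s' cannot be None" % self.name)
        obj.__dict__[self.name] = value


class ActionRecord:
    """Hypothetical record whose user and project must both be set."""
    user = NonNullableField()
    project = NonNullableField()

    def __init__(self, user, project):
        self.user = user
        self.project = project


def make_action(context):
    """Build a record from a context dict (assumed shape: user, project)."""
    return ActionRecord(context.get("user"), context.get("project"))


try:
    # A context with no credentials, as the periodic health check seems to have.
    make_action({"user": None, "project": None})
except ValueError as exc:
    message = str(exc)
    print(message)
```

If that reading is right, the question becomes where the health manager is 
supposed to get a user and project for its context.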


