I have been unable to get Senlin health policies to work at all. I'm not sure what I'm doing wrong, though I suspect it is related to credentials. Any help would be much appreciated.
What I have tried is the following (using a heat template, though I get the same error when issuing senlin commands directly from the command line):

* Create a cluster with an initial size of one node, and a health policy attached to it using NODE_STATUS_POLLING.
* Verify that the cluster exists, that a node has been created with a VM, and that the health policy exists and has been attached to the cluster.
* Delete the VM (nova delete) to try to trigger healing.

I would expect the Senlin health policy to detect that the VM has gone and to heal the cluster. However, that does not happen. If I run "senlin node-check", the node state changes to ERROR and the cluster state changes to WARNING (so Senlin can tell that the cluster is in a bad way), but the health policy still does not do what I would expect, which is to replace the Senlin node.

I'm seeing some odd log extracts that make me think the health policy does not have access to the right credentials to issue the polling requests. I have found http://docs.openstack.org/developer/senlin/developer/authorization.html but cannot quite see how it relates.

I'm using stable/mitaka and devstack on a single Ubuntu server. The heat template and the extract from the logs are below.

Can anybody suggest what I might be doing wrong, or point me at some documentation that explains how healing and authentication in Senlin should work?
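To make the expectation concrete, here is a rough sketch in plain Python of what I had in mind for one polling pass. It is not Senlin code: poll_and_heal, server_exists and recreate are hypothetical names standing in for the health policy, a nova lookup and the RECREATE recovery action.

```python
# Hypothetical stand-in for the behaviour I expected; none of these
# names are Senlin APIs.


def poll_and_heal(nodes, server_exists, recreate):
    """Run one polling pass over a cluster.

    nodes:         dict mapping node name -> status string (updated in place)
    server_exists: callable(name) -> bool, stands in for asking nova
    recreate:      callable(name), stands in for the RECREATE action
    Returns the list of node names that were recreated.
    """
    healed = []
    for name in sorted(nodes):
        if server_exists(name):
            nodes[name] = "ACTIVE"
            continue
        nodes[name] = "ERROR"    # the state "senlin node-check" reports
        recreate(name)
        nodes[name] = "ACTIVE"   # the state I expected the policy to restore
        healed.append(name)
    return healed


if __name__ == "__main__":
    cluster = {"node-1": "ACTIVE"}
    deleted = {"node-1"}         # simulates "nova delete" on the VM

    def server_exists(name):
        return name not in deleted

    def recreate(name):
        deleted.discard(name)

    print(poll_and_heal(cluster, server_exists, recreate))
    print(cluster)
```

In my setup only the first half happens: the node goes to ERROR after a manual check, and nothing is recreated.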
Thanks,
Peter White

Heat template

heat_template_version: 2016-04-08

description: Simple template to test healing

resources:
  profile:
    type: OS::Senlin::Profile
    properties:
      type: os.nova.server-1.0
      properties:
        image: cirros-0.3.4-x86_64-uec
        flavor: m1.tiny

  cluster1:
    type: OS::Senlin::Cluster
    properties:
      name: cluster1
      profile: {get_resource: profile}
      desired_capacity: 1
      min_size: 1

  heal_policy:
    type: OS::Senlin::Policy
    properties:
      type: senlin.policy.health-1.0
      bindings:
        - cluster: {get_resource: cluster1}
      properties:
        detection:
          type: NODE_STATUS_POLLING
          options:
            interval: 60
        recovery:
          actions:
            - RECREATE
          #fencing: # Not sure what this does, but didn't seem to make any difference.
          #  - COMPUTE

Senlin log extract

2016-07-20 11:20:04.379 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 3b7f3c9c28074c7eb14af8e50ba10a42 from (pid=21537) __call__ /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:302
2016-07-20 11:20:04.379 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'senlin.engine.health_manager.HealthManager._poll_cluster' failed
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall Traceback (most recent call last):
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 136, in _run_loop
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     result = func(*self.args, **self.kw)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/opt/stack/senlin/senlin/engine/health_manager.py", line 110, in _poll_cluster
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     self.rpc_client.cluster_check(self.ctx, cluster_id)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/opt/stack/senlin/senlin/rpc/client.py", line 217, in cluster_check
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     params=params))
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/opt/stack/senlin/senlin/rpc/client.py", line 50, in call
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return client.call(ctxt, method, **kwargs)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 413, in call
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return self.prepare().call(ctxt, method, **kwargs)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     retry=self.retry)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     timeout=timeout, retry=retry)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     retry=retry)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 461, in _send
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     raise result
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall ValueError: Field `user' cannot be None
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall Traceback (most recent call last):
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     incoming.message))
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 185, in _dispatch
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return self._do_dispatch(endpoint, method, ctxt, args)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     result = func(ctxt, **new_args)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/opt/stack/senlin/senlin/engine/service.py", line 68, in wrapped
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return func(self, ctx, *args, **kwargs)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/opt/stack/senlin/senlin/engine/service.py", line 1328, in cluster_check
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     consts.CLUSTER_CHECK, **params)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/opt/stack/senlin/senlin/engine/actions/base.py", line 282, in create
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return obj.store(context)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/opt/stack/senlin/senlin/engine/actions/base.py", line 187, in store
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     action = ao.Action.create(context, values)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/opt/stack/senlin/senlin/objects/action.py", line 52, in create
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return cls._from_db_object(context, cls(context), obj)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/opt/stack/senlin/senlin/objects/base.py", line 43, in _from_db_object
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     obj[field] = db_obj[field]
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 727, in __setitem__
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     setattr(self, name, value)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 72, in setter
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     field_value = field.coerce(self, name, value)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 190, in coerce
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     return self._null(obj, attr)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 168, in _null
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall     raise ValueError(_("Field `%s' cannot be None") % attr)
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall ValueError: Field `user' cannot be None
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.379 TRACE oslo.service.loopingcall
2016-07-20 11:20:04.462 INFO senlin.engine.event [req-6ce9b961-acdf-4523-a04b-ef98d7752f85 None None] cluster1 [6f720478] CLUSTER_ATTACH_POLICY - SUCCEEDED: Policy attached.
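
My tentative reading of the traceback, sketched in plain Python. This is not Senlin or oslo.versionedobjects code: NonNullableField, Context and store_action are made-up stand-ins. The assumption (unconfirmed) is that the health manager calls cluster_check with a context that has no user set, and that storing the resulting action then fails because the action's user field is not allowed to be None.

```python
class NonNullableField:
    """Minimal stand-in for a field that rejects None, as in the traceback."""

    def __init__(self, name):
        self.name = name

    def coerce(self, value):
        if value is None:
            # Same message format as the ValueError in the log extract.
            raise ValueError("Field `%s' cannot be None" % self.name)
        return value


class Context:
    """Hypothetical request context; user and project stay None unless set."""

    def __init__(self, user=None, project=None):
        self.user = user
        self.project = project


def store_action(ctx, action_name):
    """Build an action record owned by ctx.user, validating that field."""
    return {
        "action": action_name,
        "user": NonNullableField("user").coerce(ctx.user),
    }


if __name__ == "__main__":
    # A context carrying a user stores fine.
    print(store_action(Context(user="demo"), "CLUSTER_CHECK"))

    # A context without a user fails the way the polling call does.
    try:
        store_action(Context(), "CLUSTER_CHECK")
    except ValueError as exc:
        print("failed:", exc)
```

If that reading is right, the question becomes where the health manager is supposed to get a user for its context from, which is the part I cannot work out from the authorization document.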
