Hi Juha,

Are you using Havana?

If so, your alarm resource templates seem to be missing the "repeat_actions"
attribute, which should be set to true. Continuous notification was required
for the Heat autoscaling cooldown logic (and continued scaling actions) to
work. This was a quirk of the Heat implementation in Havana, but TBH, from a
usability PoV, it should always have defaulted to True. This is now fixed on
Heat:master:

  https://github.com/openstack/heat/commit/4dc987ef

but it hasn't been backported to Heat:stable/havana.

The issue you see with the instance.scheduled message being rejected with
"signature invalid" is not related to autoscaling, and in any case I've fixed
it in Ceilometer:master and stable/havana:

  https://bugs.launchpad.net/ceilometer/+bug/1262255

released in:

  https://launchpad.net/ceilometer/+milestone/2013.2.2

The issue you see with mongo complaining that a metering message is not
okForStorage is generally caused by a stray period ('.') in some resource
metadata key. Where the metadata key is prefixed with 'metering.', the periods
are mapped to underscores, so in your case that mapping should have occurred:

  { "Key" : "metering.server_group", "Value" : "Group_A" },
            ^^^^^^^^^^

In any case, I'd be interested in hearing which actual metering messages are
being rejected with not okForStorage.

Thanks,
Eoghan

> Hi,
>
> I'm having some problems concerning the auto scaling feature.
> Any ideas?
>
> At first, scaling up and down works just fine. But when tested later on,
> scaling down/up no longer works properly.
> Scaling down may occur even when it shouldn't, or scaling up doesn't occur
> even when it should. When in this situation I remove all the
> received metric data from the DB, auto scaling starts to work again.
>
> Ceilometer is configured to use Mongo and the auto scaling is based on the
> cpu_util metrics.
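Re the repeat_actions point at the top of this reply: a minimal Python sketch of applying that change to a template programmatically. The helper name enable_repeat_actions is hypothetical (it is not part of Heat). It assumes the CFN-style JSON template has already been parsed into a dict (for example with json.load) and sets "repeat_actions" to true on every OS::Ceilometer::Alarm resource, leaving the scaling group and policies alone.

```python
import json
from typing import Any, Dict


def enable_repeat_actions(template: Dict[str, Any]) -> Dict[str, Any]:
    """Set "repeat_actions": true on every OS::Ceilometer::Alarm resource.

    Hypothetical helper: modifies the parsed template in place and returns it.
    Resources of any other type are left untouched.
    """
    for resource in template.get("Resources", {}).values():
        if resource.get("Type") == "OS::Ceilometer::Alarm":
            # Create an empty Properties map if the resource has none.
            resource.setdefault("Properties", {})["repeat_actions"] = True
    return template


# Trimmed copy of one policy and one alarm from the template quoted below.
template = {
    "Resources": {
        "ScaleUpPolicy": {
            "Type": "AWS::AutoScaling::ScalingPolicy",
            "Properties": {"Cooldown": "20", "ScalingAdjustment": "1"},
        },
        "CPUAlarmHigh": {
            "Type": "OS::Ceilometer::Alarm",
            "Properties": {
                "meter_name": "cpu_util",
                "statistic": "avg",
                "period": "20",
                "threshold": "90",
                "comparison_operator": "gt",
            },
        },
    }
}

enable_repeat_actions(template)
print(json.dumps(template["Resources"]["CPUAlarmHigh"]["Properties"], indent=2))
```

Adding the property by hand to CPUAlarmHigh and CPUAlarmLow is equivalent; the helper only makes sure no alarm resource is missed.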
>
> Related configurations:
> -----------------------
> /etc/ceilometer/pipeline.yaml on compute nodes:
>
> name: cpu_pipeline
> interval: 15
>
> /etc/ceilometer/ceilometer.conf on controller:
> evaluation_interval=15
>
> Heat template used:
> -------------------
> "Resources" : {
>
>   "Group_A" : {
>     "Type" : "AWS::AutoScaling::AutoScalingGroup",
>     "Properties" : {
>       "AvailabilityZones" : { "Fn::GetAZs" : ""},
>       "LaunchConfigurationName" : { "Ref" : "Group_A_Config" },
>       "MinSize" : "1",
>       "MaxSize" : "3",
>       "Tags" : [
>         { "Key" : "metering.server_group", "Value" : "Group_A" },
>         { "Key" : "custom_metadata", "Value" : "test" }
>       ],
>       "VPCZoneIdentifier" : [ { "Ref" : "PrivateSubnetId" } ]
>     }
>   },
>
>   "Group_A_Config" : {
>     "Type" : "AWS::AutoScaling::LaunchConfiguration",
>     "Properties": {
>       "ImageId" : { "Ref" : "ImageId" },
>       "InstanceType" : { "Ref" : "InstanceType" },
>       "KeyName" : { "Ref" : "KeyName" }
>     }
>   },
>
>   "ScaleUpPolicy" : {
>     "Type" : "AWS::AutoScaling::ScalingPolicy",
>     "Properties" : {
>       "AdjustmentType" : "ChangeInCapacity",
>       "AutoScalingGroupName" : { "Ref" : "Group_A" },
>       "Cooldown" : "20",
>       "ScalingAdjustment" : "1"
>     }
>   },
>
>   "ScaleDownPolicy" : {
>     "Type" : "AWS::AutoScaling::ScalingPolicy",
>     "Properties" : {
>       "AdjustmentType" : "ChangeInCapacity",
>       "AutoScalingGroupName" : { "Ref" : "Group_A" },
>       "Cooldown" : "20",
>       "ScalingAdjustment" : "-1"
>     }
>   },
>
>   "CPUAlarmHigh": {
>     "Type": "OS::Ceilometer::Alarm",
>     "Properties": {
>       "description": "Scale-up if CPU is greater than 90% for 20 seconds",
>       "meter_name": "cpu_util",
>       "statistic": "avg",
>       "period": "20",
>       "evaluation_periods": "1",
>       "threshold": "90",
>       "alarm_actions":
>         [ {"Fn::GetAtt": ["ScaleUpPolicy", "AlarmUrl"]} ],
>       "matching_metadata":
>         {"metadata.user_metadata.server_group": "Group_A" },
>       "comparison_operator": "gt"
>     }
>   },
>
>   "CPUAlarmLow": {
>     "Type": "OS::Ceilometer::Alarm",
>     "Properties": {
>       "description": "Scale-down if CPU is less than 50% for 20 seconds",
>       "meter_name": "cpu_util",
>       "statistic": "avg",
>       "period": "20",
>       "evaluation_periods": "1",
>       "threshold": "50",
>       "alarm_actions":
>         [ {"Fn::GetAtt": ["ScaleDownPolicy", "AlarmUrl"]} ],
>       "matching_metadata":
>         {"metadata.user_metadata.server_group": "Group_A" },
>       "comparison_operator": "lt"
>     }
>   }
> }
>
> In ceilometer logs I can see the following kind of warnings:
>
> <44>Feb 24 08:41:08 node-16 > ceilometer-ceilometer.collector.dispatcher.database WARNING: message > signature invalid, discarding message: {u'counter_name': > u'instance.scheduled', u'user_id': None, u'message_signature': > u'd1b49ddf004edc5b7a8dc9405b42a71f2ae975d04c25838c3dc0ea0e6f6e4edd', > u'timestamp': u'2014-02-24 08:41:08.334580', u'resource_id': > u'48c815ab-01c9-4ac8-9096-ac171976598c', u'message_id': > u'67e611e4-9d2f-11e3-81f1-080027e519cb', u'source': u'openstack', > u'counter_unit': u'instance', u'counter_volume': 1, u'project_id': > u'efcca4ba425c4beda73eb31a54df931a', u'resource_metadata': {u'instance_id': > u'48c815ab-01c9-4ac8-9096-ac171976598c', u'weighted_host': {u'host': > u'node-18', u'weight': 3818.0}, u'host': u'scheduler.node-16', > u'request_spec': {u'num_instances': 1, u'block_device_mapping': > [{u'instance_uuid': u'48c815ab-01c9-4ac8-9096-ac171976598c', > u'guest_format': None, u'boot_index': 0, u'delete_on_termination': True, > u'no_device': None, u'connection_info': None, u'volume_id': None, > u'device_name': None, u'disk_bus': None, u'image_id': > u'11848cbf-a428-4dfb-8818-2f0a981f540b', u'source_type': u'image', > u'device_type': u'disk', u'snapshot_id': None, u'destination_type': > u'local', u'volume_size': None}], u'image': {u'status': u'active', u'name': > u'cirrosImg', u'deleted': False, u'container_format': u'bare', > u'created_at': u'2014-02-12T08:46:04.000000', u'disk_format': u'qcow2', > u'updated_at': u'2014-02-12T08:46:04.000000', u'properties': {}, > u'min_disk': 0, u'min_ram': 0, u'checksum': > u'50bdc35edb03a38d91b1b071afb20a3c', u'owner': >
u'efcca4ba425c4beda73eb31a54df931a', u'is_public': True, u'deleted_at': > None, u'id': u'11848cbf-a428-4dfb-8818-2f0a981f540b', u'size': 9761280}, > u'instance_type': {u'root_gb': 1, u'name': u'm1.tiny', u'ephemeral_gb': 0, > u'memory_mb': 512, u'vcpus': 1, u'extra_specs': {}, u'swap': 0, > u'rxtx_factor': 1.0, u'flavorid': u'1', u'vcpu_weight': None, u'id': 2}, > u'instance_properties': {u'vm_state': u'building', u'availability_zone': > None, u'terminated_at': None, u'ephemeral_gb': 0, u'instance_type_id': 2, > u'user_data': u'Q29udGVudC1UeXBlOiBtdWx0aXBhcnQvbWl4ZWQ7IGJvdW5kYXJ5PSI9PT0 > ... > , u'cleaned': False, u'vm_mode': None, u'deleted_at': None, > u'reservation_id': u'r-l91mh33v', u'id': 274, u'security_groups': > {u'objects': []}, u'disable_terminate': False, u'root_device_name': None, > u'display_name': u'tyky-Group_A-55cklit7nvbq-Group_A-2-yis32na5m7ey', > u'uuid': u'48c815ab-01c9-4ac8-9096-ac171976598c', u'default_swap_device': > None, u'info_cache': {u'instance_uuid': > u'48c815ab-01c9-4ac8-9096-ac171976598c', u'network_info': []}, u'hostname': > u'tyky-group-a-55cklit7nvbq-group-a-2-yis32na5m7ey', u'launched_on': None, > u'display_description': u'tyky-Group_A-55cklit7nvbq-Group_A-2-yis32na5m7ey', > u'key_data': u'ssh-rsa > AAAAB3NzaC1yc2EAAAADAQABAAABAQC39hmz8e40Xv/+QKkLyRA7j02RfIG61cr1j41RftnkOF3ZbwBzi7qibsOA3gC9Ln05YbB6z2/iUnQzxQsoOpmlnXuv2O296utY2ZCTKhdFSzn2Ot7l635zEXkivMc97wz4bITtaBTjX3nV6sXOfevdTIOJeC11SqxmfNRRzXcz9fRv6kLjz7IrA0tvRTp2xDVtFEj+vFLWaXc3TcUSygxiSLeAuNkH1rZ9jVuHXXvzb/e7navrGyJec2P86AQg2TUk77MhLjPcbyKiJJK0DhK6zOkZUWXtgIVQx7+gO/Xs2QgQHcw+VdzRzpJK+/EOzUOU8IDWNnyfaJEnQEoX2oMj > Generated by Nova\n', u'deleted': False, u'config_drive': u'', > u'power_state': 0, u'default_ephemeral_device': None, u'progress': 0, > u'project_id': u'efcca4ba425c4beda73eb31a54df931a', u'launched_at': None, > u'scheduled_at': None, u'node': None, u'ramdisk_id': u'', u'access_ip_v6': > None, u'access_ip_v4': None, u'kernel_id': u'', u'key_name': u'heat_key', > 
u'updated_at': None, u'host': None, u'user_id': > u'ef4e983291ef4ad1b88eb1f776bd52b6', u'system_metadata': > {u'instance_type_memory_mb': 512, u'instance_type_swap': 0, > u'instance_type_vcpu_weight': None, u'instance_type_root_gb': 1, > u'instance_type_name': u'm1.tiny', u'instance_type_id': 2, > u'instance_type_ephemeral_gb': 0, u'instance_type_rxtx_factor': 1.0, > u'image_disk_format': u'qcow2', u'instance_type_flavorid': u'1', > u'instance_type_vcpus': 1, u'image_container_format': u'bare', > u'image_min_ram': 0, u'image_min_disk': 1, u'image_base_image_ref': > u'11848cbf-a428-4dfb-8818-2f0a981f540b'}, u'task_state': u'scheduling', > u'shutdown_terminate': False, u'cell_name': None, u'root_gb': 1, u'locked': > False, u'name': u'instance-00000112', u'created_at': > u'2014-02-24T08:41:08.257534', u'locked_by': None, u'launch_index': 0, > u'memory_mb': 512, u'vcpus': 1, u'image_ref': > u'11848cbf-a428-4dfb-8818-2f0a981f540b', u'architecture': None, > u'auto_disk_config': False, u'os_type': None, u'metadata': > {u'metering.server_group': u'Group_A', u'AutoScalingGroupName': > u'tyky-Group_A-55cklit7nvbq', u'custom_metadata': u'test'}}, > u'security_group': [u'default'], u'instance_uuids': > [u'48c815ab-01c9-4ac8-9096-ac171976598c']}, u'event_type': > u'scheduler.run_instance.scheduled'}, u'counter_type': u'delta'}
>
> Also the following warnings/errors can be seen but they seem to occur when
> auto scaling is properly working and have no negative effects as such:
>
> <44>Feb 24 08:43:08 node-16
> ceilometer-ceilometer.transformer.conversions WARNING: dropping
> sample with no predecessor: <ceilometer.sample.Sample object at 0x3774fd0>
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <44>Feb 24 08:43:09 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <43>Feb 24 08:43:09 node-16
> ceilometer-ceilometer.collector.dispatcher.database ERROR: Failed to record
> metering data: not okForStorage
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/ceilometer/collector/dispatcher/database.py", line 65, in record_metering_data
>     self.storage_conn.record_metering_data(meter)
>   File "/usr/lib/python2.7/dist-packages/ceilometer/storage/impl_mongodb.py", line 417, in record_metering_data
>     upsert=True,
>   File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 487, in update
>     check_keys, self.__uuid_subtype), safe)
>   File "/usr/lib/python2.7/dist-packages/pymongo/mongo_client.py", line 969, in _send_message
>     rv = self.__check_response_to_last_error(response)
>   File "/usr/lib/python2.7/dist-packages/pymongo/mongo_client.py", line 911, in __check_response_to_last_error
>     raise OperationFailure(details["err"], details["code"])
> OperationFailure: not okForStorage
>
> Br,
> -Juha
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack@lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
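Regarding the not okForStorage failure in the traceback above: a minimal Python sketch of a diagnostic for spotting metadata keys that MongoDB would refuse. Both function names are hypothetical (this is not Ceilometer code). find_unstorable_keys assumes the MongoDB 2.x-era rule that a field name may not contain '.' or start with '$'. to_user_metadata is an illustrative version of the 'metering.' mapping described at the top of this reply, assuming, as the alarm's matching_metadata key metadata.user_metadata.server_group suggests, that the prefix is stripped and any remaining periods become underscores.

```python
from typing import Any, Dict, List

METERING_PREFIX = "metering."


def find_unstorable_keys(doc: Any, path: str = "") -> List[str]:
    """Return slash-separated paths of keys MongoDB would reject.

    Assumes the MongoDB 2.x rule: a field name must not contain '.'
    and must not start with '$'. Dicts and lists are walked recursively.
    """
    bad: List[str] = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            name = str(key)
            key_path = f"{path}/{name}" if path else name
            if "." in name or name.startswith("$"):
                bad.append(key_path)
            bad.extend(find_unstorable_keys(value, key_path))
    elif isinstance(doc, list):
        for index, item in enumerate(doc):
            bad.extend(find_unstorable_keys(item, f"{path}[{index}]"))
    return bad


def to_user_metadata(instance_metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical illustration of the 'metering.' mapping: keep only
    prefixed keys, strip the prefix, and replace any remaining periods
    with underscores."""
    return {
        key[len(METERING_PREFIX):].replace(".", "_"): value
        for key, value in instance_metadata.items()
        if key.startswith(METERING_PREFIX)
    }


# Nested shape copied (trimmed) from the instance.scheduled message quoted above.
raw_metadata = {
    "request_spec": {
        "instance_properties": {
            "metadata": {
                "metering.server_group": "Group_A",
                "AutoScalingGroupName": "tyky-Group_A-55cklit7nvbq",
                "custom_metadata": "test",
            }
        }
    }
}

user_metadata = to_user_metadata(
    raw_metadata["request_spec"]["instance_properties"]["metadata"])

print(find_unstorable_keys(raw_metadata))
# ['request_spec/instance_properties/metadata/metering.server_group']
print(user_metadata, find_unstorable_keys({"user_metadata": user_metadata}))
# {'server_group': 'Group_A'} []
```

Running a check like this over the samples that fail to record would be one way to answer the question above about which metering messages are actually being rejected.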