On Tue, Nov 25, 2014 at 5:21 AM, Bartosz Kupidura <bkupid...@mirantis.com> wrote:
>
> Hello All,
>
> I'm working on a Zabbix implementation which includes HA support.
>
> The Zabbix server should be deployed on all controllers in HA mode.
This needs to be discouraged as much as putting mongo-db on the controllers.

> Currently we have a dedicated role 'zabbix-server', which does not support
> more than one zabbix-server. Instead of this we will move the monitoring
> solution (zabbix) to an additional component.

No, this must remain a separate role and cannot be forced onto the controllers; the user should be discouraged from doing this. The corosync code is quickly becoming granular enough to stand up a CRM cluster elsewhere.

> We will introduce an additional role 'zabbix-monitoring', assigned to all
> servers with the lowest priority in the serializer (run puppet after every
> other role) when zabbix is enabled.
> The 'zabbix-monitoring' role will be assigned automatically.

This seems like a good way to handle it, but would it work well for a plugin that wants to be monitored (since plugins also run after everything else)?

> When the zabbix component is enabled, we will install zabbix-server on all
> controllers in active-backup mode (pacemaker+haproxy).

Again, this must not be forced onto the controllers; that would be very bad.

Controllers: While there are development use cases for deploying monitoring on combined controllers, and it can make use of the already existing pacemaker cluster, this is the wrong direction to point users in. There are many reasons this is bad. For one, monitoring can become quite loaded, and as we've seen, secondary load on the controllers can collapse the entire control plane. Secondly, running monitoring on the cluster may also result in the monitoring going offline if the cluster does; from my own experience, not being able to see your monitoring is nearly worse than having everything down, and it costs precious moments of your downtime SLA.

HA Scaling: Just like with controllers, our other HA components need to support a scale of 1 to N. This is important so that as a cluster needs to scale, or as the operator moves from POC to production, they can deploy more hardware.
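For reference, an active/backup zabbix-server under pacemaker (as the proposal describes) would look roughly like the sketch below. This is a hedged illustration, not Fuel's actual puppet manifests: the resource names, VIP address, and netmask are hypothetical, and in a real deployment the manifests would generate the equivalent CRM configuration.

```shell
# Sketch only: an active/backup zabbix-server managed by pacemaker.
# Resource names, the VIP address, and the netmask are hypothetical examples.

# Virtual IP that haproxy/clients use to reach the currently active zabbix-server
pcs resource create zabbix-vip ocf:heartbeat:IPaddr2 \
    ip=192.168.0.10 cidr_netmask=24 \
    op monitor interval=10s

# The zabbix-server daemon itself; pacemaker runs one active copy in the cluster
pcs resource create zabbix-server systemd:zabbix-server \
    op monitor interval=30s

# Keep the daemon on whichever node holds the VIP, and start the VIP first
pcs constraint colocation add zabbix-server with zabbix-vip INFINITY
pcs constraint order zabbix-vip then zabbix-server
```

The same shape scales from 1 to N nodes: pacemaker simply fails the VIP and daemon over to a surviving node, which is exactly why the role should not be welded to the controllers.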
This also helps alleviate some of the "not enough nodes" issues mentioned earlier in the thread.

--
Andrew
Mirantis
Ceph community

_______________________________________________
OpenStack-dev mailing list
OpenStackfirstname.lastname@example.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev