Re: [openstack-dev] [Fuel-dev] [Openstack-dev] New RA for Galera
Hello,

On 29 May 2014, at 12:09, Vladimir Kuklin <vkuk...@mirantis.com> wrote:

> Maybe the problem is that you are using lifetime crm attributes instead of 'reboot' ones. shadow/commit is used by us because we need transactional behaviour in some cases. If you turn crm_shadow off, then you will experience problems with multi-state resources and location/colocation/order constraints. So we need to find a way to make commits transactional. There are two ways:
>
> 1) rewrite the corosync providers to use the crm_diff command and apply its output instead of a shadow commit, which can sometimes swallow cluster attributes

In the PoC I removed all cs_commit/cs_shadow calls, and it looks like everything is working. But as you say, this can lead to problems with more complicated deployments. This needs to be verified.

> 2) store 'reboot' attributes instead of lifetime ones

I tested with both --lifetime forever and reboot; it makes no difference for the cs_commit/cs_shadow failure. Moreover, we need a method to store the GTID permanently (to support whole-cluster reboot). If we want to stick to cs_commit/cs_shadow, we need a method other than crm_attribute to store the GTID.

On Thu, May 29, 2014 at 12:42 PM, Bogdan Dobrelya <bdobre...@mirantis.com> wrote:

> On 05/27/14 16:44, Bartosz Kupidura wrote:
>> Hello,
>> Responses inline.
>>
>> On 27 May 2014, at 15:12, Vladimir Kuklin <vkuk...@mirantis.com> wrote:
>>
>>> Hi, Bartosz. First of all, we are using openstack-dev for such discussions. Second, there is also Percona's RA for Percona XtraDB Cluster, which looks pretty similar, although it is written in Perl. Maybe we could derive something useful from it. Next, if you are working on this stuff, let's make it as open for the community as possible. There is a blueprint for the Galera OCF script: https://blueprints.launchpad.net/fuel/+spec/reliable-galera-ocf-script. It would be awesome if you wrote down the specification and sent the newer galera OCF code change request to fuel-library gerrit.
>>
>> Sure, I will update this blueprint. Change request in fuel-library: https://review.openstack.org/#/c/95764/
>
> That is a really nice catch, Bartosz, thank you. I believe we should review the new OCF script thoroughly and consider omitting cs_commits/cs_shadows as well. What would be the downsides?
>
>>> Speaking of the crm_attribute stuff: I am very surprised that you are saying that node attributes are altered by a crm shadow commit. We are using a similar approach in our scripts and have never faced this issue.
>>
>> This is probably because you update crm_attribute very rarely. With my approach the GTID attribute is updated every 60s on every node (3 updates per 60s in a standard HA setup). You can try updating any attribute in a loop while deploying a cluster to trigger the failure with the corosync diff.
>
> It sounds reasonable and we should verify it. I've updated the statuses for the related bugs and attached them to the aforementioned blueprint as well:
> https://bugs.launchpad.net/fuel/+bug/1283062/comments/7
> https://bugs.launchpad.net/fuel/+bug/1281592/comments/6
>
>>> Corosync 2.x support is on our roadmap, but we are not sure that we will use Corosync 2.x earlier than the start of the 6.x release series.
>>
>> Yeah; moreover, the corosync CMAP is not synced between cluster nodes (or maybe I'm doing something wrong?). So we need another solution for this...
>
> We should use CMAN for Corosync 1.x, perhaps.

On Tue, May 27, 2014 at 3:08 PM, Bartosz Kupidura <bkupid...@mirantis.com> wrote:

Hello guys! I would like to start a discussion on a new resource agent for galera/pacemaker.
Main features:
* Support cluster bootstrap
* Support rebooting any node in the cluster
* Support rebooting the whole cluster
* To determine which node has the latest DB version, we use the galera GTID (Global Transaction ID)
* The node with the latest GTID is the galera PC (primary component) in case of re-election
* An administrator can manually set a node as the PC

GTID handling:
* get the GTID from mysqld --wsrep-recover or the SQL query SHOW STATUS LIKE 'wsrep_local_state_uuid'
* store the GTID as a crm_attribute for the node (see the sketch below):
  crm_attribute --node $HOSTNAME --lifetime $LIFETIME --name gtid --update $GTID
* on every monitor/stop/start action, update the GTID for the given node
* the GTID can have 3 formats:
  - ----:123 - standard cluster-id:commit-id
  - ----:-1  - standard non-initialized cluster
  - ----:INF - commit-id manually set to INF, forcing the RA to create a new cluster with the master on the given node

Check if re-election of the PC is needed:
* (the node is located in a partition with quorum OR we have only 1 node configured in the cluster) AND the galera resource is not running on any node
* the GTID is manually set to INF on the given node

Check if a given node is the PC:
* it has the highest GTID in the cluster
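To make the above concrete, here is a minimal sketch of the GTID fetch-and-store step in shell, the language OCF RAs are written in. It assumes passwordless local mysql access; function and variable names are illustrative, not the actual RA code:

    # Fetch this node's GTID: ask the running mysqld first,
    # fall back to crash recovery when it is down.
    get_gtid() {
        local uuid seqno pos
        if mysql -NBe 'SELECT 1' >/dev/null 2>&1; then
            uuid=$(mysql -NBe "SHOW STATUS LIKE 'wsrep_local_state_uuid'" | awk '{print $2}')
            seqno=$(mysql -NBe "SHOW STATUS LIKE 'wsrep_last_committed'" | awk '{print $2}')
        else
            # mysqld --wsrep-recover logs "Recovered position: <uuid>:<seqno>"
            pos=$(mysqld --wsrep-recover 2>&1 | grep -o 'Recovered position:.*' | awk '{print $3}')
            uuid=${pos%%:*}
            seqno=${pos##*:}
        fi
        echo "${uuid}:${seqno}"
    }

    # Store it as a node attribute on every monitor/start/stop action.
    crm_attribute --node "$HOSTNAME" --lifetime forever --name gtid --update "$(get_gtid)"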
Re: [openstack-dev] [Fuel-dev] [Openstack-dev] New RA for Galera
Vladimir,

On 2 June 2014, at 13:49, Vladimir Kuklin <vkuk...@mirantis.com> wrote:

> Bartosz, if you look into what the Percona guys are doing, you will see here: https://github.com/percona/percona-pacemaker-agents/blob/new_pxc_ra/agents/pxc_resource_agent#L516 that they first try to use MySQL and then fall back to getting the GTID from grastate.dat. Also, I am wondering if you are using cluster-wide attributes instead of node attributes. If you use node-scoped attributes, then shadow/commit commands should not affect anything.

The PoC RA gets the GTID from SQL (SHOW STATUS LIKE 'wsrep_last_committed') if MySQL is running; otherwise the RA starts mysqld with --wsrep-recover. I skipped grastate.dat because in all my tests this file had the commit_id set to -1. In the PoC I use only node attributes (crm_attribute --node $HOSTNAME --lifetime forever --name gtid --update $GTID).

On Mon, Jun 2, 2014 at 2:34 PM, Bogdan Dobrelya <bdobre...@mirantis.com> wrote:

> On 05/29/2014 02:06 PM, Bartosz Kupidura wrote:
>> [...]
>> Moreover, we need a method to store the GTID permanently (to support whole-cluster reboot).
>
> Please note, the GTID could always be fetched from /var/lib/mysql/grastate.dat on the galera node.
>
>> If we want to stick to cs_commit/cs_shadow, we need a method other than crm_attribute to store the GTID.
>
> We could use a modified ocf::pacemaker:SysInfo resource. We could put the GTID there and use it in a similar way as I did in the fencing PoC[0] (for free space monitoring).
>
> [0] https://github.com/bogdando/fuel-library-1/blob/ha_fencing_WIP/deployment/puppet/cluster/manifests/fencing_primitives.pp#L41-L70
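For completeness, a minimal sketch of reading the GTID from grastate.dat, as suggested above (the file carries 'uuid:' and 'seqno:' keys; the path is the Galera default):

    # Recover the GTID from Galera's saved state while mysqld is down.
    # Note: seqno is -1 after an unclean shutdown, which would explain
    # why the file looked uninitialized in the tests mentioned above.
    GRASTATE=/var/lib/mysql/grastate.dat
    uuid=$(awk '/^uuid:/ {print $2}' "$GRASTATE")
    seqno=$(awk '/^seqno:/ {print $2}' "$GRASTATE")
    echo "${uuid}:${seqno}"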
[openstack-dev] [FUEL] Zabbix in HA mode
Hello All,

I'm working on a Zabbix implementation that includes HA support. The Zabbix server should be deployed on all controllers in HA mode.

Currently we have a dedicated role 'zabbix-server', which does not support more than one zabbix-server. Instead of this we will move the monitoring solution (zabbix) to an additional component. We will introduce an additional role, 'zabbix-monitoring', assigned to all servers with the lowest priority in the serializer (puppet runs it after every other role) when zabbix is enabled. The 'zabbix-monitoring' role will be assigned automatically.

When the zabbix component is enabled, we will install zabbix-server on all controllers in active-backup mode (pacemaker+haproxy).

In a later stage, we can allow users to deploy zabbix-server on a dedicated node OR on controllers for performance reasons. But for now we should force zabbix-server to be deployed on controllers.

The BP is in its initial phase, but the code is ready and working with Fuel 5.1. Now I'm checking if it works with master.

Any comments are welcome!

BP link: https://blueprints.launchpad.net/fuel/+spec/zabbix-ha

Best Regards,
Bartosz Kupidura
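To make the active-backup wiring concrete, here is a rough sketch of the pacemaker side in crm shell syntax. The resource names, the VIP address, and the use of an LSB init script for zabbix-server are illustrative assumptions, not the actual Fuel manifests:

    # One zabbix-server instance cluster-wide, colocated with a VIP
    # that haproxy and the zabbix agents point at.
    crm configure primitive p_zabbix-server lsb:zabbix-server \
        op monitor interval=30s
    crm configure primitive vip__zabbix ocf:heartbeat:IPaddr2 \
        params ip=192.168.0.10 cidr_netmask=24 \
        op monitor interval=10s
    # Keep the server on the same controller as its VIP, VIP first.
    crm configure colocation zabbix_with_vip inf: p_zabbix-server vip__zabbix
    crm configure order vip_before_zabbix inf: vip__zabbix p_zabbix-server

With constraints like these, pacemaker restarts both resources on a surviving controller when the active one fails, which is the failover behaviour described above.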
Re: [openstack-dev] [FUEL] Zabbix in HA mode
Hello Vladimir,

I agree. But in most cases, zabbix-server would be moved off a failed node by pacemaker. Moreover, some clients don't want to "waste" 3 additional servers only for monitoring.

As I said, this is only the first drop of zabbix HA. Later we can allow the user to deploy zabbix-server not only on controllers, but also on dedicated nodes.

Best Regards,
Bartosz Kupidura

On 25 November 2014, at 15:47, Vladimir Kuklin <vkuk...@mirantis.com> wrote:

> Bartosz,
>
> It is obviously possible to install zabbix on the master nodes and put it under pacemaker control. But it seems very strange to me to monitor something with software located on the nodes that you are monitoring.
>
> On Tue, Nov 25, 2014 at 4:21 PM, Bartosz Kupidura <bkupid...@mirantis.com> wrote:
>> [...]
>
> --
> Yours Faithfully,
> Vladimir Kuklin,
> Fuel Library Tech Lead,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 45bk3, Vorontsovskaya Str.
> Moscow, Russia
> www.mirantis.com
> www.mirantis.ru
> vkuk...@mirantis.com