Re: [openstack-dev] [Fuel-dev] [Openstack-dev] New RA for Galera

2014-05-29 Thread Bartosz Kupidura
Hello,


On 29 May 2014, at 12:09, Vladimir Kuklin vkuk...@mirantis.com wrote:

 Maybe the problem is that you are using lifetime crm attributes instead of 
 'reboot' ones. We use shadow/commit because we need transactional 
 behaviour in some cases. If you turn crm_shadow off, then you will experience 
 problems with multi-state resources and location/colocation/order 
 constraints. So we need to find a way to make commits transactional. There 
 are two ways:
 1) rewrite the corosync providers to use the crm_diff command and apply it 
 instead of a shadow commit, which can sometimes swallow cluster attributes
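
For illustration, option 1 could look roughly like this (a sketch only; crm_diff and cibadmin are standard pacemaker tools, but the integration into the corosync providers is an assumption and the file names are made up):

  # snapshot the live CIB, make the change offline, then apply only the diff,
  # so node attributes written by other processes in the meantime are kept
  cibadmin --query > /tmp/cib-orig.xml
  cp /tmp/cib-orig.xml /tmp/cib-new.xml
  # ... edit /tmp/cib-new.xml with the desired resources/constraints ...
  crm_diff --original /tmp/cib-orig.xml --new /tmp/cib-new.xml > /tmp/cib.patch
  cibadmin --patch --xml-file /tmp/cib.patch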

In the PoC I removed all cs_commit/cs_shadow calls, and it looks like everything is 
working. But as you say, this can lead to problems with more complicated deployments. 
This needs to be verified.

 2) store 'reboot' attributes instead of lifetime ones

I tested with --lifetime forever and with reboot. No difference: cs_commit/cs_shadow 
still fails.

Moreover, we need a way to store the GTID permanently (to support whole-cluster 
reboot). 
If we want to stick to cs_commit/cs_shadow, we need a method other than 
crm_attribute to store the GTID.
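
For reference, the two attribute scopes under discussion behave like this (a minimal sketch; whether the reboot scope really avoids the cs_commit problem is exactly what still needs verifying):

  # 'forever' writes a persistent node attribute into the CIB configuration
  # section: it survives a node reboot, but the configuration section is what
  # a shadow commit replaces
  crm_attribute --node "$HOSTNAME" --lifetime forever --name gtid --update "$GTID"

  # 'reboot' writes a transient attribute into the status section: it is not
  # part of what the shadow copy commits, but it is lost on node reboot, so it
  # cannot by itself cover the whole-cluster-reboot case
  crm_attribute --node "$HOSTNAME" --lifetime reboot --name gtid --update "$GTID"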

 
 
 
 On Thu, May 29, 2014 at 12:42 PM, Bogdan Dobrelya bdobre...@mirantis.com 
 wrote:
 On 05/27/14 16:44, Bartosz Kupidura wrote:
  Hello,
  Responses inline.
 
 
  On 27 May 2014, at 15:12, Vladimir Kuklin vkuk...@mirantis.com wrote:
 
  Hi, Bartosz
 
  First of all, we are using openstack-dev for such discussions.
 
  Second, there is also Percona's RA for Percona XtraDB Cluster, which looks 
  pretty similar, although it is written in Perl. Maybe we could 
  derive something useful from it.
 
  Next, if you are working on this stuff, let's make it as open for the 
  community as possible. There is a blueprint for the Galera OCF script: 
  https://blueprints.launchpad.net/fuel/+spec/reliable-galera-ocf-script. It 
  would be awesome if you wrote down the specification and sent the new 
  Galera OCF code change request to the fuel-library Gerrit.
 
  Sure, I will update this blueprint.
  Change request in fuel-library: https://review.openstack.org/#/c/95764/
 
 That is a really nice catch, Bartosz, thank you. I believe we should
 review the new OCF script thoroughly and consider omitting
 cs_commits/cs_shadows as well. What would be the downsides?
 
 
 
  Speaking of the crm_attribute stuff: I am very surprised that you are saying 
  that node attributes are altered by a crm shadow commit. We are using a 
  similar approach in our scripts and have never faced this issue.
 
  This is probably because you update crm_attribute very rarely. With my 
  approach the GTID attribute is updated every 60s on every node (3 updates 
  per 60s in a standard HA setup).
 
  You can try updating any attribute in a loop while deploying the cluster to 
  trigger a failure in the corosync diff.
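
A trivial reproducer along those lines (the attribute name test_attr is made up):

  # run on one node while the cluster is deploying; if a cs_commit swallows
  # attribute updates, some of the values written here disappear from the CIB
  i=0
  while true; do
      i=$((i + 1))
      crm_attribute --node "$(hostname)" --lifetime reboot --name test_attr --update "$i"
      sleep 1
  done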
 
 It sounds reasonable and we should verify it.
 I've updated the statuses for related bugs and attached them to the
 aforementioned blueprint as well:
 https://bugs.launchpad.net/fuel/+bug/1283062/comments/7
 https://bugs.launchpad.net/fuel/+bug/1281592/comments/6
 
 
 
 
  Corosync 2.x support is on our roadmap, but we are not sure that we will 
  use Corosync 2.x before the 6.x release series starts.
 
  Yeah; moreover, corosync CMAP is not synced between cluster nodes (or maybe 
  I'm doing something wrong?). So we need another solution for this...
 
 
 We should use CMAN for Corosync 1.x, perhaps.
 
 
 
  On Tue, May 27, 2014 at 3:08 PM, Bartosz Kupidura bkupid...@mirantis.com 
  wrote:
  Hello guys!
  I would like to start a discussion on a new resource agent for 
  galera/pacemaker.
 
  Main features:
  * Support cluster bootstrap
  * Support rebooting any node in the cluster
  * Support rebooting the whole cluster
  * To determine which node has the latest DB version, we use the Galera 
  GTID (Global Transaction ID)
  * The node with the latest GTID becomes the Galera PC (primary component) in 
  case of re-election
  * The administrator can manually set a node as the PC
 
  GTID:
  * get the GTID from mysqld --wsrep-recover or from SQL: SHOW STATUS LIKE 
  'wsrep_local_state_uuid'
  * store the GTID as a crm_attribute for the node (crm_attribute --node $HOSTNAME 
  --lifetime $LIFETIME --name gtid --update $GTID)
  * on every monitor/stop/start action, update the GTID for the given node
  * the GTID can have 3 formats:
   - <cluster-uuid>:123 - the standard cluster-id:commit-id
   - <cluster-uuid>:-1 - a standard non-initialized cluster
   - <cluster-uuid>:INF - the commit-id manually set to INF, forcing the RA to 
  create a new cluster with the master on the given node
 
  Check if re-election of the PC is needed:
  * (the node is located in a partition with quorum OR only 1 node is 
  configured in the cluster) AND the galera resource is not running on any node
  * the GTID is manually set to INF on a given node

  Check if a given node is the PC:
  * it has the highest GTID in the cluster
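
A condensed bash sketch of the GTID handling described above (illustrative only, not the PoC RA code; credentials, error handling and the INF case are omitted):

  get_gtid() {
      if mysqladmin ping >/dev/null 2>&1; then
          # mysqld is up: compose <cluster-uuid>:<commit-id> from SQL
          local uuid seqno
          uuid=$(mysql -N -e "SHOW STATUS LIKE 'wsrep_local_state_uuid'" | awk '{print $2}')
          seqno=$(mysql -N -e "SHOW STATUS LIKE 'wsrep_last_committed'" | awk '{print $2}')
          echo "${uuid}:${seqno}"
      else
          # mysqld is down: recover the position without serving queries
          mysqld --wsrep-recover 2>&1 | grep -o 'Recovered position:.*' | awk '{print $3}'
      fi
  }

  # called from monitor/start/stop: publish this node's GTID; the node with the
  # highest commit-id (or one manually set to INF) wins the PC election
  crm_attribute --node "$HOSTNAME" --lifetime "$LIFETIME" --name gtid --update "$(get_gtid)"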

Re: [openstack-dev] [Fuel-dev] [Openstack-dev] New RA for Galera

2014-06-02 Thread Bartosz Kupidura
Vladimir,


On 2 Jun 2014, at 13:49, Vladimir Kuklin vkuk...@mirantis.com wrote:

 Bartosz, if you look into what the Percona guys are doing - you will see here: 
 https://github.com/percona/percona-pacemaker-agents/blob/new_pxc_ra/agents/pxc_resource_agent#L516
  that they first try to use MySQL and then fall back to getting the GTID from grastate.dat. 
 Also, I am wondering whether you are using cluster-wide attributes instead of 
 node attributes. If you use node-scoped attributes, then the shadow/commit 
 commands should not affect anything.

The PoC RA gets the GTID from SQL (SHOW STATUS LIKE 'wsrep_last_committed') if MySQL is 
running; otherwise the RA starts mysqld with --wsrep-recover. I skipped 
grastate.dat because in all my tests this file had the commit id set to -1.

In the PoC I use only node attributes (crm_attribute --node $HOSTNAME --lifetime 
forever --name gtid --update $GTID).
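
For completeness, reading the node attributes back for the PC comparison could look roughly like this (the node names and the sort are illustrative; the INF and -1 special cases are left out):

  # collect the gtid attribute from every configured node and pick the node
  # with the highest commit-id
  for node in node-1 node-2 node-3; do
      gtid=$(crm_attribute --node "$node" --lifetime forever --name gtid --query --quiet 2>/dev/null)
      echo "$node ${gtid##*:}"
  done | sort -k2 -n | tail -1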

 
 
 On Mon, Jun 2, 2014 at 2:34 PM, Bogdan Dobrelya bdobre...@mirantis.com 
 wrote:
 On 05/29/2014 02:06 PM, Bartosz Kupidura wrote:
  Hello,
 
 
  On 29 May 2014, at 12:09, Vladimir Kuklin vkuk...@mirantis.com wrote:
 
  Maybe the problem is that you are using lifetime crm attributes instead 
  of 'reboot' ones. We use shadow/commit because we need 
  transactional behaviour in some cases. If you turn crm_shadow off, then 
  you will experience problems with multi-state resources and 
  location/colocation/order constraints. So we need to find a way to make 
  commits transactional. There are two ways:
  1) rewrite the corosync providers to use the crm_diff command and apply it 
  instead of a shadow commit, which can sometimes swallow cluster attributes
 
  In the PoC I removed all cs_commit/cs_shadow calls, and it looks like everything is 
  working. But as you say, this can lead to problems with more complicated 
  deployments.
  This needs to be verified.
 
  2) store 'reboot' attributes instead of lifetime ones
 
  I tested with --lifetime forever and with reboot. No difference: 
  cs_commit/cs_shadow still fails.
 
  Moreover, we need a way to store the GTID permanently (to support whole-cluster 
  reboot).
 
 Please note, the GTID can always be fetched from 
 /var/lib/mysql/grastate.dat on the Galera node.
 
  If we want to stick to cs_commit/cs_shadow, we need a method other than 
  crm_attribute to store the GTID.
 
 We could use a modified ocf::pacemaker:SysInfo resource. We could put the 
 GTID there and use it in a similar way as I did for the fencing PoC [0] (for 
 free space monitoring).
 
 [0]
 https://github.com/bogdando/fuel-library-1/blob/ha_fencing_WIP/deployment/puppet/cluster/manifests/fencing_primitives.pp#L41-L70
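
For reference, reading the GTID from grastate.dat (mentioned above) is roughly this (uuid: and seqno: are the standard Galera fields; the seqno is written as -1 while mysqld is running or after an unclean stop, which matches what Bartosz observed):

  # /var/lib/mysql/grastate.dat normally contains lines like:
  #   uuid:    <cluster uuid>
  #   seqno:   1234
  uuid=$(awk '/^uuid:/  {print $2}' /var/lib/mysql/grastate.dat)
  seqno=$(awk '/^seqno:/ {print $2}' /var/lib/mysql/grastate.dat)
  echo "${uuid}:${seqno}"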
 
 
 
 
 

[openstack-dev] [FUEL] Zabbix in HA mode

2014-11-25 Thread Bartosz Kupidura
Hello All,

I'm working on a Zabbix implementation which includes HA support.

Zabbix server should be deployed on all controllers in HA mode.

Currently we have a dedicated role 'zabbix-server', which does not support more 
than one zabbix-server. Instead of this, we will move the monitoring solution 
(Zabbix) to an additional component.

We will introduce an additional role, 'zabbix-monitoring', assigned to all servers 
with the lowest priority in the serializer (puppet runs after every other role) when 
Zabbix is enabled.
The 'zabbix-monitoring' role will be assigned automatically.

When the Zabbix component is enabled, we will install zabbix-server on all 
controllers 
in active-backup mode (pacemaker + haproxy).
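
To make the active-backup idea concrete, the pacemaker part could look roughly like this in crm shell syntax (a sketch only: the primitive and VIP names are illustrative, and the real implementation would live in the Fuel Puppet manifests):

  # a single zabbix-server instance, managed by pacemaker and kept on the
  # controller that currently holds the management VIP
  crm configure primitive p_zabbix-server lsb:zabbix-server \
      op monitor interval=30s timeout=30s
  crm configure colocation zabbix-server-with-vip inf: p_zabbix-server vip__management
  crm configure order vip-before-zabbix-server inf: vip__management p_zabbix-server

haproxy would then front the Zabbix server port on the controllers, presumably with the non-active nodes listed as 'backup' servers in the backend.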

In the next stage, we can allow users to deploy zabbix-server on a dedicated node OR
on the controllers, for performance reasons.
But for now we should force zabbix-server to be deployed on the controllers.

The BP is in its initial phase, but the code is ready and working with Fuel 5.1. 
Now I'm checking whether it works with master.

Any comments are welcome!

BP link: https://blueprints.launchpad.net/fuel/+spec/zabbix-ha

Best Regards,
Bartosz Kupidura
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [FUEL] Zabbix in HA mode

2014-11-25 Thread Bartosz Kupidura
Hello Vladimir,
I agree. But in most cases, zabbix-server would be moved off a failed node by 
pacemaker. 
Moreover, some clients don't want to "waste" 3 additional servers only for 
monitoring.

As I said, this is only the first drop of Zabbix HA. Later we can allow users to 
deploy zabbix-server
not only on controllers, but also on dedicated nodes. 

Best Regards,
Bartosz Kupidura 


 On 25 Nov 2014, at 15:47, Vladimir Kuklin vkuk...@mirantis.com wrote:
 
 Bartosz, 
 
 It is obviously possible to install Zabbix on the master nodes and put it 
 under pacemaker control. But it seems very strange to me to monitor 
 something with software located on the nodes that you are monitoring. 
 
 
 
 
 -- 
 Yours Faithfully,
 Vladimir Kuklin,
 Fuel Library Tech Lead,
 Mirantis, Inc.
 +7 (495) 640-49-04
 +7 (926) 702-39-68
 Skype kuklinvv
 45bk3, Vorontsovskaya Str.
 Moscow, Russia,
 www.mirantis.com
 www.mirantis.ru
 vkuk...@mirantis.com


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev