Re: [openstack-dev] [FUEL] Zabbix in HA mode

2014-11-25 Thread Bartosz Kupidura
Hello Vladimir,
I agree. But in most cases, zabbix-server would be moved off a failed node by
pacemaker.
Moreover, some clients don't want to "waste" 3 additional servers only for
monitoring.

As I said, this is only the first drop of Zabbix HA. Later we can allow users to
deploy zabbix-server not only on controllers, but also on dedicated nodes.

Best Regards,
Bartosz Kupidura 


> On 25 Nov 2014, at 15:47, Vladimir Kuklin wrote:
> 
> Bartosz, 
> 
> It is obviously possible to install Zabbix on the master nodes and put it
> under pacemaker control. But it seems very strange to me to monitor
> something with software located on the very nodes that you are monitoring.
> 
> On Tue, Nov 25, 2014 at 4:21 PM, Bartosz Kupidura  
> wrote:
> Hello All,
> 
> I'm working on a Zabbix implementation which includes HA support.
> 
> Zabbix server should be deployed on all controllers in HA mode.
> 
> Currently we have a dedicated role 'zabbix-server', which does not support
> more than one zabbix-server instance. Instead of this, we will move the
> monitoring solution (Zabbix) to an additional component.
> 
> We will introduce an additional role 'zabbix-monitoring', assigned to all
> servers with the lowest priority in the serializer (puppet runs after every
> other role) when Zabbix is enabled.
> The 'zabbix-monitoring' role will be assigned automatically.
> 
> When the Zabbix component is enabled, we will install zabbix-server on all
> controllers in active-backup mode (pacemaker + haproxy).
> 
> In the next stage, we can allow users to deploy zabbix-server on a dedicated
> node OR on controllers, for performance reasons.
> But for now we should force zabbix-server to be deployed on controllers.
> 
> The BP is in its initial phase, but the code is ready and working with Fuel 5.1.
> Now I'm checking whether it works with master.
> 
> Any comments are welcome!
> 
> BP link: https://blueprints.launchpad.net/fuel/+spec/zabbix-ha
> 
> Best Regards,
> Bartosz Kupidura
> 
> 
> 
> -- 
> Yours Faithfully,
> Vladimir Kuklin,
> Fuel Library Tech Lead,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 45bk3, Vorontsovskaya Str.
> Moscow, Russia,
> www.mirantis.com
> www.mirantis.ru
> vkuk...@mirantis.com


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [FUEL] Zabbix in HA mode

2014-11-25 Thread Bartosz Kupidura
Hello All,

I'm working on a Zabbix implementation which includes HA support.

Zabbix server should be deployed on all controllers in HA mode.

Currently we have a dedicated role 'zabbix-server', which does not support more
than one zabbix-server instance. Instead of this, we will move the monitoring
solution (Zabbix) to an additional component.

We will introduce an additional role 'zabbix-monitoring', assigned to all
servers with the lowest priority in the serializer (puppet runs after every
other role) when Zabbix is enabled.
The 'zabbix-monitoring' role will be assigned automatically.

When the Zabbix component is enabled, we will install zabbix-server on all
controllers in active-backup mode (pacemaker + haproxy).
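
(For illustration only -- a rough sketch of what the active-backup wiring could
look like; the resource name, agent class, addresses and ports below are my
assumptions, not the actual fuel-library code:)

  # pacemaker keeps exactly one zabbix-server instance running on the controllers
  crm configure primitive p_zabbix_server lsb:zabbix-server \
      op monitor interval=30s

  # haproxy on the management VIP forwards to whichever controller is active
  listen zabbix-server
    bind 192.168.0.2:10051
    server node-1 192.168.0.3:10051 check
    server node-2 192.168.0.4:10051 check
    server node-3 192.168.0.5:10051 check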

In the next stage, we can allow users to deploy zabbix-server on a dedicated
node OR on controllers, for performance reasons.
But for now we should force zabbix-server to be deployed on controllers.

The BP is in its initial phase, but the code is ready and working with Fuel 5.1.
Now I'm checking whether it works with master.

Any comments are welcome!

BP link: https://blueprints.launchpad.net/fuel/+spec/zabbix-ha

Best Regards,
Bartosz Kupidura
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Fuel-dev] [Openstack-dev] New RA for Galera

2014-06-02 Thread Bartosz Kupidura
Vladimir,


On 2 Jun 2014, at 13:49, Vladimir Kuklin wrote:

> Bartosz, if you look into what the Percona guys are doing, you will see here:
> https://github.com/percona/percona-pacemaker-agents/blob/new_pxc_ra/agents/pxc_resource_agent#L516
> that they first try to use MySQL and then get the GTID from grastate.dat.
> Also, I am wondering whether you are using cluster-wide attributes instead of
> node attributes. If you use node-scoped attributes, then shadow/commit
> commands should not affect anything.

The PoC RA gets the GTID via SQL (SHOW STATUS LIKE 'wsrep_last_committed') if
MySQL is running; otherwise the RA starts mysqld with --wsrep-recover. I skipped
grastate.dat because in all my tests this file had the commit ID set to -1.
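
(For illustration only -- a hedged sketch of the two lookups described above;
the exact options and log path are assumptions, not the PoC code:)

  # mysqld running:
  mysql -e "SHOW STATUS LIKE 'wsrep_last_committed'"
  # mysqld stopped -- recover the last committed position from InnoDB:
  mysqld --user=mysql --wsrep-recover --log-error=/tmp/wsrep_recover.log
  grep 'Recovered position' /tmp/wsrep_recover.log   # prints <uuid>:<seqno>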

In the PoC I use only node attributes (crm_attribute --node $HOSTNAME --lifetime
forever --name gtid --update $GTID).
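
(Again only a sketch: the stored attribute can be read back with the matching
query call, assuming the same attribute name:)

  GTID=$(crm_attribute --node $HOSTNAME --lifetime forever --name gtid \
         --query --quiet)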

> 
> 
> On Mon, Jun 2, 2014 at 2:34 PM, Bogdan Dobrelya  
> wrote:
> On 05/29/2014 02:06 PM, Bartosz Kupidura wrote:
> > Hello,
> >
> >
> > On 29 May 2014, at 12:09, Vladimir Kuklin wrote:
> >
> >> Maybe the problem is that you are using lifetime crm attributes instead
> >> of 'reboot' ones. We use shadow/commit because we need transactional
> >> behaviour in some cases. If you turn crm_shadow off, then you will
> >> experience problems with multi-state resources and
> >> location/colocation/order constraints. So we need to find a way to make
> >> commits transactional. There are two ways:
> >> 1) rewrite the corosync providers to use the crm_diff command and apply it
> >> instead of a shadow commit, which can sometimes swallow cluster attributes
> >
> > In the PoC I removed all cs_commit/cs_shadow calls, and it looks like
> > everything is working. But as you say, this can lead to problems with more
> > complicated deployments.
> > This needs to be verified.
> >
> >> 2) store 'reboot' attributes instead of lifetime ones
> >
> > I tested with --lifetime forever and with reboot. No difference for the
> > cs_commit/cs_shadow failure.
> >
> > Moreover, we need a method to store the GTID permanently (to support a
> > whole-cluster reboot).
> 
> Please note, the GTID can always be fetched from
> /var/lib/mysql/grastate.dat on the galera node
> 
> > If we want to stick to cs_commit/cs_shadow, we need a method other than
> > crm_attribute to store the GTID.
> 
> We could use a modified ocf:pacemaker:SysInfo resource. We could put the
> GTID there and use it in a similar way as I did for the fencing PoC[0] (for
> free space monitoring)
> 
> [0]
> https://github.com/bogdando/fuel-library-1/blob/ha_fencing_WIP/deployment/puppet/cluster/manifests/fencing_primitives.pp#L41-L70
> 
> >
> >>
> >>
> >>
> >> On Thu, May 29, 2014 at 12:42 PM, Bogdan Dobrelya  
> >> wrote:
> >> On 05/27/14 16:44, Bartosz Kupidura wrote:
> >>> Hello,
> >>> Responses inline.
> >>>
> >>>
> >>> On 27 May 2014, at 15:12, Vladimir Kuklin wrote:
> >>>
> >>>> Hi, Bartosz
> >>>>
> >>>> First of all, we are using openstack-dev for such discussions.
> >>>>
> >>>> Second, there is also Percona's RA for Percona XtraDB Cluster, which
> >>>> looks pretty similar, although it is written in Perl. Maybe we could
> >>>> derive something useful from it.
> >>>>
> >>>> Next, if you are working on this stuff, let's make it as open for the
> >>>> community as possible. There is a blueprint for the Galera OCF script:
> >>>> https://blueprints.launchpad.net/fuel/+spec/reliable-galera-ocf-script.
> >>>> It would be awesome if you wrote down the specification and sent the
> >>>> newer galera OCF code change request to the fuel-library gerrit.
> >>>
> >>> Sure, I will update this blueprint.
> >>> Change request in fuel-library: https://review.openstack.org/#/c/95764/
> >>
> >> That is a really nice catch, Bartosz, thank you. I believe we should
> >> review the new OCF script thoroughly and consider omitting
> >> cs_commits/cs_shadows as well. What would be the downsides?
> >>
> >>>
> >>>>
> >>>> Speaking of the crm_attribute stuff, I am very surprised that you are
> >>>> saying that node attributes are altered by a crm shadow commit. We are
> >>>> using a similar approach in our scripts and have never faced this issue.

Re: [openstack-dev] [Fuel-dev] [Openstack-dev] New RA for Galera

2014-05-29 Thread Bartosz Kupidura
Hello,


On 29 May 2014, at 12:09, Vladimir Kuklin wrote:

> Maybe the problem is that you are using lifetime crm attributes instead of
> 'reboot' ones. We use shadow/commit because we need transactional behaviour
> in some cases. If you turn crm_shadow off, then you will experience problems
> with multi-state resources and location/colocation/order constraints. So we
> need to find a way to make commits transactional. There are two ways:
> 1) rewrite the corosync providers to use the crm_diff command and apply it
> instead of a shadow commit, which can sometimes swallow cluster attributes

In the PoC I removed all cs_commit/cs_shadow calls, and it looks like everything
is working. But as you say, this can lead to problems with more complicated
deployments. This needs to be verified.

> 2) store 'reboot' attributes instead of lifetime ones

I tested with --lifetime forever and with reboot. No difference for the
cs_commit/cs_shadow failure.

Moreover, we need a method to store the GTID permanently (to support a
whole-cluster reboot).
If we want to stick to cs_commit/cs_shadow, we need a method other than
crm_attribute to store the GTID.
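
(A hedged illustration of the two attribute scopes being compared above, not
the PoC code; a 'reboot'-scoped attribute is discarded when the node restarts,
which is why a whole-cluster reboot needs some other store:)

  # transient -- cleared when the node comes back after a reboot:
  crm_attribute --node $HOSTNAME --lifetime reboot  --name gtid --update $GTID
  # kept in the CIB across reboots:
  crm_attribute --node $HOSTNAME --lifetime forever --name gtid --update $GTID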

> 
> 
> 
> On Thu, May 29, 2014 at 12:42 PM, Bogdan Dobrelya  
> wrote:
> On 05/27/14 16:44, Bartosz Kupidura wrote:
> > Hello,
> > Responses inline.
> >
> >
> > On 27 May 2014, at 15:12, Vladimir Kuklin wrote:
> >
> >> Hi, Bartosz
> >>
> >> First of all, we are using openstack-dev for such discussions.
> >>
> >> Second, there is also Percona's RA for Percona XtraDB Cluster, which looks
> >> pretty similar, although it is written in Perl. Maybe we could derive
> >> something useful from it.
> >>
> >> Next, if you are working on this stuff, let's make it as open for the
> >> community as possible. There is a blueprint for the Galera OCF script:
> >> https://blueprints.launchpad.net/fuel/+spec/reliable-galera-ocf-script. It
> >> would be awesome if you wrote down the specification and sent the newer
> >> galera OCF code change request to the fuel-library gerrit.
> >
> > Sure, I will update this blueprint.
> > Change request in fuel-library: https://review.openstack.org/#/c/95764/
> 
> That is a really nice catch, Bartosz, thank you. I believe we should
> review the new OCF script thoroughly and consider omitting
> cs_commits/cs_shadows as well. What would be the downsides?
> 
> >
> >>
> >> Speaking of the crm_attribute stuff, I am very surprised that you are
> >> saying that node attributes are altered by a crm shadow commit. We are
> >> using a similar approach in our scripts and have never faced this issue.
> >
> > This is probably because you update crm_attribute very rarely, while with
> > my approach the GTID attribute is updated every 60s on every node (3
> > updates per 60s in a standard HA setup).
> >
> > You can try updating any attribute in a loop while deploying the cluster to
> > trigger the failure with the corosync diff.
> 
> It sounds reasonable and we should verify it.
> I've updated the statuses for related bugs and attached them to the
> aforementioned blueprint as well:
> https://bugs.launchpad.net/fuel/+bug/1283062/comments/7
> https://bugs.launchpad.net/fuel/+bug/1281592/comments/6
> 
> 
> >
> >>
> >> Corosync 2.x support is on our roadmap, but we are not sure that we will
> >> use Corosync 2.x before the 6.x release series starts.
> >
> > Yeah, moreover corosync CMAP is not synced between cluster nodes (or maybe
> > I'm doing something wrong?). So we need another solution for this...
> >
> 
> We should use CMAN for Corosync 1.x, perhaps.
> 
> >>
> >>
> >> On Tue, May 27, 2014 at 3:08 PM, Bartosz Kupidura  
> >> wrote:
> >> Hello guys!
> >> I would like to start a discussion on a new resource agent for
> >> galera/pacemaker.
> >>
> >> Main features:
> >> * Support cluster bootstrap
> >> * Support rebooting any node in the cluster
> >> * Support rebooting the whole cluster
> >> * To determine which node has the latest DB version, we should use the
> >> galera GTID (Global Transaction ID)
> >> * The node with the latest GTID is the galera PC (primary component) in
> >> case of re-election
> >> * The administrator can manually set a node as the PC
> >>
> >> GTID:
> >> * get GTID from mysqld --wsrep-recover or the SQL query "SHOW STATUS LIKE
> >> 'wsrep_local_state_uuid'"
> >> * store GTID as a crm_attribute for the node (crm_attribute --node $HOSTNAME 
>