Bartosz, if you look at what the Percona guys are doing, you will see here: https://github.com/percona/percona-pacemaker-agents/blob/new_pxc_ra/agents/pxc_resource_agent#L516 that they first try to use MySQL and only then fall back to getting the GTID from grastate.dat.

Also, I am wondering whether you are using cluster-wide attributes instead of node attributes. If you use node-scoped attributes, then the shadow/commit commands should not affect anything.
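To make the fallback order concrete, here is a rough Python sketch. It is not the Percona code, only an illustration. It assumes grastate.dat holds simple "key: value" lines with "uuid" and "seqno" fields, and `query_mysql` is a hypothetical callable standing in for whatever asks the running server for its GTID.

```python
from typing import Callable, Optional


def parse_grastate(text: str) -> Optional[str]:
    """Return 'uuid:seqno' from grastate.dat contents, or None if a field is missing.

    Assumes the file consists of 'key: value' lines plus '#' comments.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    uuid, seqno = fields.get("uuid"), fields.get("seqno")
    if not uuid or not seqno:
        return None
    return f"{uuid}:{seqno}"


def get_gtid(
    query_mysql: Callable[[], Optional[str]],
    grastate_path: str = "/var/lib/mysql/grastate.dat",
) -> Optional[str]:
    """Ask MySQL first; fall back to grastate.dat when MySQL gives nothing."""
    gtid = query_mysql()  # hypothetical hook, returns None when mysqld is down
    if gtid:
        return gtid
    try:
        with open(grastate_path) as fh:
            return parse_grastate(fh.read())
    except OSError:
        return None
```

With this shape the RA would only touch the file when the server cannot answer, which is the order the Percona agent appears to follow.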
On Mon, Jun 2, 2014 at 2:34 PM, Bogdan Dobrelya <bdobre...@mirantis.com> wrote:

> On 05/29/2014 02:06 PM, Bartosz Kupidura wrote:
> > Hello,
> >
> > Message written by Vladimir Kuklin <vkuk...@mirantis.com> on 29 May 2014, at 12:09:
> >
> >> Maybe the problem is that you are using lifetime crm attributes instead of 'reboot' ones. shadow/commit is used by us because we need transactional behaviour in some cases. If you turn crm_shadow off, then you will experience problems with multi-state resources and location/colocation/order constraints. So we need to find a way to make commits transactional. There are two ways:
> >> 1) rewrite corosync providers to use crm_diff command and apply it instead of shadow commit that can swallow cluster attributes sometimes
> >
> > In the PoC I removed all cs_commit/cs_shadow calls, and it looks like everything is working. But as you say, this can lead to problems with more complicated deployments.
> > This needs to be verified.
> >
> >> 2) store 'reboot' attributes instead of lifetime ones
> >
> > I tested with --lifetime forever and with reboot. It makes no difference to the cs_commit/cs_shadow failure.
> >
> > Moreover, we need a method to store the GTID permanently (to support a whole-cluster reboot).
>
> Please note, the GTID can always be fetched from /var/lib/mysql/grastate.dat on the Galera node.
>
> > If we want to stick to cs_commit/cs_shadow, we need a method other than crm_attribute to store the GTID.
>
> We could use a modified ocf::pacemaker:SysInfo resource. We could put the GTID there and use it in a similar way to what I did for the fencing PoC [0] (for free-space monitoring).
>
> [0] https://github.com/bogdando/fuel-library-1/blob/ha_fencing_WIP/deployment/puppet/cluster/manifests/fencing_primitives.pp#L41-L70
>
> >> On Thu, May 29, 2014 at 12:42 PM, Bogdan Dobrelya <bdobre...@mirantis.com> wrote:
> >> On 05/27/14 16:44, Bartosz Kupidura wrote:
> >>> Hello,
> >>> Responses inline.
> >>>
> >>> Message written by Vladimir Kuklin <vkuk...@mirantis.com> on 27 May 2014, at 15:12:
> >>>
> >>>> Hi, Bartosz
> >>>>
> >>>> First of all, we are using openstack-dev for such discussions.
> >>>>
> >>>> Second, there is also Percona's RA for Percona XtraDB Cluster, which looks pretty similar, although it is written in Perl. Maybe we could derive something useful from it.
> >>>>
> >>>> Next, if you are working on this stuff, let's make it as open for the community as possible. There is a blueprint for the Galera OCF script: https://blueprints.launchpad.net/fuel/+spec/reliable-galera-ocf-script. It would be awesome if you wrote down the specification and sent the newer Galera OCF code as a change request to fuel-library gerrit.
> >>>
> >>> Sure, I will update this blueprint.
> >>> Change request in fuel-library: https://review.openstack.org/#/c/95764/
> >>
> >> That is a really nice catch, Bartosz, thank you. I believe we should
> >> review the new OCF script thoroughly and consider omitting
> >> cs_commits/cs_shadows as well. What would be the downsides?
> >>
> >>>> Speaking of the crm_attribute stuff: I am very surprised that you are saying that node attributes are altered by crm shadow commit. We are using a similar approach in our scripts and have never faced this issue.
> >>>
> >>> This is probably because you update crm_attribute very rarely. With my approach the GTID attribute is updated every 60s on every node (3 updates in 60s, in a standard HA setup).
> >>>
> >>> You can try updating any attribute in a loop while deploying the cluster to trigger the failure with the corosync diff.
> >>
> >> It sounds reasonable and we should verify it.
> >> I've updated the statuses for related bugs and attached them to the
> >> aforementioned blueprint as well:
> >> https://bugs.launchpad.net/fuel/+bug/1283062/comments/7
> >> https://bugs.launchpad.net/fuel/+bug/1281592/comments/6
> >>
> >>>> Corosync 2.x support is in our roadmap, but we are not sure that we will use Corosync 2.x earlier than the start of the 6.x release series.
> >>>
> >>> Yeah, moreover corosync CMAP is not synced between cluster nodes (or maybe I'm doing something wrong?). So we need another solution for this...
> >>
> >> We should use CMAN for Corosync 1.x, perhaps.
> >>
> >>>> On Tue, May 27, 2014 at 3:08 PM, Bartosz Kupidura <bkupid...@mirantis.com> wrote:
> >>>> Hello guys!
> >>>> I would like to start a discussion on a new resource agent for galera/pacemaker.
> >>>>
> >>>> Main features:
> >>>> * Support cluster bootstrap
> >>>> * Support reboot of any node in the cluster
> >>>> * Support reboot of the whole cluster
> >>>> * To determine which node has the latest DB version, we should use the galera GTID (Global Transaction ID)
> >>>> * The node with the latest GTID is the galera PC (primary component) in case of reelection
> >>>> * Administrator can manually set a node as PC
> >>>>
> >>>> GTID:
> >>>> * get GTID from mysqld --wsrep-recover or the SQL query SHOW STATUS LIKE 'wsrep_local_state_uuid'
> >>>> * store GTID as crm_attribute for node (crm_attribute --node $HOSTNAME --lifetime $LIFETIME --name gtid --update $GTID)
> >>>> * on every monitor/stop/start action update GTID for the given node
> >>>> * GTID can have 3 formats:
> >>>> - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:123 - standard cluster-id:commit-id
> >>>> - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:-1 - standard non-initialized cluster, 00000000-0000-0000-0000-000000000000:-1
> >>>> - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:INF - commit-id manually set to INF, forces the RA to create a new cluster, with master on the given node
> >>>>
> >>>> Check if reelection of PC is needed:
> >>>> * (node is located in partition with quorum OR we have only 1 node configured in cluster) AND galera resource is not running on any node
> >>>> * GTID is manually set to INF on the given node
> >>>>
> >>>> Check if the given node is PC:
> >>>> * it has the highest GTID in the cluster; in case more than one node has the "highest" GTID, we use CRC32 to choose the proper PC
> >>>> * GTID is manually set to INF
> >>>> * in case the node with the highest GTID does not come back after a cluster reboot (for example disk failure), the administrator should set GTID to INF on another node
> >>>>
> >>>> I have an almost-ready RA: http://zynzel.spof.pl/mysql-wss
> >>>>
> >>>> Tested with vanilla centos galera/pacemaker/corosync - OK
> >>>> Tested with Fuel 4.1 - Fail
> >>>>
> >>>> Fuel 4.1 with that RA will not deploy correctly, because we use crm_attribute to store the GTID, and in the manifests we use cs_shadow/cs_commit for every pacemaker resource.
> >>>> This leads to a cs_commit problem: the shadow copy and the running configuration differ (the running config was changed by the RA).
> >>>> "Could not commit shadow instance [..] to the CIB: Application of an update diff failed"
> >>>>
> >>>> To solve this we can go 2 ways:
> >>>> 1) don't use cs_commit/cs_shadow in manifests
> >>>> 2) store GTID in some way other than crm_attribute
> >>>>
> >>>> IMHO 2) is better (less invasive) and we can store the GTID in corosync CMAP (http://www.polarhome.com/service/man/generic.php?qf=corosync-cmapctl), but this requires corosync 2.X
> >>>>
> >>>> --
> >>>> Mailing list: https://launchpad.net/~fuel-dev
> >>>> Post to : fuel-...@lists.launchpad.net
> >>>> Unsubscribe : https://launchpad.net/~fuel-dev
> >>>> More help : https://help.launchpad.net/ListHelp
> >>>>
> >>>> --
> >>>> Yours Faithfully,
> >>>> Vladimir Kuklin,
> >>>> Fuel Library Tech Lead,
> >>>> Mirantis, Inc.
> >>>> +7 (495) 640-49-04
> >>>> +7 (926) 702-39-68
> >>>> Skype kuklinvv
> >>>> 45bk3, Vorontsovskaya Str.
> >>>> Moscow, Russia,
> >>>> www.mirantis.com
> >>>> www.mirantis.ru
> >>>> vkuk...@mirantis.com
> >>
> >> --
> >> Best regards,
> >> Bogdan Dobrelya,
> >> Skype #bogdando_at_yahoo.com
> >> Irc #bogdando
> >>
> >> --
> >> Yours Faithfully,
> >> Vladimir Kuklin
>
> --
> Best regards,
> Bogdan Dobrelya

--
Yours Faithfully,
Vladimir Kuklin,
Fuel Library Tech Lead,
Mirantis, Inc.
+7 (495) 640-49-04
+7 (926) 702-39-68
Skype kuklinvv
45bk3, Vorontsovskaya Str.
Moscow, Russia,
www.mirantis.com
www.mirantis.ru
vkuk...@mirantis.com
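
P.S. The PC selection rules from Bartosz's original proposal (INF wins, otherwise highest commit-id, CRC32 as tie-break) could be sketched roughly as below. This is only an illustration in Python, not the RA itself. The proposal does not say what the CRC32 is computed over, so hashing the node name and picking the lowest value is my assumption, as are the function names. It also assumes all nodes report the same cluster UUID.

```python
import zlib
from typing import Dict, Optional

INF = float("inf")


def seqno_of(gtid: str) -> float:
    """Extract the commit-id from 'uuid:seqno'; 'INF' sorts above every number."""
    _, _, raw = gtid.rpartition(":")
    return INF if raw == "INF" else int(raw)


def elect_pc(gtids: Dict[str, str]) -> Optional[str]:
    """Pick the node that should become the primary component.

    gtids maps node name -> GTID string as stored per node.
    Ties on the highest commit-id are broken by CRC32 of the node name
    (an assumption, the proposal only says "we use CRC32").
    """
    if not gtids:
        return None
    best = max(seqno_of(g) for g in gtids.values())
    candidates = [node for node, g in gtids.items() if seqno_of(g) == best]
    return min(candidates, key=lambda node: zlib.crc32(node.encode("utf-8")))
```

Since every node evaluates the same attributes with the same deterministic tie-break, they should all arrive at the same answer without extra coordination.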
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev