Hi Bogdan,

Thank you for sharing this! I'll need to familiarize myself with this
Jepsen thing, but overall it looks interesting.

As it turns out, we already run Galera in multi-writer mode in Fuel
unintentionally in the case, when the active MySQL node goes down,
HAProxy starts opening connections to a backup, then the active goes
up again, HAProxy starts opening connections to the original MySQL
node, but OpenStack services may still have connections opened to the
backup in their connection pools - so now you may have connections to
multiple MySQL nodes at the same time, exactly what you wanted to
avoid by using active/backup in the HAProxy configuration.

^ this actually leads to an interesting issue [1], when the DB state
committed on one node is not immediately available on another one.
Replication lag can be controlled  via session variables [2], but that
does not always help: e.g. in [1] Nova first goes to Neutron to create
a new floating IP, gets 201 (and Neutron actually *commits* the DB
transaction) and then makes another REST API request to get a list of
floating IPs by address - the latter can be served by another
neutron-server, connected to another Galera node, which does not have
the latest state applied yet due to 'slave lag' - it can happen that
the list will be empty. Unfortunately, 'wsrep_sync_wait' can't help
here, as it's two different REST API requests, potentially served by
two different neutron-server instances.

Basically, you'd need to *always* wait for the latest state to be
applied before executing any queries, which Galera is trying to avoid
for performance reasons.

Thanks,
Roman

[1] https://bugs.launchpad.net/fuel/+bug/1529937
[2] 
http://galeracluster.com/2015/06/achieving-read-after-write-semantics-with-galera/

On Fri, Apr 22, 2016 at 10:42 AM, Bogdan Dobrelya
<bdobre...@mirantis.com> wrote:
> [crossposting to openstack-operators@lists.openstack.org]
>
> Hello.
> I wrote this paper [0] to demonstrate an approach how we can leverage a
> Jepsen framework for QA/CI/CD pipeline for OpenStack projects like Oslo
> (DB) or Trove, Tooz DLM and perhaps for any integration projects which
> rely on distributed systems. Although all tests are yet to be finished,
> results are quite visible, so I better off share early for a review,
> discussion and comments.
>
> I have similar tests done for the RabbitMQ OCF RA clusterers as well,
> although have yet wrote a report.
>
> PS. I'm sorry for so many tags I placed in the topic header, should I've
> used just "all" :) ? Have a nice weekends and take care!
>
> [0] https://goo.gl/VHyIIE
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>
>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

Reply via email to