Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-09-04 Thread Jay Pipes

On 09/02/2014 07:15 AM, Duncan Thomas wrote:

On 11 August 2014 19:26, Jay Pipes jaypi...@gmail.com wrote:


The above does not really make sense for MySQL Galera/PXC clusters *if only
Galera nodes are used in the cluster*. Since Galera is synchronously
replicated, there's no real point in segregating writers from readers, IMO.
Better to just spread the write AND read load equally among all Galera
cluster nodes.


Unfortunately it is possible to get bitten by the difference between
'synchronous' and 'virtually synchronous' in practice.


Not in my experience. The thing that has bitten me in practice is 
Galera's lack of support for SELECT FOR UPDATE, which is used 
extensively in some of the OpenStack projects. Instead of taking a 
write-intent lock on one or more record gaps (which is what InnoDB does 
in the case of a SELECT FOR UPDATE on a local node), Galera happily 
replicates DML statements to all other nodes in the cluster. If two of 
those nodes attempt to modify the same row or rows in a table, then the 
working set replication will fail to certify, which results in a 
certification timeout, which is then converted to an InnoDB deadlock error.


It's the difference between hanging around waiting on a local node for 
the transaction that called SELECT FOR UPDATE to complete and release 
the write-intent locks on a set of table rows, versus hanging around 
waiting for the InnoDB deadlock/lock timeout to bubble up from the 
working set replication certification. The latter typically takes longer 
than the time taken to lock the rows in a single transaction, and 
therefore causes thundering-herd issues, with the conductor attempting 
to retry work due to the use of the @retry_on_deadlock decorator that is 
so commonly used everywhere.


FWIW, I've cc'd a real expert on the matter. Peter, feel free to 
clarify, contradict, or just ignore me :)


Best,
-jay

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-09-02 Thread Duncan Thomas
On 11 August 2014 19:26, Jay Pipes jaypi...@gmail.com wrote:

 The above does not really make sense for MySQL Galera/PXC clusters *if only
 Galera nodes are used in the cluster*. Since Galera is synchronously
 replicated, there's no real point in segregating writers from readers, IMO.
 Better to just spread the write AND read load equally among all Galera
 cluster nodes.

Unfortunately it is possible to get bitten by the difference between
'synchronous' and 'virtually synchronous' in practice. I'll try to get
somebody more knowledgeable about the details to comment further.



Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-08-13 Thread Mike Wilson
Lee,

No problem about mixing up the Mikes, there's a bunch of us out there :-).
What you are describing here is very much like a spec I wrote for
Nova [1] a couple of months ago and then never got back to. At the time I
considered gearing the feature toward oslo.db, and I can't remember exactly
why I didn't. I think it probably had more to do with having folks who are
familiar with the problem reviewing code in Nova than anything else.
Anyway, I'd like to revisit this in Kilo, or, if you see a nice way to
integrate this into oslo.db, I'd love to see your proposal.

-Mike

[1] https://review.openstack.org/#/c/93466/


On Sun, Aug 10, 2014 at 10:30 PM, Li Ma skywalker.n...@gmail.com wrote:

  not sure if I said that :).  I know extremely little about galera.

 Hi Mike Bayer, I'm so sorry I mistook you for Mike Wilson in the last
 post. :-) Apologies to Mike Wilson as well.

  I’d totally guess that Galera would need to first have SELECTs come from
 a slave node, then the moment it sees any kind of DML / writing, it
 transparently switches the rest of the transaction over to a writer node.

 You are totally right.

 
  @transaction.writer
  def read_and_write_something(arg1, arg2, …):
  # …
 
  @transaction.reader
  def only_read_something(arg1, arg2, …):
  # …

 The first approach that I had in mind is the decorator-based method to
 separate read/write ops, like what you said. To some degree, it is almost
 the same app-level approach as the master/slave configuration, as far as
 transparency for developers is concerned. However, as I stated before, the
 current approach is rarely used in OpenStack. A decorator is more friendly
 than a use_slave_flag or something like that. If ideal transparency cannot
 be achieved then, to say the least, decorator-based app-level switching is
 a great improvement compared with the current implementation.

  OK so Galera would perhaps have some way to make this happen, and that's
 great.

 If there are any Galera experts here, please correct me. At least in my
 experiment, transactions work in that way.

  this (the word “integrate”, and what does that mean) is really the only
 thing making me nervous.

 Mike, just feel free. What I'd like to do is to add a django-style routing
 method as a plus in oslo.db, like:

 [database]
 # Original master/slave configuration
 master_connection =
 slave_connection =

 # Only Support Synchronous Replication
 enable_auto_routing = True

 [db_cluster]
 master_connection =
 master_connection =
 ...
 slave_connection =
 slave_connection =
 ...

 HOWEVER, I think it needs more investigation, which is why I'd like to
 put it on the mailing list at this early stage, to raise some discussion
 in depth. I'm not a Galera expert. I really appreciate any challenges here.

 Thanks,
 Li Ma


 - Original Message -
 From: Mike Bayer mba...@redhat.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Sunday, August 10, 2014, 11:57:47 PM
 Subject: Re: [openstack-dev] [oslo.db]A proposal for DB read/write
 separation


 On Aug 10, 2014, at 11:17 AM, Li Ma skywalker.n...@gmail.com wrote:

 
  How about Galera multi-master cluster? As Mike Bayer said, it is
 virtually synchronous by default. It is still possible that outdated rows
 are queried that make results not stable.

 not sure if I said that :).  I know extremely little about galera.


 
 
  Let's move forward to synchronous replication, like Galera with
 causal-reads on. The dominant advantage is that it has consistent
 relational dataset support. The disadvantage are that it uses optimistic
 locking and its performance sucks (also said by Mike Bayer :-). For
 optimistic locking problem, I think it can be dealt with by
 retry-on-deadlock. It's not the topic here.

 I *really* don’t think I said that, because I like optimistic locking, and
 I’ve never used Galera ;).

 Where I am ignorant here is of what exactly occurs if you write some rows
 within a transaction with Galera, then do some reads in that same
 transaction.   I’d totally guess that Galera would need to first have
 SELECTs come from a slave node, then the moment it sees any kind of DML /
 writing, it transparently switches the rest of the transaction over to a
 writer node.   No idea, but it has to be something like that?


 
 
  So, the transparent read/write separation is dependent on such an
 environment. SQLalchemy tutorial provides code sample for it [1]. Besides,
 Mike Bayer also provides a blog post for it [2].

 So this thing with the “django-style routers”, the way that example is, it
 actually would work poorly with a Session that is not in “autocommit” mode,
 assuming you’re working with regular old databases that are doing some
 simple behind-the-scenes replication.   Because again, if you do a flush,
 those rows go to the master, if the transaction is still open, then reading
 from the slaves you won’t see the rows you just inserted.So in reality,
 that example is kind of crappy

Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-08-11 Thread Jay Pipes

Hi Li, comments inline.

On 08/08/2014 12:03 AM, Li Ma wrote:

Getting a massive amount of information from data storage to be displayed is
where most of the activity happens in OpenStack. The two activities of reading
data and writing (creating, updating and deleting) data are fundamentally
different.

The optimization for these two opposite database activities can be done by
physically separating the databases that service these two different
activities. All the writes go to database servers, which then replicate the
written data to the database server(s) dedicated to servicing the reads.

Currently, AFAIK, many OpenStack deployments in production try to take
advantage of MySQL (including Percona or MariaDB) multi-master Galera clusters.
It is possible to design and implement a read/write separation schema
for such a DB cluster.


The above does not really make sense for MySQL Galera/PXC clusters *if 
only Galera nodes are used in the cluster*. Since Galera is 
synchronously replicated, there's no real point in segregating writers 
from readers, IMO. Better to just spread the write AND read load equally 
among all Galera cluster nodes.


However, if you have a Galera cluster that then slaves off to one or 
more standard MySQL slaves, then certainly doing writer/reader 
segregation could be useful, especially for directing readers of 
aggregate or report-type data to the read-only slaves.



Actually, OpenStack has a method for read scalability via defining
master_connection and slave_connection in configuration, but this method
lacks flexibility because master or slave is decided in the logical
context (code). It's not transparent to application developers.
As a result, it is not widely used across the OpenStack projects.

So, I'd like to propose a transparent read/write separation method
for oslo.db that every project may happily take advantage of
without any code modification.


I've never seen a writer/reader segregation proxy or middleware piece 
that was properly able to send the right reads to the slaves. 
Unfortunately, determining the right reads to send to the slaves is 
highly application-dependent, since only the application knows when it 
can tolerate slave lag.



Moreover, I'd like to put it in the mailing list in advance to
make sure it is acceptable for oslo.db.


I think oslo.db is not the right place for this. I believe the efforts 
that Mike Wilson has been making in the "slavification" blueprints are 
the more appropriate place to add this slave-aware code.


Best,
-jay


I'd appreciate any comments.

br.
Li Ma




Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-08-10 Thread Amrith Kumar
Li Ma, Mike [Wilson | Bayer], and Roman Podoliaka,

 

A similar topic came up in Atlanta at a database panel I participated in. Jay 
Pipes had organized it as part of the ops track, and Peter Boros (of Percona) 
and I were on the panel. The issue was what to do about the database under 
OpenStack in the face of high load from components such as, for example, 
Ceilometer.

 

Splitting reads and writes is a solution that is fraught with challenges, as it 
requires the application to know where it wrote, where it should read from, 
what the replication latency is, and all of that. At the heart of the issue is 
that you want to scale the database.

 

I had suggested at this panel that those who want to try and solve this problem 
should try the Database Virtualization Engine[1] product from Tesora. In the 
interest of full disclosure, I work for Tesora. 

 

The solution is a simple way to horizontally scale a MySQL (or Percona or 
MariaDB) database across a collection of database servers. It exposes a MySQL 
compatible interface and takes care of all the minutiae of where to store data, 
partitioning it across the various database servers, and executing queries on 
behalf of an application irrespective of the location of the data. It natively 
provides such capabilities as distributed joins, aggregation and sorting. 
Architecturally it is a traditional parallel database built from a collection 
of unmodified MySQL (or variant) databases. 

 

It is open source, and available for free download.[2] 

 

Percona recently tested[3] the DVE product and confirmed that the solution 
provided horizontal scalability and linear (and in some cases better than 
linear) performance improvements[4] with scale. You can get a copy of their 
full test report here.[5] 

 

Ingesting data at very high volume is often an area of considerable pain for 
large systems, and in one demonstration of our product we were required to 
ingest 1 million CDR-style records per second. We demonstrated that with just 
15 Amazon RDS servers (m1.xlarge, standard EBS volumes, no provisioned IOPS) 
and two c1.xlarge servers to run the Tesora DVE software, we could in fact 
ingest a sustained stream of over 1 million CDRs a second![6]

 

To Mike Wilson and Roman’s point, the solution I’m proposing would be entirely 
transparent to the developer and would be something that would be both elastic 
and scalable with the workload placed on it. In addition, standard SQL queries 
will continue to work unmodified, irrespective of which database server 
physically holds a row of data.

 

To Mike Bayer’s point about data distribution and transaction management; yes, 
we handle all the details relating to handling data consistency and providing 
atomic transactions during Insert/Update/Delete operations.

 

As a company, we at Tesora are committed to OpenStack and are significant 
participants in Trove (the database-as-a-service project for OpenStack). You 
can verify this yourself on Stackalytics [7] or [8]. If you would like to 
consider it as a part of your solution to oslo.db, we’d be thrilled to work 
with the OpenStack community to make this work, both from a technical and a 
business/licensing perspective. You can catch most of our dev team on either 
#openstack-trove or #tesora.

 

Some of us from Tesora, Percona and Mirantis are planning an ops panel similar 
to the one at Atlanta, for the Summit in Paris. I would definitely like to meet 
with more of you in Paris and discuss how we address issues of scale in the 
database that powers an OpenStack implementation.

 

Thanks,

 

-amrith

 

--

 

Amrith Kumar, CTO Tesora (www.tesora.com)

 

Twitter: @amrithkumar  

IRC: amrith @freenode 

 

 

[1] http://www.tesora.com/solutions/database-virtualization-engine

[2] http://www.tesora.com/solutions/downloads/products

[3] 
http://www.mysqlperformanceblog.com/2014/06/24/benchmarking-tesoras-database-virtualisation-engine-sysbench/
 

[4] 
http://www.tesora.com/blog/perconas-evaluation-our-database-virtualization-engine

[5] http://resources.tesora.com/site/download/percona-benchmark-whitepaper 

[6] 
http://www.tesora.com/blog/ingesting-over-100-rows-second-mysql-aws-cloud 

[7] http://stackalytics.com/?module=trove-group&metric=commits

[8] http://stackalytics.com/?module=trove-group&metric=marks

 

 

 

 

 

From: Mike Wilson [mailto:geekinu...@gmail.com] 
Sent: Friday, August 08, 2014 7:35 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

 

Li Ma,

 

This is interesting. In general I am in favor of expanding the scope of any 
read/write separation capabilities that we have. I'm not clear on what exactly 
you are proposing; hopefully you can answer some of my questions inline. The 
thing I had thought of immediately was detection

Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-08-10 Thread Mike Bayer

On Aug 10, 2014, at 9:59 AM, Amrith Kumar amr...@tesora.com wrote:

  
 To Mike Bayer’s point about data distribution and transaction management; 
 yes, we handle all the details relating to handling data consistency and 
 providing atomic transactions during Insert/Update/Delete operations.
  
 As a company, we at Tesora are committed to OpenStack and are significant 
 participants in Trove (the database-as-a-service project for OpenStack). You 
 can verify this yourself on Stackalytics [7] or [8]. If you would like to 
 consider it as a part of your solution to oslo.db, we’d be thrilled to work 
 with the OpenStack community to make this work, both from a technical and a 
 business/licensing perspective. You can catch most of our dev team on either 
 #openstack-trove or #tesora.
  
 Some of us from Tesora, Percona and Mirantis are planning an ops panel 
 similar to the one at Atlanta, for the Summit in Paris. I would definitely 
 like to meet with more of you in Paris and discuss how we address issues of 
 scale in the database that powers an OpenStack implementation.


OK well just to be clear, oslo.db is Python code that basically provides 
in-application helpers and patterns to work with databases, primarily through 
SQLAlchemy.   So it’s essentially openstack-specific patterns and recipes on 
top of SQLAlchemy. It has very little to do with the use of special 
database backends that know how to partition among shards and/or master/slaves 
(I thought the original proposal was for master/slave). So the Tesora 
product would be 99% “drop in”, with at most some configuration flags set up 
on the Python side, and everything else being configurational, since the 
proposal here is for “transparent”, which is taken to mean “no app changes are 
needed”. My only point was that an application-layer reader/writer 
distribution approach would need to work at the level of transactions, not 
statements, and therefore would need to know at transaction start time what 
the nature of the transaction would be (and thus requires some small 
declaration at the top, hence code changes… code changes that I think are a 
good thing, as explicit declaration of reader/writer methods up top can be 
handy in other ways too).


  
 Thanks,
  
 -amrith
  
 --
  
 Amrith Kumar, CTO Tesora (www.tesora.com)
  
 Twitter: @amrithkumar 
 IRC: amrith @freenode
  
  
 [1] http://www.tesora.com/solutions/database-virtualization-engine
 [2] http://www.tesora.com/solutions/downloads/products
 [3] 
 http://www.mysqlperformanceblog.com/2014/06/24/benchmarking-tesoras-database-virtualisation-engine-sysbench/
 [4] 
 http://www.tesora.com/blog/perconas-evaluation-our-database-virtualization-engine
 [5] http://resources.tesora.com/site/download/percona-benchmark-whitepaper
 [6] 
 http://www.tesora.com/blog/ingesting-over-100-rows-second-mysql-aws-cloud
 [7] http://stackalytics.com/?module=trove-group&metric=commits
 [8] http://stackalytics.com/?module=trove-group&metric=marks
  
  
  
  
  
 From: Mike Wilson [mailto:geekinu...@gmail.com] 
 Sent: Friday, August 08, 2014 7:35 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation
  
 Li Ma,
  
 This is interesting. In general I am in favor of expanding the scope of any 
 read/write separation capabilities that we have. I'm not clear on what exactly 
 you are proposing; hopefully you can answer some of my questions inline. The 
 thing I had thought of immediately was detection of whether an operation is 
 read or write and integrating that into oslo.db or sqlalchemy. Mike Bayer has 
 some thoughts on that[1] and there are other approaches around that can be 
 copied/learned from. These sorts of things are clear to me and, while moving 
 towards more transparency for the developer, still require context. Please 
 share with us more details on your proposal.
  
 -Mike
  
 [1] 
 http://www.percona.com/doc/percona-xtradb-cluster/5.5/wsrep-system-index.html
 [2] 
 http://techspot.zzzeek.org/2012/01/11/django-style-database-routers-in-sqlalchemy/
  
 
 On Thu, Aug 7, 2014 at 10:03 PM, Li Ma skywalker.n...@gmail.com wrote:
 Getting a massive amount of information from data storage to be displayed is
 where most of the activity happens in OpenStack. The two activities of reading
 data and writing (creating, updating and deleting) data are fundamentally
 different.
 
 The optimization for these two opposite database activities can be done by
 physically separating the databases that service these two different
 activities. All the writes go to database servers, which then replicate the
 written data to the database server(s) dedicated to servicing the reads.
 
 Currently, AFAIK, many OpenStack deployments in production try to take
 advantage of MySQL (including Percona or MariaDB) multi-master Galera clusters.
 It is possible to design and implement a read/write separation schema
 for such a DB cluster

Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-08-10 Thread Li Ma
Thanks for all the detailed analysis, Mike W, Mike B, and Roman.
 
For a production-ready database system, replication is a must, I think. So, the 
questions are which replication mode is suitable for OpenStack, and how 
OpenStack can use it to improve the performance and scalability of DB access.

In the current implementation of the database API in OpenStack, a master/slave 
connection is defined for optimizing performance. Developers of each OpenStack 
component take the responsibility of making use of it in the application 
context, and some other folks take the responsibility of architecting the 
database system to meet the requirements of various production environments. 
There is no general guideline for it. Actually, it is not that easy to 
determine which transactions can be served by a slave, due to data consistency 
and business logic constraints in the different OpenStack components.

The current status is that the master/slave configuration is not widely used; 
only Nova uses the slave connection, in its periodic tasks, which are not 
sensitive to the status of replication. Due to the nature of asynchronous 
replication, queries to the DB are not stable, so the risks of using slaves 
are apparent.
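The per-call hint style described here, where a caller such as a Nova periodic task explicitly opts into the slave, looks roughly like this sketch (the function name, flag handling, and connection strings are illustrative, not actual Nova code):

```python
# Placeholder connection registry; real code would hold SQLAlchemy engines
# built from master_connection / slave_connection.
ENGINES = {"master": "mysql://master/nova", "slave": "mysql://slave/nova"}


def service_get_all(use_slave=False):
    """Sketch of the use_slave-flag pattern: each caller decides whether
    a stale read is tolerable (e.g. in a periodic task)."""
    engine = ENGINES["slave" if use_slave else "master"]
    # A real DB API would execute the query on this engine.
    return engine
```

A periodic task that tolerates replication lag would call `service_get_all(use_slave=True)`; everything else defaults to the master, which is why the flag sees so little use.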

How about a Galera multi-master cluster? As Mike Bayer said, it is virtually 
synchronous by default. It is still possible that outdated rows are queried, 
which makes results unstable.

When using such eventual-consistency methods, you have to carefully design 
which transactions are tolerant of old data. AFAIK, no matter which component 
it is (Nova, Cinder or Neutron), most of the transactions are not that 
'tolerant'. As Mike Bayer said, a consistent relational dataset is very 
important for OpenStack components. This is why only non-sensitive periodic 
tasks use slaves in Nova.

Let's move forward to synchronous replication, like Galera with causal-reads 
on. The dominant advantage is that it has consistent relational dataset 
support. The disadvantages are that it uses optimistic locking and its 
performance sucks (also said by Mike Bayer :-). As for the optimistic locking 
problem, I think it can be dealt with by retry-on-deadlock. It's not the topic 
here.

If we first ignore the performance-suck problem, a multi-master cluster with 
synchronous replication is perfect for OpenStack, with any combination of 
masters and slaves enabled, and it can truly scale out.

So, the transparent read/write separation is dependent on such an environment. 
The SQLAlchemy documentation provides a code sample for it [1]. Besides, Mike 
Bayer also provides a blog post on it [2].

What I did is to re-implement it in OpenStack DB API modules in my development 
environment, using Galera cluster(causal-reads on). It has been running 
perfectly for more than a week. The routing session manager works well while 
maintaining data consistency.

Back to the performance-suck problem: theoretically, causal-reads-on will 
definitely affect the overall performance of concurrent DB reads, but I cannot 
find any report (official or unofficial) on causal-reads performance 
degradation. Actually, in the production system of my company, Galera 
performance is tuned via network round-trip time, network throughput, number 
of slave threads, keep-alive, and wsrep flow control parameters.

All in all: firstly, transparent read/write separation is feasible using a 
synchronous replication method. Secondly, it may help scale out large 
deployments without any code modification. Moreover, it needs fine-tuning (of 
course, every production system needs that :-). Finally, I think that if we 
can integrate it into oslo.db, it is a perfect plus for those who would like 
to deploy Galera (or a similar technology) as the DB backend.

[1] 
http://docs.sqlalchemy.org/en/rel_0_9/orm/session.html#custom-vertical-partitioning
[2] 
http://techspot.zzzeek.org/2012/01/11/django-style-database-routers-in-sqlalchemy/
[3] Galera replication method: http://galeracluster.com/products/technology/




Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-08-10 Thread Mike Bayer

On Aug 10, 2014, at 11:17 AM, Li Ma skywalker.n...@gmail.com wrote:

 
 How about Galera multi-master cluster? As Mike Bayer said, it is virtually 
 synchronous by default. It is still possible that outdated rows are queried 
 that make results not stable.

not sure if I said that :).  I know extremely little about galera.


 
 
 Let's move forward to synchronous replication, like Galera with causal-reads 
 on. The dominant advantage is that it has consistent relational dataset 
 support. The disadvantage are that it uses optimistic locking and its 
 performance sucks (also said by Mike Bayer :-). For optimistic locking 
 problem, I think it can be dealt with by retry-on-deadlock. It's not the 
 topic here.

I *really* don’t think I said that, because I like optimistic locking, and I’ve 
never used Galera ;).

Where I am ignorant here is of what exactly occurs if you write some rows 
within a transaction with Galera, then do some reads in that same transaction.  
 I’d totally guess that Galera would need to first have SELECTs come from a 
slave node, then the moment it sees any kind of DML / writing, it transparently 
switches the rest of the transaction over to a writer node.   No idea, but it 
has to be something like that?   


 
 
 So, the transparent read/write separation is dependent on such an 
 environment. SQLalchemy tutorial provides code sample for it [1]. Besides, 
 Mike Bayer also provides a blog post for it [2].

So this thing with the “django-style routers”, the way that example is, it 
actually would work poorly with a Session that is not in “autocommit” mode, 
assuming you’re working with regular old databases that are doing some simple 
behind-the-scenes replication.   Because again, if you do a flush, those rows 
go to the master, if the transaction is still open, then reading from the 
slaves you won’t see the rows you just inserted.So in reality, that example 
is kind of crappy, if you’re in a transaction (which we are) you’d really need 
to be doing session.using_bind(“master”) all over the place, and that is 
already way too verbose and hardcoded.   I’m wondering why I didn’t make a huge 
note of that in the post.  The point of that article was more to show that hey, 
you *can* control it at this level if you want to but you need to know what 
you’re doing.
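To make the pitfall concrete, here is a toy model (plain Python, no SQLAlchemy; all names made up) of why a per-statement router inside a transaction must pin reads to the master once a flush has written anything:

```python
class RoutingSession:
    """Toy model of the 'django-style router' pitfall: once the current
    transaction has flushed rows to the master, routing subsequent reads
    to a slave may miss those rows, because replication has not caught up.
    """

    def __init__(self):
        self._wrote = False  # has this transaction flushed any writes?

    def get_bind(self, is_write=False):
        """Pick a target for the next statement."""
        if is_write:
            self._wrote = True
        # After any write in this transaction, reads must go to the
        # master too, or they risk not seeing the just-inserted rows.
        return "master" if (is_write or self._wrote) else "slave"
```

In this model a fresh transaction reads from the slave, but the first write flips it to the master for everything that follows, which is exactly the statefulness the simple per-statement router lacks.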

Just to put it out there, this is what I think good high/level master/slave 
separation in the app level (reiterating: *if we want it in the app level at 
all*) should approximately look like:

@transaction.writer
def read_and_write_something(arg1, arg2, …):
# …

@transaction.reader
def only_read_something(arg1, arg2, …):
# …

that way there is no awareness of master/slave anything, the underlying system 
can decide what “reader” and “writer” means.   Do in-app switching between two 
databases, send out some magic signals to some commercial clustering service, 
have the “readers” work in “autocommit” mode, or do nothing, whatever.  The 
code doesn’t decide this imperatively.But it isn’t 100% “transparent”, this 
small amount of declaration per-method is needed.
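One way the declaration-up-top idea could be wired, as a sketch: the decorator only records the declared role, and a session factory consults it when handing out a connection. The `writer`/`reader` decorators and engine names here are hypothetical, not an existing oslo.db API.

```python
import contextvars
import functools

# Engine names are placeholders for real master/slave connections.
ENGINES = {"writer": "master-connection", "reader": "slave-connection"}

# The declared role for the current logical call; default to the safe choice.
_role = contextvars.ContextVar("db_role", default="writer")


def _bind(role):
    """Build a decorator that pins the wrapped function to one role."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            token = _role.set(role)
            try:
                return fn(*args, **kwargs)
            finally:
                _role.reset(token)  # restore the caller's role
        return wrapper
    return decorator


writer = _bind("writer")
reader = _bind("reader")


def current_engine():
    """What a session factory would consult when opening a connection."""
    return ENGINES[_role.get()]


@reader
def only_read_something():
    return current_engine()


@writer
def read_and_write_something():
    return current_engine()
```

The point of the indirection is exactly the one made above: the decorated code never names master or slave; the underlying system decides what "reader" and "writer" mean.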


 
 What I did is to re-implement it in OpenStack DB API modules in my 
 development environment, using Galera cluster(causal-reads on). It has been 
 running perfectly for more than a week. The routing session manager works 
 well while maintaining data consistency.

OK so Galera would perhaps have some way to make this happen, and that’s great. 
   My understanding is that people are running Openstack already with Galera, 
that’s why we’re hitting issues with some of those SELECT..FOR UPDATEs that are 
being replaced with optimistic approaches as you mention. But beyond that 
this isn’t any kind of “change” to oslo.db or anything else.   Run Openstack 
with whatever database backend you want, ideally (that is my primary agenda, 
sorry MySQL vendors!).


 Finally, I think if we can integrate it into oslo.db, it is a perfect plus 
 for those who would like to deploy Galera (or other similar technology) as DB 
 backend.

this (the word “integrate”, and what does that mean) is really the only thing 
making me nervous.  If the integration here is the django blog post I have, 
it’s not going to work with transactions.   Either the system is magical enough 
that a single transaction can read/write from both sources midway and there is 
no “integration” needed, or the transaction has to be declared up front as 
reader or writer.  Or you don’t use transactions except for writers, which is 
essentially the same as “declaration up front”.

 
 [1] 
 http://docs.sqlalchemy.org/en/rel_0_9/orm/session.html#custom-vertical-partitioning
 [2] 
 http://techspot.zzzeek.org/2012/01/11/django-style-database-routers-in-sqlalchemy/
 [3] Galera replication method: http://galeracluster.com/products/technology/
 
 

Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-08-10 Thread Li Ma
 not sure if I said that :).  I know extremely little about galera.

Hi Mike Bayer, I'm so sorry I mistook you for Mike Wilson in the last post. 
:-) Apologies to Mike Wilson as well.

 I’d totally guess that Galera would need to first have SELECTs come from a 
 slave node, then the moment it sees any kind of DML / writing, it 
 transparently switches the rest of the transaction over to a writer node.

You are totally right.

 
 @transaction.writer
 def read_and_write_something(arg1, arg2, …):
 # …
 
 @transaction.reader
 def only_read_something(arg1, arg2, …):
 # …

The first approach that I had in mind is the decorator-based method to 
separate read/write ops, like what you said. To some degree, it is almost the 
same app-level approach as the master/slave configuration, as far as 
transparency for developers is concerned. However, as I stated before, the 
current approach is rarely used in OpenStack. A decorator is more friendly 
than a use_slave_flag or something like that. If ideal transparency cannot be 
achieved then, to say the least, decorator-based app-level switching is a 
great improvement compared with the current implementation.

 OK so Galera would perhaps have some way to make this happen, and that's 
 great.

If there are any Galera experts here, please correct me. At least in my 
experiment, transactions work in that way.

 this (the word “integrate”, and what does that mean) is really the only thing 
 making me nervous.

Mike, just feel free. What I'd like to do is to add a django-style routing 
method as a plus in oslo.db, like:

[database]
# Original master/slave configuration
master_connection = 
slave_connection = 

# Only Support Synchronous Replication
enable_auto_routing = True

[db_cluster]
master_connection = 
master_connection = 
...
slave_connection = 
slave_connection = 
...

HOWEVER, I think it needs more investigation, which is why I'd like to put it 
on the mailing list at this early stage, to raise some in-depth discussion. I'm 
not a Galera expert, and I'd really appreciate any challenges here.
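To sketch what a router over such a [db_cluster] config might do (illustrative only: a real implementation would return SQLAlchemy Engine objects, e.g. from a Session.get_bind() override as in the django-style-routers blog post, and the connection names here are placeholders):

```python
import itertools

class ClusterRouter:
    """Round-robin over the connections of a [db_cluster]-style config.

    Writes always land on a master; reads are spread across slaves.
    This only works if replication is synchronous, which is why the
    config sketch above guards it with enable_auto_routing.
    """

    def __init__(self, masters, slaves):
        # cycle() gives simple, deterministic load spreading.
        self._masters = itertools.cycle(masters)
        self._slaves = itertools.cycle(slaves)

    def get_bind(self, is_write):
        pool = self._masters if is_write else self._slaves
        return next(pool)

router = ClusterRouter(
    masters=["master1", "master2"],
    slaves=["slave1", "slave2", "slave3"],
)
```

For example, successive `router.get_bind(is_write=False)` calls walk through slave1, slave2, slave3 and wrap around, while writes alternate over the masters.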

Thanks,
Li Ma


- Original Message -
From: Mike Bayer mba...@redhat.com
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Sent: 星期日, 2014年 8 月 10日 下午 11:57:47
Subject: Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation


On Aug 10, 2014, at 11:17 AM, Li Ma skywalker.n...@gmail.com wrote:

 
 How about a Galera multi-master cluster? As Mike Bayer said, it is virtually 
 synchronous by default. It is still possible that outdated rows are queried, 
 which makes results unstable.

not sure if I said that :).  I know extremely little about galera.


 
 
 Let's move forward to synchronous replication, like Galera with causal-reads 
 on. The dominant advantage is that it has consistent relational dataset 
 support. The disadvantages are that it uses optimistic locking and its 
 performance sucks (also said by Mike Bayer :-). The optimistic locking 
 problem, I think, can be dealt with by retry-on-deadlock. It's not the 
 topic here.

I *really* don’t think I said that, because I like optimistic locking, and I’ve 
never used Galera ;).

Where I am ignorant here is of what exactly occurs if you write some rows 
within a transaction with Galera, then do some reads in that same transaction.  
 I’d totally guess that Galera would need to first have SELECTs come from a 
slave node, then the moment it sees any kind of DML / writing, it transparently 
switches the rest of the transaction over to a writer node.   No idea, but it 
has to be something like that?   


 
 
 So, the transparent read/write separation is dependent on such an 
 environment. The SQLAlchemy documentation provides a code sample for it [1]. 
 Besides, Mike Bayer also provides a blog post on it [2].

So, this thing with the “django-style routers”: the way that example is 
written, it would actually work poorly with a Session that is not in 
“autocommit” mode, assuming you’re working with regular old databases that are 
doing some simple behind-the-scenes replication. Because again, if you do a 
flush, those rows go to the master; if the transaction is still open, then 
when reading from the slaves you won’t see the rows you just inserted. So in 
reality, that example is kind of crappy: if you’re in a transaction (which we 
are), you’d really need to be doing session.using_bind(“master”) all over the 
place, and that is already way too verbose and hardcoded. I’m wondering why I 
didn’t make a huge note of that in the post. The point of that article was 
more to show that hey, you *can* control it at this level if you want to, but 
you need to know what you’re doing.

Just to put it out there, this is what I think good high-level master/slave 
separation at the app level (reiterating: *if we want it at the app level at 
all*) should approximately look like:

@transaction.writer
def read_and_write_something(arg1, arg2, …):
# …

@transaction.reader
def only_read_something(arg1, arg2, …):
# …

Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-08-08 Thread Roman Podoliaka
Hi Li,

How are you going to make this separation transparent? I mean, generally, in a
function's code you can't know in advance whether the transaction will be
read-only or will contain an INSERT/UPDATE/DELETE statement. On the other
hand, as a developer, you could analyze the DB queries that could possibly be
issued by a function and mark the function somehow, so that oslo.db would know
which database connection the transaction should be created on, but this is
essentially what the slave_connection option is for and how it works now.

Secondly, as you said, the key thing here is to separate reads and writes. In
order to make reads fast and reduce the load on your 'writable' database,
you'd move reads to asynchronous replicas. But you can't do this transparently
either, as there are a lot of places in our code in which we assume we are
using the latest state of the data, while asynchronous replicas might actually
be a little bit out of date. So, in the case of slave_connection, we use it
only where it's OK for the code to work with outdated rows, i.e. we
*explicitly* modify the existing functions to work with slave_connection.
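The explicit opt-in pattern Roman describes can be sketched like this (a toy facade: the class name, the `use_slave` flag, and the connection strings are illustrative stand-ins, not the actual oslo.db interface):

```python
class ToyEngineFacade:
    """Toy stand-in for a DB facade: the caller opts in to the slave.

    The use_slave flag is explicit precisely because the caller must
    know that slightly stale rows are acceptable for that query.
    """

    def __init__(self, master_connection, slave_connection=None):
        self.master = master_connection
        # Fall back to the master when no replica is configured.
        self.slave = slave_connection or master_connection

    def get_connection(self, use_slave=False):
        return self.slave if use_slave else self.master

facade = ToyEngineFacade("mysql://master/nova", "mysql://replica/nova")

def instance_get_all(facade, use_slave=False):
    # E.g. a periodic task that tolerates stale data passes use_slave=True;
    # here we just return which connection would serve the query.
    return facade.get_connection(use_slave=use_slave)
```

The point of the sketch is the non-transparency: every call site that may read stale data has to say so, which is exactly the per-function marking the proposal wants to avoid.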

Thanks,
Roman

On Fri, Aug 8, 2014 at 7:03 AM, Li Ma skywalker.n...@gmail.com wrote:
 Getting a massive amount of information from data storage to be displayed is
 where most of the activity happens in OpenStack. The two activities of reading
 data and writing (creating, updating and deleting) data are fundamentally
 different.

 The optimization for these two opposite database activities can be done by
 physically separating the databases that service these two different
 activities. All the writes go to database servers, which then replicate the
 written data to the database server(s) dedicated to servicing the reads.

 Currently, AFAIK, many OpenStack deployments in production try to take
 advantage of MySQL (including Percona or MariaDB) multi-master Galera
 clusters. It is possible to design and implement a read/write separation
 schema for such a DB cluster.

 Actually, OpenStack has a method for read scalability via defining
 master_connection and slave_connection in the configuration, but this method
 lacks flexibility because the choice of master or slave is made in the
 logical context (code). It's not transparent to the application developer.
 As a result, it is not widely used across the OpenStack projects.

 So, I'd like to propose a transparent read/write separation method
 for oslo.db, so that every project may happily take advantage of it
 without any code modification.

 Moreover, I'd like to put it on the mailing list in advance to
 make sure it is acceptable for oslo.db.

 I'd appreciate any comments.

 br.
 Li Ma




Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-08-08 Thread Mike Bayer

On Aug 8, 2014, at 12:03 AM, Li Ma skywalker.n...@gmail.com wrote:

 
 So, I'd like to propose a transparent read/write separation method 
 for oslo.db, so that every project may happily take advantage of it 
 without any code modification.


A single transaction begins, which is to emit a series of INSERT, UPDATE, and 
SELECT statements. Are you proposing that this system in fact produce two 
separate transactions on two separate backends, and deliver the SELECT 
statements to the slave? That approach isn’t feasible: SELECTs are part of a 
“write” transaction just as much as the other statements are (as they can be 
SELECTing locally uncommitted data), and they deliver data that is part of the 
transactional context as well as input for those DML statements. Otherwise, by 
what mechanism could this read/write split be “transparent”? Reader/writer has 
to be decided at the transaction level, not the statement level, and without 
an up-front declaration as to whether a transaction is to be a reader or a 
writer, it’s not possible.





Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation

2014-08-08 Thread Mike Wilson
Li Ma,

This is interesting. In general I am in favor of expanding the scope of any
read/write separation capabilities that we have. I'm not clear on what exactly
you are proposing; hopefully you can answer some of my questions inline.
The thing I had thought of immediately was detection of whether an
operation is a read or a write and integrating that into oslo.db or
SQLAlchemy. Mike Bayer has some thoughts on that [1], and there are other
approaches around that can be copied/learned from. These sorts of things are
clear to me and, while moving towards more transparency for the developer,
still require context. Please share more details on your proposal.

-Mike

[1]
http://techspot.zzzeek.org/2012/01/11/django-style-database-routers-in-sqlalchemy/
[2]
http://www.percona.com/doc/percona-xtradb-cluster/5.5/wsrep-system-index.html


On Thu, Aug 7, 2014 at 10:03 PM, Li Ma skywalker.n...@gmail.com wrote:

 Getting a massive amount of information from data storage to be displayed
 is where most of the activity happens in OpenStack. The two activities of
 reading data and writing (creating, updating and deleting) data are
 fundamentally different.

 The optimization for these two opposite database activities can be done by
 physically separating the databases that service these two different
 activities. All the writes go to database servers, which then replicate the
 written data to the database server(s) dedicated to servicing the reads.


 Currently, AFAIK, many OpenStack deployments in production try to take
 advantage of MySQL (including Percona or MariaDB) multi-master Galera
 clusters. It is possible to design and implement a read/write separation
 schema for such a DB cluster.


I just want to clarify here: are you suggesting that _all_ reads and _all_
writes would hit different databases? It would be interesting to see a
relational schema design that would allow that to work. It seems like
something you wouldn't try in a relational database at all.



 Actually, OpenStack has a method for read scalability via defining
 master_connection and slave_connection in the configuration, but this method
 lacks flexibility because the choice of master or slave is made in the
 logical context (code). It's not transparent to the application developer.
 As a result, it is not widely used across the OpenStack projects.

 So, I'd like to propose a transparent read/write separation method
 for oslo.db, so that every project may happily take advantage of it
 without any code modification.


The problem with making it transparent to the developer is that you can't,
unless your application is tolerant of old data in an asynchronous-replication
world. In a fully synchronous world you could fully separate writes and reads,
but what would be the point, since your database performance is now trash
anyway? Please note that although Galera is considered a synchronous model,
it's not actually all the way there. You can break the certification, of
course, but there are also things that are done to keep the performance at an
acceptable level. Take, for example, the wsrep_causal_reads configuration
parameter [2]. Without this sucker being turned on you can't make read/write
separation transparent to the developer. Turning it on causes a significant
performance degradation, unfortunately.
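A crude simulation of what causal reads buys, and what it costs: this models only a Galera-style apply queue, not how wsrep is actually implemented, and all names are made up for illustration:

```python
class GaleraNodeSim:
    """Crude model of a Galera node and wsrep_causal_reads.

    Write-sets arrive via replication into a pending queue and are
    applied some time later. A causal read first drains the queue,
    so the reader never sees data older than what the cluster had
    already committed -- that wait is the performance cost.
    """

    def __init__(self):
        self.applied = {}   # rows visible to plain reads
        self.pending = []   # replicated but not-yet-applied write-sets

    def receive(self, key, value):
        """A write-set replicated from another node lands here first."""
        self.pending.append((key, value))

    def read(self, key, causal_reads=False):
        if causal_reads:
            # Drain the apply queue before answering, like
            # wsrep_causal_reads=ON; plain reads skip this and may
            # return stale (or missing) rows.
            while self.pending:
                k, v = self.pending.pop(0)
                self.applied[k] = v
        return self.applied.get(key)

node = GaleraNodeSim()
node.receive("instance-1", "ACTIVE")
stale = node.read("instance-1")                      # not yet applied
causal = node.read("instance-1", causal_reads=True)  # waits, then sees it
```

Even this toy makes Mike's trade-off visible: the causal read is correct but blocks on the apply queue, while the plain read is fast but can miss a committed row.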

I feel like this is a problem fundamental to a consistent relational dataset.
If you are okay with eventual consistency, you can make things transparent to
the developer. But by its very nature a relational dataset is, well,
relational: it needs all the other pieces, and those pieces need to be
consistent. I guess what I am saying is that your proposal needs more details.
Please respond with specifics and examples to move the discussion forward.



 Moreover, I'd like to put it on the mailing list in advance to
 make sure it is acceptable for oslo.db.

 I'd appreciate any comments.

 br.
 Li Ma

