On 05/02/2016 11:51 AM, Mike Bayer wrote:
On 05/02/2016 07:38 AM, Matthieu Simonin wrote:
As far as we understand the idea of an ORM is to hide the relational
database with an Object oriented API.

I actually disagree with that completely.  The reason ORMs are so
maligned is because of this misconception; developers attempt to use an
ORM so that they need not have any awareness of their
database, how its queries are constructed, or even its schema's design;
witness tools such as Django ORM and Rails ActiveRecord which promise
this.   You then end up with an inefficient and inextensible mess
because the developers never considered anything about how the database
works or how it is queried, nor do they even have easy ways to monitor
or control it while still making use of the tool.   There are many blog
posts and articles that discuss this and it is in general known as the
"object relational impedance mismatch".

SQLAlchemy's success comes from its rejection of this entire philosophy.
  The purpose of SQLAlchemy's ORM is not to "hide" anything but rather
to apply automation to the many aspects of relational database
communication as well as row->object mapping that would otherwise
express themselves throughout an application either as a large amount
of repetitive boilerplate or as an awkward series of ad-hoc
abstractions that don't really do the job very well.   SQLAlchemy is
designed to expose both the schema design as well as the structure of
queries completely.   My talk at [1] goes into this topic in detail
including specific API architectures that facilitate this concept.
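To make that concrete -- a minimal SQLAlchemy Core sketch, with a
hypothetical "instances" table rather than anyone's real schema --
both the DDL and the shape of the query stay in plain view:

    from sqlalchemy import (Column, Integer, MetaData, String, Table,
                            create_engine, func, select)

    metadata = MetaData()

    # The schema is declared explicitly; nothing about the table is hidden.
    instances = Table(
        'instances', metadata,
        Column('id', Integer, primary_key=True),
        Column('host', String(255)),
        Column('vm_state', String(36)),
    )

    engine = create_engine('sqlite://')  # any supported RDBMS
    metadata.create_all(engine)

    # The structure of the query is equally explicit; this renders one
    # predictable SELECT ... GROUP BY rather than hiding the SQL.
    stmt = (select([instances.c.host, func.count(instances.c.id)])
            .where(instances.c.vm_state == 'active')
            .group_by(instances.c.host))

    with engine.connect() as conn:
        for host, num in conn.execute(stmt):
            print(host, num)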

It's for that reason that I've always rejected notions of attempting to
apply SQLAlchemy directly on top of a datastore that is explicitly
non-relational.   By doing so, you remove a vast portion of the
functionality that relational databases provide and there's really no
point in using a tool like SQLAlchemy that is very explicit about DDL
and SQL on top of that kind of database.

To effectively put SQLAlchemy on top of a non-relational datastore, what
you really want to do is build an entire SQL engine on top of it.  This
is actually feasible; I did some work for the now-defunct FoundationDB
(since bought by Apple), which had a very good implementation of
SQL on top of a distributed keystore going, and the Cockroach and TiDB
projects you mention are definitely the most appropriate choice to take
if a certain variety of distribution underneath SQL is desired.

Well said, Mike, on all points above.

<snip>

But also, w.r.t. Cells there seems to be some remaining debate over why
exactly a distributed approach is even needed.  As others have posted, a
single MySQL database, replicated across Galera or not, scales just fine
for far more data than Nova ever needs to store.  So it's not clear
why a dramatic rewrite of its datastore is called for.

Cells (v1) in Nova *already has* a completely isolated DB/MQ for each cell, and there are a bunch of duplicated-but-slightly-different-and-impossible-to-maintain code paths in the scheduler and compute manager. Part of the cells v2 effort is to remove these duplicated code paths.

Cells are, as much as anything else, an answer to an MQ scaling problem, less so an answer to a DB scaling problem. Having a single MQ bus for tens of thousands of compute nodes is just not tenable -- at least with the message passing patterns and architecture that we use today...

Finally, Cells also represent a failure domain for the control plane. If a network partition occurs between a cell and the top-level API layer, no other cell is affected by the disruption.

Now, what does all this mean with regards to whether to use a single distributed database solution versus a single RDBMS versus many isolated RDBMS instances per cell? I'm not sure. Clearly, arguments can be made for all three approaches. Depending on folks' priorities with regards to simplicity, scale, and isolation of failure domains, the "right" choice is tough to determine.

On the one hand, using a single distributed datastore like Cassandra for everything would make things conceptually easy to reason about and make OpenStack clouds much easier to deploy at scale.

On the other hand, porting things to Cassandra (or any other NoSQL solution) would require a complete overhaul of the way *certain* data is queried in the Nova subsystems. Examples of Cassandra's poor fit for some types of data are quota and resource usage aggregation queries. While recent Cassandra versions do support some aggregation via CQL, Cassandra simply wasn't built for this kind of data access pattern, and doing anything reasonably complicated would require using Pig/Hive/Map-Reduce-y stuff, which kind of defeats the simplicity argument.
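To illustrate the access pattern in question (a sketch only; the
"allocations" table and column names here are hypothetical), a
quota/usage aggregation is a single statement against one RDBMS,
whereas CQL can't group across partitions this way:

    from sqlalchemy import text

    # Sum resource usage per project and resource class -- the kind of
    # quota query a single RDBMS answers in one round trip, against one
    # consistent snapshot.  (Hypothetical table/column names.)
    USAGE_BY_PROJECT = text("""
        SELECT project_id,
               resource_class,
               SUM(used) AS total_used
        FROM allocations
        GROUP BY project_id, resource_class
    """)

    def usages(conn):
        return conn.execute(USAGE_BY_PROJECT).fetchall()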

While queries against resource usage aggregate data are a great fit for a single RDBMS, running those same queries across multiple RDBMS instances (i.e. in the child cell databases) is a recipe for nightmares: you'd essentially need to implement a Map-Reduce-y type of system to answer them, and you'd lose transactional semantics in the process.
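For illustration only -- assuming one SQLAlchemy engine per child
cell, with made-up table names -- the cross-cell version degenerates
into fan-out-and-merge in the application, and no transaction spans
the cells:

    from collections import Counter

    def usage_across_cells(cell_engines):
        """Fan out per-cell aggregates and merge them client-side.

        Each cell query runs in its own transaction, at its own point
        in time -- there is no snapshot that is consistent across
        cells.
        """
        totals = Counter()
        for engine in cell_engines:  # "map" phase, one DB per cell
            with engine.connect() as conn:
                rows = conn.execute(
                    "SELECT project_id, SUM(used) FROM allocations "
                    "GROUP BY project_id")
                for project_id, used in rows:
                    totals[project_id] += used  # "reduce" phase
        return totals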

This is why the decision was made to bring this set of data into a top-level DB (either the API database or a new broken-out scheduler DB).

Finally, doing list operations with pagination, sorting, and free-form filters across dozens or more child cell databases is similarly nightmare-inducing. RDBMS systems (single or many) aren't a great fit for this type of data access pattern. Instead, something like ElasticSearch is a much better fit.
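As a rough sketch of what that buys you (using the stock
elasticsearch-py client against a hypothetical "nova" index; the
document fields are made up), a paginated, sorted, filtered listing is
a single query no matter how many cells fed the index:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(['http://localhost:9200'])

    # One query handles free-form filtering, sorting and pagination
    # across everything indexed, regardless of which cell it came from.
    resp = es.search(
        index='nova',  # hypothetical index populated by Searchlight
        body={
            'query': {'bool': {'filter': [
                {'term': {'vm_state': 'active'}},
                {'prefix': {'display_name': 'web'}},
            ]}},
            'sort': [{'created_at': {'order': 'desc'}}],
            'from': 0,    # page offset
            'size': 50,   # page size
        })

    for hit in resp['hits']['hits']:
        print(hit['_source']['display_name'])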

So, bottom line, there's no simple answer. At this point, my personal preferred solution that I think balances simplicity with the need to have useful and friendly APIs along with an ability to isolate a failure domain as effectively as possible is the following:

1) Use a single top-level DB for the API database (mapping tables, request and build spec tables, instance types (flavors), and keypairs). This DB could be an RDBMS or Cassandra, depending on the deployer's preferences around availability guarantees, existing sysadmin/DBA knowledge and other factors.

2) Use a single top-level RDBMS for the scheduler database (resource classes, resource providers, resource pools, aggregates, aggregate mapping tables, capabilities, inventories and allocations tables). The querying patterns for this type of data are an excellent fit for an RDBMS, and a single RDBMS should be able to service a *very large* (500K+ resource providers) deployment with minimal caching needed to handle quota, resource claim and resource capacity queries (see the capacity-query sketch after this list).

3) Use a single DB in each cell to store information about the instances located in that cell and the instance tasks being performed against those instances. This DB could be either an RDBMS or Cassandra, depending on the deployer's preferences. The key here is to keep the cell a fully isolated failure domain.

4) Have all instance listing operations in the Compute API proxy to Project Searchlight and ElasticSearch to handle the cross-cell queries against the instance and instance update (task) information.
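To illustrate point 2 above (a hedged sketch over hypothetical
"inventories" and "allocations" tables -- not a settled schema), the
capacity query the scheduler needs is a join-and-aggregate that a
single RDBMS handles naturally:

    from sqlalchemy import text

    # Find resource providers that can still fit a request for a given
    # amount of a resource class.  (Hypothetical table/column names.)
    PROVIDERS_WITH_CAPACITY = text("""
        SELECT inv.resource_provider_id
        FROM inventories inv
        LEFT JOIN (
            SELECT resource_provider_id, resource_class,
                   SUM(used) AS used
            FROM allocations
            GROUP BY resource_provider_id, resource_class
        ) alloc
          ON alloc.resource_provider_id = inv.resource_provider_id
         AND alloc.resource_class = inv.resource_class
        WHERE inv.resource_class = :resource_class
          AND inv.total - COALESCE(alloc.used, 0) >= :requested
    """)

    def find_providers(conn, resource_class, requested):
        return [row[0] for row in conn.execute(
            PROVIDERS_WITH_CAPACITY,
            resource_class=resource_class, requested=requested)]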

I think the above provides a pretty good balance: it uses tools that fit the data access patterns properly while keeping the original idea of isolated failure domains a strong priority in the architecture of Nova.

Thoughts,
-jay
