On 05/02/2016 11:51 AM, Mike Bayer wrote:
On 05/02/2016 07:38 AM, Matthieu Simonin wrote:
As far as we understand the idea of an ORM is to hide the relational
database with an Object oriented API.

I actually disagree with that completely.  The reason ORMs are so
maligned is because of this misconception; developers attempt to use an
ORM so that they need not have any awareness of their
database, how its queries are constructed, or even its schema's design;
witness tools such as Django ORM and Rails ActiveRecord which promise
this.   You then end up with an inefficient and inextensible mess
because the developers never considered anything about how the database
works or how it is queried, nor do they even have easy ways to monitor
or control it while still making use of the tool.   There are many blog
posts and articles that discuss this and it is in general known as the
"object relational impedance mismatch".

SQLAlchemy's success comes from its rejection of this entire philosophy.
  The purpose of SQLAlchemy's ORM is not to "hide" anything but rather
to apply automation to the many aspects of relational database
communication as well as row->object mapping that would otherwise
express themselves throughout an application either as a large amount
of repetitive boilerplate or as an awkward series of ad-hoc
abstractions that don't really do the job very well.   SQLAlchemy is
designed to expose both the schema design as well as the structure of
queries completely.   My talk at [1] goes into this topic in detail
including specific API architectures that facilitate this concept.
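To make that concrete -- a minimal SQLAlchemy Core sketch, with a
hypothetical "instances" table rather than anyone's real schema --
both the DDL and the shape of the query stay in plain view:

    from sqlalchemy import (Column, Integer, MetaData, String, Table,
                            create_engine, func, select)

    metadata = MetaData()

    # The schema is declared explicitly; nothing about the table is hidden.
    instances = Table(
        'instances', metadata,
        Column('id', Integer, primary_key=True),
        Column('host', String(255)),
        Column('vm_state', String(36)),
    )

    engine = create_engine('sqlite://')  # any supported RDBMS
    metadata.create_all(engine)

    # The structure of the query is equally explicit; this renders one
    # predictable SELECT ... GROUP BY rather than hiding the SQL.
    stmt = (select([instances.c.host, func.count(instances.c.id)])
            .where(instances.c.vm_state == 'active')
            .group_by(instances.c.host))

    with engine.connect() as conn:
        for host, num in conn.execute(stmt):
            print(host, num)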

It's for that reason that I've always rejected notions of attempting to
apply SQLAlchemy directly on top of a datastore that is explicitly
non-relational.   By doing so, you remove a vast portion of the
functionality that relational databases provide and there's really no
point in using a tool like SQLAlchemy that is very explicit about DDL
and SQL on top of that kind of database.

To effectively put SQLAlchemy on top of a non-relational datastore, what
you really want to do is build an entire SQL engine on top of it.  This
is actually feasible; I did some work for the now-defunct FoundationDB
(since bought by Apple), which had a very good implementation of
SQL on top of a distributed keystore going, and the Cockroach and TiDB
projects you mention are definitely the most appropriate choice to take
if a certain variety of distribution underneath SQL is desired.

Well said, Mike, on all points above.

<snip>

But also, w.r.t. Cells there seems to be some remaining debate over why
exactly a distributed approach is even needed.  As others have posted, a
single MySQL database, replicated across Galera or not, scales just fine
for far more data than Nova ever needs to store.  So it's not clear
why a dramatic rewrite of its datastore is called for.

Cells (v1) in Nova *already has* a completely isolated DB/MQ for each cell, and there are a bunch of duplicated-but-slightly-different-and-impossible-to-maintain code paths in the scheduler and compute manager. Part of the cells v2 effort is to remove these duplicated code paths.

Cells are, as much as anything else, an answer to an MQ scaling problem, less so an answer to a DB scaling problem. Having a single MQ bus for tens of thousands of compute nodes is just not tenable -- at least with the message passing patterns and architecture that we use today...

Finally, Cells also represent a failure domain for the control plane. If a network partition occurs between a cell and the top-level API layer, no other cell is affected by the disruption.

Now, what does all this mean with regards to whether to use a single distributed database solution versus a single RDBMS versus many isolated RDBMS instances per cell? I'm not sure. Clearly, arguments can be made for all three approaches. Depending on folks' priorities with regards to simplicity, scale, and isolation of failure domains, the "right" choice is tough to determine.

On the one hand, using a single distributed datastore like Cassandra for everything would make things conceptually easy to reason about and make OpenStack clouds much easier to deploy at scale.

On the other hand, porting things to Cassandra (or any other NoSQL solution) would require a complete overhaul of the way *certain* data is queried in the Nova subsystems. Examples of Cassandra's poor fit for some types of data are quota and resource usage aggregation queries. While recent Cassandra versions do support some aggregation via CQL, Cassandra simply wasn't built for this kind of data access pattern, and doing anything reasonably complicated would require using Pig/Hive/Map-Reduce-y stuff, which kind of defeats the simplicity argument.
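To illustrate the access pattern in question (a sketch only; the
"allocations" table and column names here are hypothetical), a
quota/usage aggregation is a single statement against one RDBMS,
whereas CQL can't group across partitions this way:

    from sqlalchemy import text

    # Sum resource usage per project and resource class -- the kind of
    # quota query a single RDBMS answers in one round trip, against one
    # consistent snapshot.  (Hypothetical table/column names.)
    USAGE_BY_PROJECT = text("""
        SELECT project_id,
               resource_class,
               SUM(used) AS total_used
        FROM allocations
        GROUP BY project_id, resource_class
    """)

    def usages(conn):
        return conn.execute(USAGE_BY_PROJECT).fetchall()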

While queries against resource usage aggregate data are a great fit for a single RDBMS, running those same queries across multiple RDBMS instances (i.e. in the child cell databases) is a recipe for nightmares: you'd essentially need to implement a Map-Reduce-y type of system to answer them, and you'd lose transactional semantics in the process.
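For illustration only -- assuming one SQLAlchemy engine per child
cell, with made-up table names -- the cross-cell version degenerates
into fan-out-and-merge in the application, and no transaction spans
the cells:

    from collections import Counter

    def usage_across_cells(cell_engines):
        """Fan out per-cell aggregates and merge them client-side.

        Each cell query runs in its own transaction, at its own point
        in time -- there is no snapshot that is consistent across
        cells.
        """
        totals = Counter()
        for engine in cell_engines:  # "map" phase, one DB per cell
            with engine.connect() as conn:
                rows = conn.execute(
                    "SELECT project_id, SUM(used) FROM allocations "
                    "GROUP BY project_id")
                for project_id, used in rows:
                    totals[project_id] += used  # "reduce" phase
        return totals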

This is why the decision was made to bring this set of data into a top-level DB (either the API database or a new broken-out scheduler DB).

Finally, doing list operations with pagination, sorting, and free-form filters across dozens or more child cell databases is similarly nightmare-inducing. RDBMS systems (single or many) aren't a great fit for this type of data access pattern. Instead, something like ElasticSearch is a much better fit.
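As a rough sketch of what that buys you (using the stock
elasticsearch-py client against a hypothetical "nova" index; the
document fields are made up), a paginated, sorted, filtered listing is
a single query no matter how many cells fed the index:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(['http://localhost:9200'])

    # One query handles free-form filtering, sorting and pagination
    # across everything indexed, regardless of which cell it came from.
    resp = es.search(
        index='nova',  # hypothetical index populated by Searchlight
        body={
            'query': {'bool': {'filter': [
                {'term': {'vm_state': 'active'}},
                {'prefix': {'display_name': 'web'}},
            ]}},
            'sort': [{'created_at': {'order': 'desc'}}],
            'from': 0,    # page offset
            'size': 50,   # page size
        })

    for hit in resp['hits']['hits']:
        print(hit['_source']['display_name'])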

So, bottom line, there's no simple answer. At this point, my personal preferred solution that I think balances simplicity with the need to have useful and friendly APIs along with an ability to isolate a failure domain as effectively as possible is the following:

1) Use a single top-level DB for the API database (mapping tables, request and build spec tables, instance types (flavors), and keypairs). This DB could be an RDBMS or Cassandra, depending on the deployer's preferences around availability guarantees, existing sysadmin/DBA knowledge and other factors.

2) Use a single top-level RDBMS for the scheduler database (resource classes, resource providers, resource pools, aggregates, aggregate mapping tables, capabilities, inventories and allocations tables). The querying patterns for this type of data are an excellent fit for an RDBMS, and a single RDBMS should be able to service a *very large* (500K+ resource providers) deployment with minimal caching needed to handle quota, resource claim and resource capacity queries (see the capacity-query sketch after this list).

3) Use a single DB in each cell to store information about the instances located in that cell and the instance tasks being performed against those instances. This DB could be either an RDBMS or Cassandra, depending on the deployer's preferences. The key here is to keep the cell a fully isolated failure domain.

4) Have all instance listing operations in the Compute API proxy to Project Searchlight and ElasticSearch to handle the cross-cell queries against the instance and instance update (task) information.
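To illustrate point 2 above (a hedged sketch over hypothetical
"inventories" and "allocations" tables -- not a settled schema), the
capacity query the scheduler needs is a join-and-aggregate that a
single RDBMS handles naturally:

    from sqlalchemy import text

    # Find resource providers that can still fit a request for a given
    # amount of a resource class.  (Hypothetical table/column names.)
    PROVIDERS_WITH_CAPACITY = text("""
        SELECT inv.resource_provider_id
        FROM inventories inv
        LEFT JOIN (
            SELECT resource_provider_id, resource_class,
                   SUM(used) AS used
            FROM allocations
            GROUP BY resource_provider_id, resource_class
        ) alloc
          ON alloc.resource_provider_id = inv.resource_provider_id
         AND alloc.resource_class = inv.resource_class
        WHERE inv.resource_class = :resource_class
          AND inv.total - COALESCE(alloc.used, 0) >= :requested
    """)

    def find_providers(conn, resource_class, requested):
        return [row[0] for row in conn.execute(
            PROVIDERS_WITH_CAPACITY,
            resource_class=resource_class, requested=requested)]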

I think the above provides a pretty good balance: it uses tools that fit the data access patterns properly while keeping the original idea of isolated failure domains a strong priority in the architecture of Nova.

Thoughts,
-jay
