Re: [openstack-dev] [nova] Distributed Database

Mark Doffman Tue, 03 May 2016 17:10:17 -0700

This thread has been a depressing read.

I understand that the content is supposed to be distributed databases, but for me it has become an inquisition of CellsV2.

Our question has clearly become "Should we continue efforts on CellsV2?", which I will address head-on.

We shouldn't be afraid to abandon CellsV2. If there are designs that are proven to be a better solution, then our current momentum shouldn't keep us from an abrupt change. As someone who is working on this I have an attachment to the current design, but it's important for me to keep an open mind.


Here are my *main* reasons for continuing work on CellsV2.

1. It provides a proven solution to an immediate message queue problem.

Yes, CellsV2 is different from CellsV1, but the previous solution showed that application-level sharding of the message queue can work. CellsV2 provides this solution with a (moderately) easy upgrade path for existing deployments. These deployments may not be comfortable with changing MQ technologies, or may already be using CellsV1. Application-level sharding of the message queue is not pretty, but it will work.


2. The 'complexity' of CellsV2 is vastly overstated.

Sure, there is a lot of *work* to do for CellsV2, but this doesn't imply increased complexity: any refactoring requires work. CellsV1 added complexity to our codebase; CellsV2 does not. In fact, by clearly separating the data that is 'owned' by the different services, I believe we are improving the modularity and encapsulation present in Nova.


3. CellsV2 does not prohibit *ANY* of the alternative scaling methods
   mentioned in this thread.

Really, it doesn't. Both message queue and database switching are completely optional, both when running a single cell and even when running multiple cells. If anything, the ability to run separate message queues and database connections could give us the ability to trial these alternative technologies within a real, running cloud.

Just imagine the ability to set up a cell in your existing cloud that runs 0mq rather than rabbit. How about a NewSQL database integrated into an existing cloud? Both of these things may (with some work) be possible.
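
To make the idea concrete, here is a rough sketch of what per-cell message queue and database selection could look like. It is only an illustration and not Nova's actual code: the CellMapping record, the CELLS and INSTANCE_TO_CELL tables, the cell_for_instance helper, and every host name and URL scheme below are assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellMapping:
    """Hypothetical record: each cell names its own message queue and database."""
    name: str
    transport_url: str          # message queue endpoint for this cell
    database_connection: str    # database endpoint for this cell


# Placeholder hosts and schemes, chosen only for illustration.
CELLS = {
    "cell1": CellMapping("cell1", "rabbit://mq1.example.org:5672/",
                         "mysql+pymysql://db1.example.org/nova_cell1"),
    "cell2": CellMapping("cell2", "zmq://mq2.example.org:9501/",
                         "postgresql://db2.example.org/nova_cell2"),
}

# Hypothetical top-level lookup: which cell owns which instance.
INSTANCE_TO_CELL = {"inst-a": "cell1", "inst-b": "cell2"}


def cell_for_instance(instance_id: str) -> CellMapping:
    """Return the cell whose queue and database serve this instance."""
    return CELLS[INSTANCE_TO_CELL[instance_id]]


if __name__ == "__main__":
    for instance_id in ("inst-a", "inst-b"):
        cell = cell_for_instance(instance_id)
        print(instance_id, "->", cell.name, cell.transport_url)
```

The point of the sketch is that the choice of queue and database is just data attached to a cell, so a single-cell cloud and a cloud trialling a different technology in one cell would go through the same lookup.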

I could go on, but I won't. These are my main reasons and I'll stick to them.

It's difficult to be proven wrong, but sometimes necessary to get the best product that we can. I don't think that the existence of alternative message queue and database options is enough to stop CellsV2 work now. A proven solution that meets the upgrade constraints that we have in Nova would be a good reason to do so. We should of course explore other options; nothing we are doing prevents that. When they work out, I'll be super excited.


Thanks

Mark

On 4/29/16 12:53 AM, Clint Byrum wrote:

Excerpts from Mike Bayer's message of 2016-04-28 22:16:54 -0500:


On 04/28/2016 08:25 PM, Edward Leafe wrote:

Your own tests showed that a single RDBMS instance doesn’t even break a sweat
under your test loads. I don’t see why we need to shard it in the first
place, especially if in doing so we add another layer of complexity and
another dependency in order to compensate for that choice. Cells are a useful
concept, but this proposed implementation is adding way too much complexity
and debt to make it worthwhile.


now that is a question I have also.  Horizontal sharding is usually for
the case where you need to store say, 10B rows, and you'd like to split
it up among different silos.  Nothing that I've seen about Nova suggests
this is a system with any large data requirements, or even medium size
data (a few million rows in relational databases is nothing).    I
didn't have the impression that this was the rationale behind Cells, it
seems like this is more of some kind of logical separation of some kind
that somehow suits some environments (but I don't know how).
Certainly, if you're proposing a single large namespace of data across a
partition of nonrelational databases, and then the data size itself is
not that large, as long as "a single namespace" is appropriate then
there's no reason to break out of more than one MySQL database.  There's
not much reason to transparently shard unless you are concerned about
adding limitless storage capacity.   The Cells sharding seems to be
intentionally explicit and non-transparent.


There's a bit more to it than the number of rows. There's also a desire
to limit failure domains. IMO, that is entirely unfounded, as I've run
thousands of servers that depended on a single pair of MySQL servers
using simple DRBD and pacemaker with a floating IP for failover. This
is the main reason MySQL is a thing... it can handle 100,000 concurrent
connections just fine, and the ecosystem around detecting and handling
failure/maintenance is mature.

The whole cells conversation, IMO, stems from the way we use RabbitMQ.
We should just stop doing that. I know as I move forward with our scaling
efforts, I'll be trying several RPC drivers and none of them will go
through RabbitMQ.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



