On 09/30/2014 02:03 PM, Soren Hansen wrote:
Coming from a Skype background, I can assure you that you definitely can,
depending on your needs (and our experiments with e.g. MongoDB ended
very badly: it just died under I/O loads that our PostgreSQL treated
as normal). I mean, that's a complex topic, and I see a lot of people
switching to NoSQL and a lot of people switching away from it. NoSQL is not a
silver bullet for scalability. Just my 0.5.
2014-09-12 1:05 GMT+02:00 Jay Pipes <jaypi...@gmail.com>:
If Nova was to take Soren's advice and implement its data-access layer
on top of Cassandra or Riak, we would just end up re-inventing SQL
Joins in Python-land.
I may very well be wrong(!), but this statement makes it sound like you've
never used e.g. Riak. Or, if you have, not done so in the way it's
supposed to be used.
If you embrace an alternative way of storing your data, you wouldn't just
blindly create a container for each table in your RDBMS.
For example: In Nova's SQL-based datastore we have a table for security
groups and another for security group rules. Rows in the security group
rules table have a foreign key referencing the security group to which
they belong. In a datastore like Riak, you could have a security group
container where each value contains not just the security group
information, but also all the security group rules. No joins in
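To make the security group example concrete, here is a minimal sketch of the denormalized layout. It is only an illustration: a plain Python dict stands in for the Riak container, and the names put_group, get_group, and the field names are hypothetical, not taken from Nova or from any Riak client library.

```python
import json

# Hypothetical stand-in for a key-value container (e.g. a Riak bucket).
# A plain dict is used here so the sketch runs without any datastore.
security_groups = {}


def put_group(bucket, group_id, name, description, rules):
    """Store the group and all of its rules together as one JSON value."""
    bucket[group_id] = json.dumps({
        "name": name,
        "description": description,
        "rules": rules,  # rules are embedded, not kept in a separate table
    })


def get_group(bucket, group_id):
    """A single key lookup returns the group with its rules attached."""
    return json.loads(bucket[group_id])


put_group(security_groups, "sg-1", "web", "web servers", [
    {"protocol": "tcp", "from_port": 80, "to_port": 80, "cidr": "0.0.0.0/0"},
    {"protocol": "tcp", "from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},
])

group = get_group(security_groups, "sg-1")
for rule in group["rules"]:
    print(group["name"], rule["protocol"], rule["from_port"], rule["cidr"])
```

The trade-off in this layout is that adding or removing a rule rewrites the whole group value, whereas the SQL schema would touch a single row in the rules table.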
I've said it before, and I'll say it again. In Nova at least, the SQL
schema is complex because the problem domain is complex. That means
lots of relations, lots of JOINs, and that means the best way to query
for that data is via an RDBMS.
I was really hoping you could be more specific than "best"/"most
appropriate" so that we could have a focused discussion.
I don't think relying on a central data store is in any conceivable way
appropriate for a project like OpenStack. Least of all Nova.
I don't see how we can build a highly available, distributed service on
top of a centralized data store like MySQL.
/me disappears again
Tens or hundreds of thousands of nodes, spread across many, many racks
and datacentre halls are going to experience connectivity problems.
This means that some percentage of your infrastructure (possibly many
thousands of nodes, affecting many, many thousands of customers) will
find certain functionality not working on account of your datastore not
being reachable from the part of the control plane they're attempting to
use (or possibly only being able to read from it).
I say over and over again that people should own their own uptime.
Expect things to fail all the time. Do whatever you need to do to ensure
your service keeps working even when something goes wrong. Of course
this applies to our customers too. Even if we take the greatest care to
avoid downtime, customers should spread their workloads across multiple
availability zones and/or regions and probably even multiple cloud
providers. Their service towards their users is their responsibility.
However, our service towards our users is our responsibility. We should
take the greatest care to avoid having internal problems affect our
users. Building a massively distributed system like Nova on top of a
centralized data store is practically a guarantee of the opposite.
For complex control plane software like Nova, though, an RDBMS is the
best tool for the job given the current lay of the land in open source
data storage solutions matched with Nova's complex query and
What transactional requirements?
Folks in these other programs have actually, you know, thought about
these kinds of things and had serious discussions about alternatives.
It would be nice to have someone acknowledge that instead of snarky
comments implying everyone else "has it wrong".
I'm terribly sorry, but repeating over and over that an RDBMS is "the
best tool" without further qualification than "Nova's data model is
really complex" reads *exactly* like a snarky comment implying everyone
else "has it wrong".