On 04/28/2016 08:44 AM, Edward Leafe wrote:
> On Apr 24, 2016, at 3:28 PM, Robert Collins <robe...@robertcollins.net> wrote:
>
>> For instance, the things I think are essential for a distributed
>> database based datastore:
>> - good single-machine developer story. Must not need a physical
>> cluster to hack on OpenStack
>> - deal gracefully with single node/rack/site failures (when deployed
>> appropriately) - allow limiting failure domain impact
>> - straightforward programming model: wrong uses should be obvious to reviewers
>> - low latency performance with big datasets: e.g. nova list as an
>> admin should be able to get the Nth page as rapidly as the 2nd or 3rd.
>> - code to deliver that should be (approximately) no worse than the current code
>
> Agree on all of these points, as well as the rest of your post.
>
> After several hallway track discussions, as well as yesterday’s Cells V2
> discussion, I’ve written a follow-up post:
>
> http://blog.leafe.com/index.php/2016/04/28/fragmented-data/
>
> Feedback, of course, is welcomed!


Regarding ROME [1], I've taken a look at its source code, and while it is certainly interesting, I wouldn't recommend moving all of Nova's database infrastructure onto it as a dependency in the near term, as the code is still very immature. SQLAlchemy itself was once immature as well, so there is no sin here, but that was eleven years ago.

ROME's internals are not only highly dependent on SQLAlchemy internals (pinned to the 0.9 series, which is obsolete), they also use those APIs in a very brittle and non-performant way [2]. In this code example, the internal elements of SQLAlchemy expression objects are repeatedly passed through str(), which on each call runs a full SQL string compilation step, just to find out what their actual type is. It can't be overstated how inappropriate this approach is; the author of the library would have benefited from reaching out to me for guidance on the correct way to introspect SQLAlchemy expression objects. Basic Python idioms like type checking also seem to be misunderstood [3].
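
For contrast, here is a minimal sketch of introspecting a WHERE-clause expression by type and attribute instead of via str(). It assumes SQLAlchemy 0.9 or later (where these classes live in `sqlalchemy.sql.elements`) and handles only an AND of simple "column OP literal" comparisons; `walk_criteria()` is a hypothetical helper written for illustration, not ROME's code and not a SQLAlchemy API:

```python
# Sketch only: introspect a SQLAlchemy WHERE-clause expression by type and
# attributes rather than compiling it with str(). walk_criteria() is a
# hypothetical helper for illustration; it is not part of SQLAlchemy or ROME.
from sqlalchemy import and_, column
from sqlalchemy.sql import operators
from sqlalchemy.sql.elements import (
    BinaryExpression,
    BindParameter,
    BooleanClauseList,
    ColumnClause,
)


def walk_criteria(expr):
    """Flatten an AND of "column OP literal" comparisons into
    (column_name, operator_function, value) tuples."""
    # AND conjunction: recurse into each member clause.
    if isinstance(expr, BooleanClauseList) and expr.operator is operators.and_:
        result = []
        for clause in expr.clauses:
            result.extend(walk_criteria(clause))
        return result
    # Single comparison: the operator is a plain function such as
    # operators.eq or operators.gt, so compare it by identity, not by text.
    if isinstance(expr, BinaryExpression):
        left, right = expr.left, expr.right
        if isinstance(left, ColumnClause) and isinstance(right, BindParameter):
            return [(left.name, expr.operator, right.value)]
    raise NotImplementedError("unsupported expression: %r" % (expr,))


criteria = and_(column("x") == 5, column("y") > 3)
for name, op, value in walk_criteria(criteria):
    print(name, op.__name__, value)
```

Each check here is an isinstance() call or an attribute lookup on an object that already exists, whereas str() invokes the SQL compiler over the whole subtree every time it is called.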

I don't think anyone denies that Nova can use any kind of database backend, but the point was raised that starting from scratch with an entirely new database approach is an enormous job. If the first step of that job is in fact "port SQLAlchemy and the relational model to Redis", that makes the job far more involved, and I'd disagree with your post's assertion that "It's not too late" if this is the case. If adopting ROME for Nova amounts to an admission that the relational model is in fact necessary for Nova, then that disqualifies NoSQL databases out of the gate. It's one thing to lament that MySQL is not as "distributed" out of the box as a NoSQL database, but it's another to lament that non-relational databases are not in fact relational.

[1] https://github.com/BeyondTheClouds/rome

[2] https://github.com/BeyondTheClouds/rome/blob/master/lib/rome/core/expression/expression.py#L172

[3] https://github.com/BeyondTheClouds/rome/blob/master/lib/rome/core/expression/expression.py#L102

> -- Ed Leafe

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

