Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

Michael Chapman Fri, 03 Oct 2014 00:02:20 -0700

On Fri, Oct 3, 2014 at 4:05 AM, Soren Hansen <[email protected]> wrote:


> I'm sorry about my slow responses. For some reason, gmail didn't think
> this was an important e-mail :(
>
> 2014-09-30 18:41 GMT+02:00 Jay Pipes <[email protected]>:
> > On 09/30/2014 08:03 AM, Soren Hansen wrote:
> >> 2014-09-12 1:05 GMT+02:00 Jay Pipes <[email protected]>:
> > How would I go about getting the associated fixed IPs for a network?
> > The query to get associated fixed IPs for a network [1] in Nova looks
> > like this:
> >
> > SELECT
> >  fip.address,
> >  fip.instance_uuid,
> [...]
> > AND fip.instance_uuid IS NOT NULL
> > AND i.host = :host
> >
> > would I have a Riak container for virtual_interfaces that would also
> > have instance information, network information, fixed_ip information?
> > How would I accomplish the query against a derived table that gets the
> > minimum virtual interface ID for each instance UUID?
>
> What's a minimum virtual interface ID?
>
> Anyway, I think Clint answered this quite well.
>
> >>> I've said it before, and I'll say it again. In Nova at least, the
> >>> SQL schema is complex because the problem domain is complex. That
> >>> means lots of relations, lots of JOINs, and that means the best way
> >>> to query for that data is via an RDBMS.
> [...]
> >> I don't think relying on a central data store is in any conceivable
> >> way appropriate for a project like OpenStack. Least of all Nova.
> >>
> >> I don't see how we can build a highly available, distributed service
> >> on top of a centralized data store like MySQL.
> [...]
> > I don't disagree with anything you say above. At all.
>
> Really? How can you agree that we can't "build a highly available,
> distributed service on top of a centralized data store like MySQL" while
> also saying that the best way to handle data in Nova is in an RDBMS?
>
> >>> For complex control plane software like Nova, though, an RDBMS is
> >>> the best tool for the job given the current lay of the land in open
> >>> source data storage solutions matched with Nova's complex query and
> >>> transactional requirements.
> >> What transactional requirements?
> >
> > https://github.com/openstack/nova/blob/stable/icehouse/nova/db/sqlalchemy/api.py#L1654
> > When you delete an instance, you don't want the delete to just stop
> > half-way through the transaction and leave around a bunch of orphaned
> > children.  Similarly, when you reserve something, it helps to not have
> > a half-finished state change that you need to go clean up if something
> > goes boom.
>
> Looking at that particular example, it's about deleting an instance and
> all its associated metadata. As we established earlier, these are things
> that would just be in the same key as the instance itself, so it'd just
> be a single key that would get deleted. Easy.
>
> That said, there will certainly be situations where there'll be a need
> for some sort of anti-entropy mechanism. It just so happens that those
> situations already exist. We're dealing with a complex distributed
> system.  We're kidding ourselves if we think that any kind of
> consistency is guaranteed, just because our data store favours
> consistency over availability.
>
>
I apologize if I'm missing something, but doesn't denormalization to add
join support put the same value in many places, such that an update to that
value is no longer a single atomic transaction? This would appear to
counteract the requirement for strong consistency. If updating a single
value is atomic (as in Riak's consistent mode) then it might be possible to
construct a way to make multiple updates appear atomic, but it would add
many more transactions and many more quorum checks, which would reduce
performance to a crawl.
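
To make the concern concrete, here is a minimal sketch. It assumes a hypothetical in-memory key-value store (the KVStore class, its fail_after knob and the rename_host helper are invented for illustration and are not Riak's API). Each individual put is atomic, but the host name is denormalized into every instance document, so one logical update turns into several independent writes, and a failure part-way through leaves the copies disagreeing.

```python
class KVStore:
    """Hypothetical store: each put() is atomic, but puts cannot be grouped."""

    def __init__(self):
        self.data = {}
        self.fail_after = None  # successful writes allowed before a simulated crash

    def put(self, key, value):
        if self.fail_after is not None:
            if self.fail_after == 0:
                raise RuntimeError("simulated crash")
            self.fail_after -= 1
        self.data[key] = value


def rename_host(store, keys, new_host):
    # The host name is copied into every instance document, so a single
    # logical change becomes len(keys) separate single-key writes.
    for key in keys:
        doc = dict(store.data[key])
        doc["host"] = new_host
        store.put(key, doc)


store = KVStore()
store.data = {
    "instance:a": {"host": "compute1"},
    "instance:b": {"host": "compute1"},
    "instance:c": {"host": "compute1"},
}
store.fail_after = 2  # crash after two of the three writes

try:
    rename_host(store, sorted(store.data), "compute9")
except RuntimeError:
    pass  # the caller now has to detect and repair the partial update

hosts = {key: doc["host"] for key, doc in store.data.items()}
print(hosts)
```

After the simulated crash, two documents carry the new host and one still carries the old one, which is the half-finished state that a multi-row SQL transaction would have rolled back.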

I also don't really see how a NoSQL system in strong consistency mode is
any different from running MySQL with Galera in its failure modes. The
requirement for quorum makes the addition of nodes increase the potential
latency of writes (and reads in some cases) so having large scale doesn't
grant much benefit, if any. Quorum will also prevent nodes on the wrong
side of a partition from being able to access system state (or it will give
them stale state, which is probably just as bad in our case).
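
A rough sketch of the quorum argument, assuming a plain strict-majority rule and made-up acknowledgement latencies. It does not model the actual protocol of Riak or Galera; the function names and numbers are illustrative only.

```python
def quorum_size(cluster_size):
    # Strict majority of the full cluster membership.
    return cluster_size // 2 + 1


def can_commit(reachable_nodes, cluster_size):
    # A side of a partition can only commit if it can still reach a majority.
    return reachable_nodes >= quorum_size(cluster_size)


def commit_latency_ms(ack_latencies_ms):
    # A write completes once a quorum of nodes has acknowledged it, so its
    # latency is that of the slowest node among the fastest quorum.
    needed = quorum_size(len(ack_latencies_ms))
    return sorted(ack_latencies_ms)[needed - 1]


# Five nodes split 3/2 by a network partition.
majority_ok = can_commit(3, 5)
minority_ok = can_commit(2, 5)

# Assumed latencies: two nearby nodes plus progressively more distant ones.
small_cluster = commit_latency_ms([1, 2, 50])
large_cluster = commit_latency_ms([1, 2, 50, 60, 70])

print(majority_ok, minority_ok, small_cluster, large_cluster)
```

Under these assumptions the two-node side of the partition cannot commit at all, and growing the cluster from three to five nodes moves the commit latency from the second-fastest to the third-fastest acknowledgement.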

I think your goal of having state management that's able to handle network
partitions is a good one, but I don't think the solution is as simple as
swapping out where the state is stored. Maybe in some cases like
split-racks the system needs to react to a network partition by forming its
own independent cell with its own state storage, and when the network heals
it then merges back into the other cluster cleanly? That would be very
difficult to implement, but fun (for some definition of fun).

As a thought experiment, a while ago I considered what would happen if,
instead of using a central store, I put an SQLite database behind every
daemon and allowed them to query each other for the data they needed, and
cluster if needed (using Raft). Services like nova-scheduler need strong
consistency and would have to cluster to perform their role, but services
like nova-compute would simply need to store the data concerning the
resources they are responsible for. This follows the 'place state at the
edge' kind of design principles that have been discussed in various
circles. It falls down in a number of pretty obvious ways, and ultimately
it would require more work than I am able to put in, but I mention it
because perhaps it provides you with food for thought.
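
For what it's worth, the non-clustered half of that thought experiment can be sketched in a few lines. This is only an illustration under stated assumptions: the ComputeDaemon class, its table layout and the in-process "query" are hypothetical, there is no RPC between daemons, and the Raft clustering that a scheduler-like service would need is left out entirely.

```python
import sqlite3


class ComputeDaemon:
    """Hypothetical daemon that keeps only its own state in a local SQLite DB."""

    def __init__(self, host):
        self.host = host
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE instances (uuid TEXT PRIMARY KEY, vcpus INTEGER)"
        )

    def boot(self, uuid, vcpus):
        # Local transaction: commits on success, rolls back on error.
        with self.db:
            self.db.execute(
                "INSERT INTO instances (uuid, vcpus) VALUES (?, ?)", (uuid, vcpus)
            )

    def used_vcpus(self):
        row = self.db.execute(
            "SELECT COALESCE(SUM(vcpus), 0) FROM instances"
        ).fetchone()
        return row[0]


daemons = [ComputeDaemon("compute1"), ComputeDaemon("compute2")]
daemons[0].boot("uuid-1", 2)
daemons[0].boot("uuid-2", 4)
daemons[1].boot("uuid-3", 1)

# A scheduler-like caller asks each daemon for the state it owns.
usage = {d.host: d.used_vcpus() for d in daemons}
print(usage)
```

Each daemon is authoritative for its own rows, so a partitioned compute node can keep serving local state; the hard parts (cross-daemon queries over the network and consensus for the services that need it) are exactly what this sketch skips.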

/architecture_astronaut

>
> https://github.com/openstack/nova/blob/stable/icehouse/nova/db/sqlalchemy/api.py#L3054
>
> Sure, quotas will require stronger consistency. Any NoSQL data store
> worth its salt gives you primitives to implement that.
>
> >>> Folks in these other programs have actually, you know, thought about
> >>> these kinds of things and had serious discussions about
> >>> alternatives.  It would be nice to have someone acknowledge that
> >>> instead of snarky comments implying everyone else "has it wrong".
> >> I'm terribly sorry, but repeating over and over that an RDBMS is "the
> >> best tool" without further qualification than "Nova's data model is
> >> really complex" reads *exactly* like a snarky comment implying
> >> everyone else "has it wrong".
> > Sorry if I sound snarky. I thought your blog post was the definition
> > of snark.
>
> I don't see the relevance of the tone of my blog post?
>
> You say it would be nice if people did something other than offer snarky
> comments implying everyone else "has it wrong".  I'm just pointing out
> that such requests ring really hollow when put forth in the very e-mail
> where you snarkily tell everyone else that they have it wrong.
>
> Since you did bring up my blog post, I really am astounded you find it
> snarky.  It was intended to be constructive and forward looking. The
> first one in the series, perhaps, but certainly not the one linked in
> this thread.
>
> Perhaps I need to take writing classes.
>
> --
> Soren Hansen             | http://linux2go.dk/
> Ubuntu Developer         | http://www.ubuntu.com/
> OpenStack Developer      | http://www.openstack.org/
>
> _______________________________________________
> OpenStack-dev mailing list
> [email protected]
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
