On 20/06/17 11:45, Jay Pipes wrote:
Good discussion, Zane. Comments inline.
++
On 06/20/2017 11:01 AM, Zane Bitter wrote:
On 20/06/17 10:08, Jay Pipes wrote:
On 06/20/2017 09:42 AM, Doug Hellmann wrote:
Does "service VM" need to be a first-class thing? Akanda creates
them, using a service user. The VMs are tied to a "router" which
is the billable resource that the user understands and interacts with
through the API.
Frankly, I believe all of these types of services should be built as
applications that run on OpenStack (or other) infrastructure. In
other words, they should not be part of the infrastructure itself.
There's really no need for a user of a DBaaS to have access to the
host or hosts the DB is running on. If the user really wanted that,
they would just spin up a VM/baremetal server and install the thing
themselves.
Hey Jay,
I'd be interested in exploring this idea with you, because I think
everyone agrees that this would be a good goal, but at least in my
mind it's not obvious what the technical solution should be.
(Actually, I've read your email a bunch of times now, and I go back
and forth on which one you're actually advocating for.) The two
options, as I see it, are as follows:
1) The database VMs are created in the user's tena^W project. They
connect directly to the tenant's networks, are governed by the user's
quota, and are billed to the project as Nova VMs (on top of whatever
additional billing might come along with the management services). A
[future] feature in Nova (https://review.openstack.org/#/c/438134/)
allows the Trove service to lock down access so that the user cannot
actually interact with the server using Nova, but must go through the
Trove API. On a cloud that doesn't include Trove, a user could run
Trove as an application themselves and all it would have to do
differently is not pass the service token to lock down the VM.
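For concreteness, here's a minimal sketch of what that might look like
from Trove's side, assuming the lock-down feature lands roughly as
proposed. All the URLs, credentials and IDs are placeholders;
ServiceTokenAuthWrapper is the existing keystoneauth1 mechanism for
sending a service token alongside the user's token:

    from keystoneauth1 import session
    from keystoneauth1.identity import v3
    from keystoneauth1.service_token import ServiceTokenAuthWrapper
    from novaclient import client as nova_client

    # Placeholders - in reality these come from the incoming API request
    # and the service's own configuration.
    USER_TOKEN = 'gAAAA...'
    USER_PROJECT_ID = 'user-project-uuid'
    IMAGE_ID = 'datastore-image-uuid'
    FLAVOR_ID = 'flavor-uuid'

    user_auth = v3.Token(auth_url='https://keystone.example.com/v3',
                         token=USER_TOKEN,
                         project_id=USER_PROJECT_ID)
    service_auth = v3.Password(auth_url='https://keystone.example.com/v3',
                               username='trove',
                               password='service-password',
                               project_name='service',
                               user_domain_id='default',
                               project_domain_id='default')

    # The wrapper sends X-Auth-Token (the user) plus X-Service-Token
    # (Trove) on every request to Nova.
    sess = session.Session(
        auth=ServiceTokenAuthWrapper(user_auth, service_auth))
    nova = nova_client.Client('2.1', session=sess)

    # The server lands in the user's project, on their quota and their
    # bill; the proposed feature would then let Nova refuse direct user
    # operations on it because it was created with a service token.
    server = nova.servers.create(name='trove-mysql-1',
                                 image=IMAGE_ID,
                                 flavor=FLAVOR_ID)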
alternatively:
2) The database VMs are created in a project belonging to the operator
of the service. They're connected to the user's network through
<magic>, and isolated from other users' databases running in the same
project through <security groups? hierarchical projects? magic?>.
Trove has its own quota management and billing. The user cannot
interact with the server using Nova since it is owned by a different
project. On a cloud that doesn't include Trove, a user could run Trove
as an application themselves, by giving it credentials for their own
project and disabling all of the cross-tenant networking stuff.
None of the above :)
Don't think about VMs at all. Or networking plumbing. Or volume storage
or any of that.
OK, but somebody has to ;)
Think only in terms of what a user of a DBaaS really wants. At the end
of the day, all they want is an address in the cloud where they can
point their application to write and read data from.
Do they want that data connection to be fast and reliable? Of course,
but how that happens is irrelevant to them.
Do they want that data to be safe and backed up? Of course, but how that
happens is irrelevant to them.
Fair enough. The world has changed a lot since RDS (which was the model
for Trove) was designed; it's certainly worth reviewing the base
assumptions before embarking on a new design.
The problem with many of these high-level *aaS projects is that they
consider their user to be a typical tenant of general cloud
infrastructure -- focused on launching VMs and creating volumes and
networks etc. And the discussions around the implementation of these
projects always come back to minutiae about how to set up secure
communication channels between a control plane message bus and the
service VMs.
Incidentally, the reason that discussions always come back to that is
because OpenStack isn't very good at it, which is a huge problem not
only for the *aaS projects but for user applications in general running
on OpenStack.
If we had fine-grained authorisation and ubiquitous multi-tenant
asynchronous messaging in OpenStack then I firmly believe that we, and
application developers, would be in much better shape.
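To make the authorisation half of that concrete: the closest thing we
have today is a Keystone trust, which can only delegate whole roles on a
whole project. A minimal sketch (IDs are placeholders) - note there's no
way to narrow this down to, say, 'may create servers tagged trove':

    from keystoneauth1 import session
    from keystoneauth1.identity import v3
    from keystoneclient.v3 import client as ks_client

    USER_ID = 'user-uuid'          # the trustor (placeholder)
    APP_USER_ID = 'app-user-uuid'  # the application's service user
    PROJECT_ID = 'project-uuid'

    auth = v3.Password(auth_url='https://keystone.example.com/v3',
                       username='zane',
                       password='...',
                       project_id=PROJECT_ID,
                       user_domain_id='default')
    keystone = ks_client.Client(session=session.Session(auth=auth))

    # Delegate the *entire* member role on the *entire* project. The
    # trustee can then do anything that role allows, which is exactly
    # the granularity problem.
    trust = keystone.trusts.create(trustor_user=USER_ID,
                                   trustee_user=APP_USER_ID,
                                   project=PROJECT_ID,
                                   role_names=['member'],
                                   impersonation=True)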
If you create these projects as applications that run on cloud
infrastructure (OpenStack, k8s or otherwise), then the discussions focus
instead on how the real end-users -- the ones that actually call the
APIs and utilize the service -- would interact with the APIs and not the
underlying infrastructure itself.
I'm convinced there's an interesting idea here, but the terminology
you're using doesn't really capture it. When you say 'as applications
that run on cloud infrastructure', it sounds like you mean they should
run in a Nova VM, or in a Kubernetes cluster somewhere, rather than on
the OpenStack control plane. I don't think that's what you mean though,
because you can (and IIUC Rackspace does) deploy OpenStack services that
way already, and it has no real effect on the architecture of those
services.
Here's an example to think about...
What if a provider of this DBaaS service wanted to jam 100 database
instances on a single VM and provide connectivity to those database
instances to 100 different tenants?
Would those tenants know if those databases were all serviced from a
single database server process running on the VM?
Or 100 containers each running a separate database server process? Or
10 containers running 10 database server processes each?
You bet they would when one (or all) of the other 99 decided to run a
really expensive query at an inopportune moment :)
No, of course not. And the tenant wouldn't care at all, because the
point of the DBaaS service is to get a database. It isn't to get one or
more VMs/containers/baremetal servers.
Well, if they had any kind of regulatory (or even performance)
requirements then the tenant might care really quite a lot. But I take
your point that many might not, and it would be good to be able to offer
them lower-cost options.
I'm not sure I entirely agree here. There are two kinds of DBaaS. One is
a data API: a multitenant database a la DynamoDB. Those are very cool,
and I'm excited about the potential to reduce the granularity of billing
to a minimum, in much the same way Swift does for storage, and I'm sad
that OpenStack's attempt in this space (MagnetoDB) didn't work out. But
Trove is not that.
People use Trove because they want to use a *particular* database, but
still have all the upgrades, backups, &c. handled for them. Given that
the choice of database is explicitly *not* abstracted away from them,
things like how many different VMs/containers/baremetal servers the
database is running on are very much relevant IMHO, because what you
want depends on both the database and how you're trying to use it. And
because (afaik) none of them have native multitenancy, it's necessary
that no tenant should have to share with any other.
Essentially Trove operates at a moderate level of abstraction -
somewhere between managing the database + the infrastructure it runs on
yourself and just an API endpoint you poke data into. It also operates
at the coarse end of a granularity spectrum running from
VMs->Containers->pay as you go.
It's reasonable to want to move closer to the middle of the granularity
spectrum. But you can't go all the way to the high abstraction/fine
grained ends of the spectra (which turn out to be equivalent) without
becoming something qualitatively different.
At the end of the day, I think Trove is best implemented as a hosted
application that exposes an API to its users that is entirely separate
from the underlying infrastructure APIs like Cinder/Nova/Neutron.
This is similar to Kevin's k8s Operator idea, which I support but in a
generic fashion that isn't specific to k8s.
In the same way that k8s abstracts the underlying infrastructure (via
its "cloud provider" concept), I think that Trove and similar projects
need to use a similar abstraction and focus on providing a different API
to their users that doesn't leak the underlying infrastructure API
concepts out.
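To be clear about what I mean by that abstraction, here's a purely
hypothetical sketch - none of these names exist in Trove today, it's
only meant to illustrate the seam:

    import abc

    class ComputeProvider(abc.ABC):
        """Everything the DBaaS needs from 'something that runs a
        database process', be it a Nova VM, a k8s pod or a bare metal
        host in the operator's project."""

        @abc.abstractmethod
        def launch(self, datastore_image, size):
            """Start a unit of capacity; return an opaque handle."""

        @abc.abstractmethod
        def attach_storage(self, handle, size_gb):
            """Attach persistent storage (a volume, a PVC, ...)."""

        @abc.abstractmethod
        def connect(self, handle, user_network):
            """Make the database reachable and return the address the
            user will point their application at."""

    class NovaProvider(ComputeProvider):
        """Backed by Nova/Cinder/Neutron, option (1) or (2) style."""

    class KubernetesProvider(ComputeProvider):
        """Backed by a k8s Operator, per Kevin's idea."""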
OK, so trying to summarise (stop me if I'm getting it wrong):
essentially you support option (2) because it is a closed abstraction.
Trove has its own quota management, billing, &c. and the user can't see
the VM, so the operator is free to substitute a different backend that
allocates compute capacity in finer-grained increments than Nova does.
Interestingly, that's only an issue because there is no finer-grained
compute resource than a VM available through the OpenStack API. If there
were an OpenStack API (or even just a Keystone-authenticated API) to a
shared, multitenant container orchestration cluster, this wouldn't be an
issue. But apart from OpenShift, I can't think of any cloud service
that's doing that - AWS, Google, OpenStack are all using the model where
the COE cluster is deployed on VMs that are owned by a particular
tenant. Of all the things you could run in containers on shared servers,
databases have arguably the most to lose (performance, security) and the
least to gain (since they're by definition stateful). So my question is:
if this is such a good idea for databases, why isn't anybody doing it
for everything container-based? i.e. instead of Magnum/Zun should we
just be working on a Keystone auth gateway for OpenShift (a.k.a. the
_one_ thing that _everyone_ had hitherto agreed was definitely out of
scope :D )?
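In outline such a gateway might be little more than keystonemiddleware
in front of a proxy that maps Keystone projects to OpenShift namespaces.
A hypothetical sketch, with the mapping - i.e. all of the real work -
stubbed out:

    from keystonemiddleware import auth_token

    def proxy_app(environ, start_response):
        # auth_token has already validated the user's token and injected
        # identity headers by the time we get here; a real gateway would
        # rewrite the request onto the OpenShift namespace matching this
        # project and forward it.
        project_id = environ.get('HTTP_X_PROJECT_ID')
        start_response('501 Not Implemented',
                       [('Content-Type', 'text/plain')])
        return [('map project %s to a namespace here'
                 % project_id).encode()]

    # Standard keystonemiddleware configuration; the service credentials
    # are placeholders.
    app = auth_token.AuthProtocol(proxy_app, {
        'auth_uri': 'https://keystone.example.com/v3',
        'auth_url': 'https://keystone.example.com/v3',
        'auth_type': 'password',
        'project_name': 'service',
        'username': 'coe-gateway',
        'password': 'secret',
        'user_domain_id': 'default',
        'project_domain_id': 'default',
    })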
Until then it seems to me that the tradeoff is between decoupling it
from the particular cloud it's running on so that users can optionally
deploy it standalone (essentially Vish's proposed solution for the *aaS
services from many moons ago) vs. decoupling it from OpenStack in
general so that the operator has more flexibility in how to deploy.
I'd love to be able to cover both - from a user using it standalone to
spin up and manage a DB in containers on a shared PaaS, through to a
user accessing it as a service to provide a DB running on a dedicated VM
or bare metal server, and everything in between. I don't know if such a
thing is feasible. I suspect we're going to have to talk a lot about VMs
and network plumbing and volume storage :)
cheers,
Zane.
Best,
-jay
Of course there's the current situation, as Amrith alluded to: the
default is option (1) except without the lock-down feature in Nova,
while some operators deploy option (2) even though it's not tested
upstream... clearly that's the worst of all possible worlds, and AIUI
nobody disagrees with that.
To my mind, (1) sounds more like "applications that run on OpenStack
(or other) infrastructure", since it doesn't require stuff like the
admin-only cross-project networking that makes it effectively "part of
the infrastructure itself" - as evidenced by the fact that
unprivileged users can run it standalone with little more than a
simple auth middleware change. But I suspect you are going to use
similar logic to argue for (2)? I'd be interested to hear your thoughts.
cheers,
Zane.