Re: [openstack-dev] [nova] Manage multiple clusters using a single nova service
Hi,

Sorry for taking such a long time to chime in, but these mails were sadly missed. Please see my inline comments below.

My original concerns about reverting the service were as follows:
1. What do we do about existing installations? This support was added at the end of Havana and it is in production.
2. I had concerns regarding the way in which the image cache would be maintained - that is, each compute node has its own cache directory, which may have caused datastore issues.

Over the last few weeks I have encountered some serious problems with the multi-VC support. This is causing production setups to break (https://review.openstack.org/108225 is an example - this is due to https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3368). The root cause is that the node may be updated at random places in the nova manager code (these may be bugs, but they do not interact well with the multi-cluster support). There are too many edge cases here and the code is not robust enough.

If we do decide to go ahead with dropping the support, then we need to do the following:
1. Upgrade path: we need a well-defined upgrade path that will enable an existing setup to upgrade from I to J (I do not think that we should leave this until K, as there are too many pain points with the node management).
2. We need to make a few tweaks to the image cache path. My original concern was that each compute node has its own cache directory. After giving it some thought, this will be OK as long as each compute host uses the same cache directory. The reason is that the locking for image handling is done externally on the file system (https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/vmops.py#L319). So if we have multiple compute processes running on the same host then we are good. In addition, we can make use of a shared file system, and then all compute nodes can use the shared file system for the locking - win-win :).
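To make the locking argument above concrete, here is a minimal sketch of external file-system locking shared by multiple compute processes. It is illustrative only: the names (`external_lock`, `fetch_image_once`) are hypothetical, simplified stand-ins for what nova does via its external lock utilities, not the actual vmwareapi code.

```python
# Sketch: serialize image-cache downloads across processes via an
# advisory flock on a lock file in the (shared) cache directory.
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def external_lock(lock_dir, name):
    """Hold an exclusive advisory lock on <lock_dir>/<name>.lock."""
    os.makedirs(lock_dir, exist_ok=True)
    path = os.path.join(lock_dir, name + ".lock")
    with open(path, "w") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other process holds it
        try:
            yield path
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def fetch_image_once(cache_dir, image_id, fetch):
    """Download an image into the cache unless some process already has."""
    cached = os.path.join(cache_dir, image_id)
    with external_lock(cache_dir, image_id):
        if not os.path.exists(cached):
            fetch(cached)  # only one process performs the download
    return cached
```

Because the lock lives on the file system rather than in process memory, any number of compute processes on the same host (or on hosts sharing a file system) serialize on the same lock file, which is exactly the property relied on above.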
If anyone gets to this stage in the thread then please see a fix for object support and aging (https://review.openstack.org/111996 - the object updates made earlier in the cycle caused a few problems - but I guess that the gate does not wait 24 hours to purge instances).

In short, I am in favor of removing the multi-cluster support, but we need to do the following:
1. Upgrade path
2. Investigate memory issues with nova-compute
3. Tweak the image cache path

Thanks
Gary

On 7/15/14, 11:36 AM, "Matthew Booth" wrote:
>On 14/07/14 09:34, Vaddi, Kiran Kumar wrote:
>> Hi,
>>
>> In the Juno summit, it was discussed that the existing approach of
>> managing multiple VMware Clusters using a single nova compute service is
>> not preferred and the approach of one nova compute service representing
>> one cluster should be looked into.
>>
>> We would like to retain the existing approach (till we have resolved the
>> issues) for the following reasons:
>>
>> 1. Even though a single service is managing all the clusters,
>> logically it is still one compute per cluster. To the scheduler each
>> cluster is represented as individual computes. Even in the driver each
>> cluster is represented separately.

This is something that would not change with dropping the multi-cluster support. The only change here is that additional processes will be running (please see below).

>> 2. Since ESXi does not allow to run nova-compute service on the
>> hypervisor unlike KVM, the service has to be run externally on a
>> different server. It's easier from an administration perspective to
>> manage a single service than multiple.

Yes, you have a good point here, but I think that at the end of the day we need a robust service, and that service will be managed by external tools, for example Chef, Puppet etc., unless it is a very small cloud.

>> 3. Every connection to vCenter uses up ~140MB in the driver. If we
>> were to manage each cluster by an individual service the memory consumed
>> for 32 clusters will be high (~4GB). The newer versions support 64
>> clusters!

I think that this is a bug and it needs to be fixed. I understand that this may affect a decision from today to tomorrow, but it is not an architectural issue and can be resolved (and really should be resolved ASAP). I think that we need to open a bug for this and start to investigate - fixing it will let whoever is running a service use those resources elsewhere :)

>> 4. There are existing customer installations that use the existing
>> approach, and therefore we should not enforce the new approach until it
>> is simple to manage and not resource intensive.
>>
>> If the admin wants to use one service per cluster, it can be done with
>> the existing driver. In the conf the admin has to specify a single
>> cluster instead of a list of clusters. Therefore it's better to give the
>> admins the choice rather than enforcing one type of deployment.
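As a concrete illustration of that last point, a one-service-per-cluster deployment with the existing driver is just a configuration choice. This is a hedged sketch: the option names follow the driver's `[vmware]` config group of that era, and the values are placeholders, so verify against the release you actually deploy.

```ini
[DEFAULT]
compute_driver = vmwareapi.VMwareVCDriver

[vmware]
host_ip = <vcenter-ip>
host_username = <user>
host_password = <password>
# Multi-cluster deployment: list several clusters in one service, e.g.
#   cluster_name = ClusterA
#   cluster_name = ClusterB
# One-service-per-cluster deployment: specify exactly one cluster,
# and run one nova-compute service per such conf file.
cluster_name = ClusterA
```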
Re: [openstack-dev] [nova] Manage multiple clusters using a single nova service
>> I just do not support the idea that Nova needs to change its
>> fundamental design in order to support the *design* of other host
>> management platforms.
>
> The current implementation doesn't make nova change its design, the
> scheduling decisions are still done by nova.

Nova's design is not just "making the scheduling decisions" but also includes the deployment model, which is intended to be a single compute service tied to a single hypervisor. I think that's important for scale and failure isolation, at least.

> Its only the deployment that has been changed. Agree that there are
> no separate topic-exchange queues for each cluster.

I'm definitely with Jay here: I want to get away from hiding larger systems behind a single compute host/service.

--Dan

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Manage multiple clusters using a single nova service
Answers to some of your concerns:

> Why can't ESXi hosts run the nova-compute service? Is it like the
> XenServer driver that has a pitifully old version of Python (2.4) that
> constrains the code that is possible to run on it? If so, then I don't
> really think the poor constraints of the hypervisor dom0 should mean
> that Nova should change its design principles to accommodate. The
> XenServer driver uses custom agents to get around this issue, IIRC. Why
> can't the VCenter driver?

ESXi hosts are generally operated in a lock-down mode where installation of agents is not allowed. All communication and tasks on the ESXi hosts must be done using vCenter.

> The fact that each connection to vCenter uses 140MB of memory is
> completely ridiculous. You can thank crappy SOAP for that, I believe.

Yes, and the problem becomes bigger if we create multiple services.

> I just do not support the idea that Nova needs to
> change its fundamental design in order to support the *design* of other
> host management platforms.

The current implementation doesn't make nova change its design; the scheduling decisions are still done by nova. It is only the deployment that has been changed. Agreed that there are no separate topic-exchange queues for each cluster.

Thanks
Kiran

> -----Original Message-----
> From: Jay Pipes [mailto:jaypi...@gmail.com]
> Sent: Tuesday, July 22, 2014 9:30 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova] Manage multiple clusters using a
> single nova service
>
> On 07/14/2014 04:34 AM, Vaddi, Kiran Kumar wrote:
> > Hi,
> >
> > In the Juno summit, it was discussed that the existing approach of
> > managing multiple VMware Clusters using a single nova compute service
> > is not preferred and the approach of one nova compute service
> > representing one cluster should be looked into.
>
> Even this is outside what I consider to be best practice for Nova,
> frankly. The model of scale-out inside Nova is to have a nova-compute
> worker responsible for only the distinct set of compute resources that
> are provided by a single bare metal node.
>
> Unfortunately, with the introduction of the bare-metal driver in Nova,
> as well as the "clustered hypervisors" like VCenter and Hyper-V, this
> architectural design point was shot in the head, and now it is only
> possible to scale the nova-compute <-> hypervisor communication layer
> using a scale-up model instead of a scale-out model. This is a big deal,
> and unfortunately, not enough discussion has been had around this, IMO.
>
> The proposed blueprint(s) around this and the code patches I've seen are
> moving Nova in the opposite direction it needs to go, IMHO.
>
> > We would like to retain the existing approach (till we have resolved
> > the issues) for the following reasons:
> >
> > 1. Even though a single service is managing all the clusters,
> > logically it is still one compute per cluster. To the scheduler each
> > cluster is represented as individual computes. Even in the driver
> > each cluster is represented separately.
>
> How is this so? In Kanagaraj Manickam's proposed blueprint about this
> [1], the proposed implementation would fork one process for each
> hypervisor or cluster. However, the problem with this is that the
> scheduler uses the single service record for the nova-compute worker to
> determine whether or not the node is available to place resources on.
> The servicegroup API would need to be refactored (rewritten, really) to
> change its definition of a service to, instead of being a single daemon,
> now being a single process running within that daemon. Since the daemon
> only responds to a single RPC target endpoint and rpc.call direct and
> topic exchanges, all of that code would then need to be rewritten, or
> code would need to be added to nova.manager to dispatch events sent to
> the nova-compute's single RPC topic-exchange to one of the specific
> processes that is responsible for a particular cluster.
>
> In short, a huge chunk of code would need to be refactored in order to
> make Nova's worldview amenable to the design choices of certain
> clustered hypervisors. That, IMHO, is not something to be taken lightly,
> and not something we should even consider without a REALLY good reason.
> And the use case of "OpenStack is a platform and it's good to provide
> flexibility in it to accommodate different needs." is not a really good
> reason, IMO.
>
> > 2. Since ESXi does not allow to run nova-compute service on the
> > hypervisor unlike KVM, the service has to be run externally on a
> > different server. It's easier from an administration perspective to
> > manage a single service than multiple.
Re: [openstack-dev] [nova] Manage multiple clusters using a single nova service
On 07/14/2014 04:34 AM, Vaddi, Kiran Kumar wrote:
> Hi,
>
> In the Juno summit, it was discussed that the existing approach of
> managing multiple VMware Clusters using a single nova compute service
> is not preferred and the approach of one nova compute service
> representing one cluster should be looked into.

Even this is outside what I consider to be best practice for Nova, frankly. The model of scale-out inside Nova is to have a nova-compute worker responsible for only the distinct set of compute resources that are provided by a single bare metal node.

Unfortunately, with the introduction of the bare-metal driver in Nova, as well as the "clustered hypervisors" like VCenter and Hyper-V, this architectural design point was shot in the head, and now it is only possible to scale the nova-compute <-> hypervisor communication layer using a scale-up model instead of a scale-out model. This is a big deal, and unfortunately, not enough discussion has been had around this, IMO.

The proposed blueprint(s) around this and the code patches I've seen are moving Nova in the opposite direction it needs to go, IMHO.

> We would like to retain the existing approach (till we have resolved
> the issues) for the following reasons:
>
> 1. Even though a single service is managing all the clusters,
> logically it is still one compute per cluster. To the scheduler each
> cluster is represented as individual computes. Even in the driver
> each cluster is represented separately.

How is this so? In Kanagaraj Manickam's proposed blueprint about this [1], the proposed implementation would fork one process for each hypervisor or cluster. However, the problem with this is that the scheduler uses the single service record for the nova-compute worker to determine whether or not the node is available to place resources on. The servicegroup API would need to be refactored (rewritten, really) to change its definition of a service to, instead of being a single daemon, now being a single process running within that daemon. Since the daemon only responds to a single RPC target endpoint and rpc.call direct and topic exchanges, all of that code would then need to be rewritten, or code would need to be added to nova.manager to dispatch events sent to the nova-compute's single RPC topic-exchange to one of the specific processes that is responsible for a particular cluster.

In short, a huge chunk of code would need to be refactored in order to make Nova's worldview amenable to the design choices of certain clustered hypervisors. That, IMHO, is not something to be taken lightly, and not something we should even consider without a REALLY good reason. And the use case of "OpenStack is a platform and it's good to provide flexibility in it to accommodate different needs." is not a really good reason, IMO.

> 2. Since ESXi does not allow to run nova-compute service on the
> hypervisor unlike KVM, the service has to be run externally on a
> different server. It's easier from an administration perspective to
> manage a single service than multiple.

Why can't ESXi hosts run the nova-compute service? Is it like the XenServer driver, which has a pitifully old version of Python (2.4) that constrains the code that is possible to run on it? If so, then I don't really think the poor constraints of the hypervisor dom0 should mean that Nova should change its design principles to accommodate. The XenServer driver uses custom agents to get around this issue, IIRC. Why can't the VCenter driver?

> 3. Every connection to vCenter uses up ~140MB in the driver. If we
> were to manage each cluster by an individual service the memory
> consumed for 32 clusters will be high (~4GB). The newer versions
> support 64 clusters!

The fact that each connection to vCenter uses 140MB of memory is completely ridiculous. You can thank crappy SOAP for that, I believe. That said, Nova should not be changing its design principles to accommodate poor software of a driver.
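For illustration, the per-cluster dispatch layer described above might look roughly like the sketch below. All names here are hypothetical; this shows only the routing burden (every message must carry enough context to pick the right worker), not actual nova code.

```python
# Sketch: a single nova-compute RPC endpoint forwarding each incoming
# call to the per-cluster worker responsible for the instance's node.
import queue


class ClusterWorker:
    """One worker per vCenter cluster, each with its own task queue."""

    def __init__(self, node):
        self.node = node
        self.tasks = queue.Queue()


class DispatchingManager:
    """Single RPC target that routes calls to per-cluster workers."""

    def __init__(self, nodes):
        self.workers = {n: ClusterWorker(n) for n in nodes}

    def handle(self, method, instance):
        # The instance's node selects the worker; any code path that
        # updates the node at "random places" breaks this routing.
        worker = self.workers[instance["node"]]
        worker.tasks.put((method, instance["uuid"]))
        return worker.node
```

The point of the sketch is that every RPC entry point in nova.manager would need this kind of indirection added, which is the refactoring cost being objected to.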
It raises questions on why exactly folks are even using OpenStack at all if they want to continue to use VCenter for host management, DRS, DPM, and the like. What advantage are they getting from OpenStack? If the idea is to move off of expensive VCenter-licensed clusters and on to a pure OpenStack infrastructure, then I don't see a point in supporting *more* clustered hypervisor features in the driver code at all. If the idea is to just "use what we know, don't rock the enterprise IT boat", then why use OpenStack at all?

Look, I'm all for compatibility and transferability of different image formats, different underlying hypervisors, and the dream of interoperable clouds. I'm happy to see Nova support a wide variety of disk image formats and hypervisor features (note: VCenter isn't a hypervisor). I just do not support the idea that Nova needs to change its fundamental design in order to support the *design* of other host management platforms.

Best,
-jay

[1] https://review.openstack.org/#/c/103054/
Re: [openstack-dev] [nova] Manage multiple clusters using a single nova service
On 14/07/14 09:34, Vaddi, Kiran Kumar wrote:
> Hi,
>
> In the Juno summit, it was discussed that the existing approach of
> managing multiple VMware Clusters using a single nova compute service is
> not preferred and the approach of one nova compute service representing
> one cluster should be looked into.
>
> We would like to retain the existing approach (till we have resolved the
> issues) for the following reasons:
>
> 1. Even though a single service is managing all the clusters,
> logically it is still one compute per cluster. To the scheduler each
> cluster is represented as individual computes. Even in the driver each
> cluster is represented separately.
>
> 2. Since ESXi does not allow to run nova-compute service on the
> hypervisor unlike KVM, the service has to be run externally on a
> different server. It's easier from an administration perspective to
> manage a single service than multiple.
>
> 3. Every connection to vCenter uses up ~140MB in the driver. If we
> were to manage each cluster by an individual service the memory consumed
> for 32 clusters will be high (~4GB). The newer versions support 64
> clusters!
>
> 4. There are existing customer installations that use the existing
> approach, and therefore we should not enforce the new approach until it
> is simple to manage and not resource intensive.
>
> If the admin wants to use one service per cluster, it can be done with
> the existing driver. In the conf the admin has to specify a single
> cluster instead of a list of clusters. Therefore it's better to give the
> admins the choice rather than enforcing one type of deployment.

Does anybody recall the detail of why we wanted to remove this? There was unease over use of the instance's node field in the db, but I don't recall why.
Matt

--
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490