Re: [openstack-dev] [nova] Manage multiple clusters using a single nova service

2014-08-07 Thread Gary Kotton
Hi,
Sorry for taking such a long time to chime in, but these mails were sadly
missed. Please see my inline comments below. My original concerns about the
revert of the service were as follows:

1. What do we do about existing installations? This support was added at
the end of Havana and it is in production.
2. I had concerns regarding the way in which the image cache would be
maintained - that is, each compute node has its own cache directory, which
could have caused datastore issues.

Over the last few weeks I have encountered some serious problems with the
multi-VC support. It is causing production setups to break
(https://review.openstack.org/108225 is an example - this is due to
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3368).
The root cause is that the node may be updated at random places in the
nova manager code (these may be bugs, but they do not work well with the
multi-cluster support). There are too many edge cases here and the code is
not robust enough.
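To make the failure mode concrete, here is a hedged paraphrase (not the
actual Nova code at that link) of the kind of helper involved; it assumes
one nodename per service, which picks an arbitrary, and possibly wrong,
cluster when a single service reports several nodes:

    def _get_nodename(driver):
        # Correct when get_available_nodes() returns exactly one entry;
        # arbitrary when one service manages many clusters.
        return driver.get_available_nodes(refresh=True)[0]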

If we do decide to go ahead with dropping the support, then we need to do
the following:
1. Upgrade path: we need a well-defined upgrade path that will enable an
existing setup to upgrade from I to J (I do not think that we should leave
this until K, as there are too many pain points with the node management).
2. We need to make a few tweaks to the image cache path. My original
concern was that each compute node has its own cache directory. After
giving it some thought, this will be OK as long as we have each compute
host using the same cache directory. The reason for this is that the
locking for image handling is done externally on the file system
(https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/vmops.py#L319).
So if we have multiple compute processes running on the same host then we
are good. In addition, we can make use of a shared file system and have
all compute nodes use the shared file system for the locking - win win :).
If anyone gets to this stage in the thread then please see a fix for
object support and aging (https://review.openstack.org/111996 - the object
updates made earlier in the cycle caused a few problems - but I guess that
the gate does not wait 24 hours to purge instances).
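As a concrete illustration, a minimal sketch of this kind of cross-process
locking on the file system (the lock directory is hypothetical; POSIX
fcntl locks are used here because, unlike flock, they also behave sensibly
on NFS-style shared file systems):

    import fcntl
    import os

    # Hypothetical; place this on a shared FS to cover multiple hosts.
    CACHE_LOCK_DIR = '/var/lib/nova/locks'

    def with_image_lock(image_id, fn):
        # Serialize image-cache work across every compute process that
        # can see CACHE_LOCK_DIR, on one host or many.
        if not os.path.isdir(CACHE_LOCK_DIR):
            os.makedirs(CACHE_LOCK_DIR)
        lock_path = os.path.join(CACHE_LOCK_DIR, 'nova-%s.lock' % image_id)
        with open(lock_path, 'w') as f:
            fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until no other holder
            try:
                return fn()
            finally:
                fcntl.lockf(f, fcntl.LOCK_UN)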

In short I am in favor of removing the multi-cluster support, but we need
to do the following:
1. Upgrade path (see the sketch below)
2. Investigate the memory issues with nova-compute
3. Tweak the image cache path
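For point 1, a hedged sketch (not a supported migration) of the kind of
record remapping an I-to-J upgrade would need once each cluster gets its
own nova-compute service; the DSN, host, and node values are all
hypothetical:

    import sqlalchemy

    # Hypothetical connection string; point this at the real Nova DB.
    engine = sqlalchemy.create_engine('mysql://nova:secret@dbhost/nova')
    with engine.begin() as conn:
        # Re-point instances living on one cluster at the host of the
        # new per-cluster nova-compute service.
        conn.execute(
            sqlalchemy.text(
                "UPDATE instances SET host = :new_host "
                "WHERE node = :node AND deleted = 0"),
            {'new_host': 'compute-cluster1',   # new per-cluster service host
             'node': 'domain-c7(Cluster1)'})   # hypothetical VMware nodename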


Thanks
Gary

On 7/15/14, 11:36 AM, Matthew Booth mbo...@redhat.com wrote:

On 14/07/14 09:34, Vaddi, Kiran Kumar wrote:
 Hi,
 
  
 
 In the Juno summit, it was discussed that the existing approach of
 managing multiple VMware Clusters using a single nova compute service is
 not preferred and the approach of one nova compute service representing
 one cluster should be looked into.
 
  
 
 We would like to retain the existing approach (till we have resolved the
 issues) for the following reasons:
 
  
 
 1.   Even though a single service is managing all the clusters,
 logically it is still one compute per cluster. To the scheduler, each
 cluster is represented as an individual compute. Even in the driver, each
 cluster is represented separately.

This is something that would not change with dropping the multi-cluster
support.
The only change here is that additional processes will be running (please
see below).
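For illustration, a hedged sketch of what that one-compute-per-cluster
representation already looks like through the driver interface;
get_available_nodes() is a real virt-driver hook, while the MOIDs and
cluster names below are hypothetical:

    class FakeMultiClusterDriver(object):
        """One service, many nodes: one nodename per vCenter cluster."""

        def __init__(self):
            # Nodename style mirrors the VMware driver's 'moid(cluster)' form.
            self._nodes = ['domain-c7(Cluster1)', 'domain-c9(Cluster2)']

        def get_available_nodes(self, refresh=False):
            # The resource tracker creates one compute_nodes record per
            # entry, so the scheduler targets (host, node) pairs even
            # though a single nova-compute process serves them all.
            return list(self._nodes)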

 
  
 
 2.   Since ESXi, unlike KVM, does not allow the nova-compute service to
 run on the hypervisor, the service has to be run externally on a
 different server. It's easier from an administration perspective to
 manage a single service than multiple ones.

Yes, you have a good point here, but I think that at the end of the day we
need a robust service, and that service will be managed by external tools,
for example Chef, Puppet, etc., unless it is a very small cloud.

  
 
 3.   Every connection to vCenter uses up ~140MB in the driver. If we
 were to manage each cluster with an individual service, the memory
 consumed for 32 clusters would be high (~4GB). The newer versions support
 64 clusters!

I think that this is a bug and it needs to be fixed. I understand that
this may affect a decision from today to tomorrow, but it is not an
architectural issue and can be resolved (and really should be resolved
ASAP). I think that we need to open a bug for this and start to
investigate - fixing this will let whoever is running the service use
those resources elsewhere :)
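A hedged starting point for that investigation, assuming the suds-based
SOAP client used by the driver is the main consumer (the local WSDL path
is hypothetical and the numbers will vary by environment):

    import resource

    from suds.client import Client  # suds parses the full vSphere WSDL

    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Hypothetical local copy of the vSphere WSDL; building the client
    # is where most of the per-session memory is expected to go.
    client = Client('file:///opt/vmware/wsdl/vimService.wsdl')
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print('approx. KB of RSS added by one client: %d' % (after - before))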

 
  
 
 4.   There are existing customer installations that use the existing
 approach, so we should not enforce the new approach until it is simple
 to manage and not resource intensive.
 
  
 
 If the admin wants to use one service per cluster, it can be done with
 the existing driver. In the conf the admin has to specify a single
 cluster instead of a list of clusters. Therefore it's better to give
 admins the choice rather than enforcing one type of deployment.

This is a real pain point which we should address. I think that we 

Re: [openstack-dev] [nova] Manage multiple clusters using a single nova service

2014-07-23 Thread Vaddi, Kiran Kumar
Answers to some of your concerns:

 Why can't ESXi hosts run the nova-compute service? Is it like the
 XenServer driver that has a pitifully old version of Python (2.4) that
 constrains the code that is possible to run on it? If so, then I don't
 really think the poor constraints of the hypervisor dom0 should mean
 that Nova should change its design principles to accommodate. The
 XenServer driver uses custom agents to get around this issue, IIRC. Why
 can't the VCenter driver?

ESXi hosts are generally operated in a lockdown mode where installation of
agents is not allowed.
All communication and tasks on the ESXi hosts must go through vCenter.

 The fact that each connection to vCenter uses 140MB of memory is
 completely ridiculous. You can thank crappy SOAP for that, I believe.

Yes, and the problem becomes bigger if we create multiple services.

 I just do not support the idea that Nova needs to
 change its fundamental design in order to support the *design* of other
 host management platforms.

The current implementation doesn't make nova change its design; the
scheduling decisions are still done by nova.
It's only the deployment that has been changed. Agreed, there are no
separate topic-exchange queues for each cluster.

Thanks
Kiran

 -Original Message-
 From: Jay Pipes [mailto:jaypi...@gmail.com]
 Sent: Tuesday, July 22, 2014 9:30 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Manage multiple clusters using a single
 nova service
 
 On 07/14/2014 04:34 AM, Vaddi, Kiran Kumar wrote:
  Hi,
 
  In the Juno summit, it was discussed that the existing approach of
  managing multiple VMware Clusters using a single nova compute service
  is not preferred and the approach of one nova compute service
  representing one cluster should be looked into.
 
 Even this is outside what I consider to be best practice for Nova,
 frankly. The model of scale-out inside Nova is to have a nova-compute
 worker responsible for only the distinct set of compute resources that
 are provided by a single bare metal node.
 
 Unfortunately, with the introduction of the bare-metal driver in Nova,
 as well as the clustered hypervisors like VCenter and Hyper-V, this
 architectural design point was shot in the head, and now it is only
 possible to scale the nova-compute-to-hypervisor communication layer
 using a scale-up model instead of a scale-out model. This is a big deal,
 and unfortunately, not enough discussion has been had around this, IMO.
 
 The proposed blueprint(s) around this and the code patches I've seen are
 moving Nova in the opposite direction it needs to go, IMHO.
 
  We would like to retain the existing approach (till we have resolved
   the issues) for the following reasons:
 
  1. Even though a single service is managing all the clusters,
  logically it is still one compute per cluster. To the scheduler, each
  cluster is represented as an individual compute. Even in the driver,
  each cluster is represented separately.
 
 How is this so? In Kanagaraj Manickam's proposed blueprint about this
 [1], the proposed implementation would fork one process for each
 hypervisor or cluster. However, the problem with this is that the
 scheduler uses the single service record for the nova-compute worker to
 determine whether or not the node is available to place resources on.
 The servicegroup API would need to be refactored (rewritten, really) to
 change its definition of a service from a single daemon to a single
 process running within that daemon. Since the daemon only responds to a
 single RPC target endpoint and rpc.call direct and topic exchanges, all
 of that code would then need to be rewritten, or code would need to be
 added to nova.manager to dispatch events sent to the nova-compute's
 single RPC topic-exchange to one of the specific processes that is
 responsible for a particular cluster.
 
 In short, a huge chunk of code would need to be refactored in order to
 make Nova's worldview amenable to the design choices of certain
 clustered hypervisors. That, IMHO, is not something to be taken lightly,
 and not something we should even consider without a REALLY good reason.
 And the use case of "OpenStack is a platform and it's good to provide
 flexibility in it to accommodate different needs" is not a really good
 reason, IMO.
 
 2. Since ESXi, unlike KVM, does not allow the nova-compute service
 to run on the hypervisor, the service has to be run externally on a
 different server. It's easier from an administration perspective to
 manage a single service than multiple ones.
 
 Why can't ESXi hosts run the nova-compute service? Is it like the
 XenServer driver that has a pitifully old version of Python (2.4) that
 constrains the code that is possible to run on it? If so, then I don't
 really think the poor constraints of the hypervisor dom0 should mean
 that Nova should change its design principles to accommodate. The
 XenServer driver uses custom agents

Re: [openstack-dev] [nova] Manage multiple clusters using a single nova service

2014-07-23 Thread Dan Smith
 I just do not support the idea that Nova needs to change its
 fundamental design in order to support the *design* of other host
 management platforms.
 
 The current implementation doesn't make nova change its design; the
 scheduling decisions are still done by nova.

Nova's design is not just the scheduling decisions; it also includes the
deployment model, which is intended to be a single compute service tied
to a single hypervisor. I think that's important for scale and failure
isolation, at least.

 It's only the deployment that has been changed. Agreed, there are no
 separate topic-exchange queues for each cluster.

I'm definitely with Jay here: I want to get away from hiding larger
systems behind a single compute host/service.

--Dan





Re: [openstack-dev] [nova] Manage multiple clusters using a single nova service

2014-07-21 Thread Jay Pipes

On 07/14/2014 04:34 AM, Vaddi, Kiran Kumar wrote:

Hi,

In the Juno summit, it was discussed that the existing approach of
managing multiple VMware Clusters using a single nova compute service
is not preferred and the approach of one nova compute service
representing one cluster should be looked into.


Even this is outside what I consider to be best practice for Nova,
frankly. The model of scale-out inside Nova is to have a nova-compute
worker responsible for only the distinct set of compute resources that
are provided by a single bare metal node.

Unfortunately, with the introduction of the bare-metal driver in Nova,
as well as the clustered hypervisors like VCenter and Hyper-V, this
architectural design point was shot in the head, and now it is only
possible to scale the nova-compute-to-hypervisor communication layer
using a scale-up model instead of a scale-out model. This is a big deal,
and unfortunately, not enough discussion has been had around this, IMO.

The proposed blueprint(s) around this and the code patches I've seen are
moving Nova in the opposite direction it needs to go, IMHO.


We would like to retain the existing approach (till we have resolved
 the issues) for the following reasons:

1. Even though a single service is managing all the clusters,
logically it is still one compute per cluster. To the scheduler, each
cluster is represented as an individual compute. Even in the driver,
each cluster is represented separately.


How is this so? In Kanagaraj Manickam's proposed blueprint about this
[1], the proposed implementation would fork one process for each
hypervisor or cluster. However, the problem with this is that the
scheduler uses the single service record for the nova-compute worker to
determine whether or not the node is available to place resources on.
The servicegroup API would need to be refactored (rewritten, really) to
change its definition of a service from a single daemon to a single
process running within that daemon. Since the daemon only responds to a
single RPC target endpoint and rpc.call direct and topic exchanges, all
of that code would then need to be rewritten, or code would need to be
added to nova.manager to dispatch events sent to the nova-compute's
single RPC topic-exchange to one of the specific processes that is
responsible for a particular cluster.
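A hedged, standalone illustration (not Nova code, and not the blueprint's
implementation) of the dispatch layer that would be needed: one daemon
consumes the single compute topic and forwards each request to the worker
process that owns the target cluster. The worker and message shapes here
are invented.

    import multiprocessing

    def cluster_worker(cluster, queue):
        # Each process would hold its own vCenter session for one cluster.
        while True:
            method, payload = queue.get()
            if method == 'stop':
                break
            print('[%s] handling %s for %s' % (cluster, method, payload))

    class DispatchingComputeManager(object):
        def __init__(self, clusters):
            self._queues = {}
            self._procs = []
            for name in clusters:
                q = multiprocessing.Queue()
                p = multiprocessing.Process(target=cluster_worker,
                                            args=(name, q))
                p.start()
                self._queues[name] = q
                self._procs.append(p)

        def handle(self, method, instance):
            # instance['node'] identifies the owning cluster; every call
            # arriving on the single RPC topic is re-routed here.
            self._queues[instance['node']].put((method, instance['uuid']))

        def stop(self):
            for q in self._queues.values():
                q.put(('stop', None))
            for p in self._procs:
                p.join()

    if __name__ == '__main__':
        mgr = DispatchingComputeManager(['Cluster1', 'Cluster2'])
        mgr.handle('build_and_run_instance',
                   {'node': 'Cluster2', 'uuid': 'abc'})
        mgr.stop()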

In short, a huge chunk of code would need to be refactored in order to
make Nova's worldview amenable to the design choices of certain
clustered hypervisors. That, IMHO, is not something to be taken lightly,
and not something we should even consider without a REALLY good reason.
And the use case of "OpenStack is a platform and it's good to provide
flexibility in it to accommodate different needs" is not a really good
reason, IMO.


2. Since ESXi, unlike KVM, does not allow the nova-compute service
to run on the hypervisor, the service has to be run externally on a
different server. It's easier from an administration perspective to
manage a single service than multiple ones.


Why can't ESXi hosts run the nova-compute service? Is it like the
XenServer driver that has a pitifully old version of Python (2.4) that
constrains the code that is possible to run on it? If so, then I don't
really think the poor constraints of the hypervisor dom0 should mean
that Nova should change its design principles to accommodate. The
XenServer driver uses custom agents to get around this issue, IIRC. Why
can't the VCenter driver?


3. Every connection to vCenter uses up ~140MB in the driver. If we
were to manage each cluster with an individual service, the memory
consumed for 32 clusters would be high (~4GB). The newer versions
support 64 clusters!


The fact that each connection to vCenter uses 140MB of memory is
completely ridiculous. You can thank crappy SOAP for that, I believe.

That said, Nova should not be changing its design principles to
accommodate a driver's poor software.

It raises questions about why exactly folks are even using OpenStack at
all if they want to continue using VCenter for host management, DRS, DPM,
and the like.

What advantage are they getting from OpenStack?

If the idea is to move off of expensive VCenter-licensed clusters and on
to a pure OpenStack infrastructure, then I don't see a point in
supporting *more* clustered hypervisor features in the driver code at
all. If the idea is to just use what we know and not rock the enterprise
IT boat, then why use OpenStack at all?

Look, I'm all for compatibility and transferability of different image
formats, different underlying hypervisors, and the dream of
interoperable clouds. I'm happy to see Nova support a wide variety of
disk image formats and hypervisor features (note: VCenter isn't a
hypervisor). I just do not support the idea that Nova needs to
change its fundamental design in order to support the *design* of other
host management platforms.

Best,
-jay

[1] https://review.openstack.org/#/c/103054/


Re: [openstack-dev] [nova] Manage multiple clusters using a single nova service

2014-07-15 Thread Matthew Booth
On 14/07/14 09:34, Vaddi, Kiran Kumar wrote:
 Hi,
 
  
 
 In the Juno summit, it was discussed that the existing approach of
 managing multiple VMware Clusters using a single nova compute service is
 not preferred and the approach of one nova compute service representing
 one cluster should be looked into.
 
  
 
 We would like to retain the existing approach (till we have resolved the
 issues) for the following reasons:
 
  
 
 1.   Even though a single service is managing all the clusters,
 logically it is still one compute per cluster. To the scheduler, each
 cluster is represented as an individual compute. Even in the driver, each
 cluster is represented separately.
 
  
 
 2.   Since ESXi, unlike KVM, does not allow the nova-compute service to
 run on the hypervisor, the service has to be run externally on a
 different server. It's easier from an administration perspective to
 manage a single service than multiple ones.
 
  
 
 3.   Every connection to vCenter uses up ~140MB in the driver. If we
 were to manage each cluster with an individual service, the memory
 consumed for 32 clusters would be high (~4GB). The newer versions support
 64 clusters!
 
  
 
 4.   There are existing customer installations that use the existing
 approach, so we should not enforce the new approach until it is simple
 to manage and not resource intensive.
 
  
 
 If the admin wants to use one service per cluster, it can be done with
 the existing driver. In the conf the admin has to specify a single
 cluster instead of a list of clusters. Therefore it's better to give
 admins the choice rather than enforcing one type of deployment.
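For illustration, a hedged nova.conf sketch of the two deployment styles;
the address, credentials, and cluster names are hypothetical, and
cluster_name is assumed to be repeatable as a multi-valued option in this
era of the driver:

    [vmware]
    host_ip = 192.0.2.10            # hypothetical vCenter address
    host_username = administrator
    host_password = secret
    # One service managing several clusters:
    cluster_name = Cluster1
    cluster_name = Cluster2

    # Alternatively, one service per cluster: run a second nova-compute
    # whose conf lists only a single cluster_name.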

Does anybody recall the detail of why we wanted to remove this? There
was unease over the use of the instance's node field in the db, but I
don't recall why.

Matt

-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
