Re: [openstack-dev] [nova] Questions about guest NUMA and memory binding policies
Hi Liuji, I'm the owner of bp support-libvirt-vcpu-topology. There are four main reasons I did not continue to work on it:
1. the design proposal has not been confirmed by Nova core developers
2. the bp was not accepted during the Icehouse development stage
3. Daniel expects that this bp should be considered together with the other one, numa-aware-cpu-binding, but I have no idea how to do that for now
4. I don't have enough time to work on this at the moment
2014-03-05 Wangpan

From: "Liuji (Jeremy)"
Sent: 2014-03-05 15:02
Subject: Re: [openstack-dev] [nova] Questions about guest NUMA and memory binding policies
To: "OpenStack Development Mailing List (not for usage questions)"
Cc: "Luohao (brian)", "Yuanjing (D)"

Hi Steve, Thanks for your reply. I didn't know why the blueprint numa-aware-cpu-binding seemed to have made no further progress until I read the two mails mentioned in your mail. The use case analysis in those mails is very clear, and it also covers what I am concerned about. I agree that we shouldn't expose the pCPU/vCPU mapping to the end user, and how to expose it to the user needs more consideration. The use cases I care most about are exclusive pCPU use (pCPU:vCPU = 1:1) and guest NUMA. Thanks, Jeremy Liu

> -Original Message-
> From: Steve Gordon [mailto:sgor...@redhat.com]
> Sent: Tuesday, March 04, 2014 10:29 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Cc: Luohao (brian); Yuanjing (D)
> Subject: Re: [openstack-dev] [nova] Questions about guest NUMA and memory binding policies
>
> - Original Message -
> > Hi, all
> >
> > I searched the current blueprints and old mails in the mailing list, but
> > found nothing about guest NUMA or setting memory binding policies.
> > I only found a blueprint about vCPU topology and a blueprint about CPU binding.
> >
> > https://blueprints.launchpad.net/nova/+spec/support-libvirt-vcpu-topology
> > https://blueprints.launchpad.net/nova/+spec/numa-aware-cpu-binding
> >
> > Is there any plan for guest NUMA and for setting memory binding policies?
> >
> > Thanks,
> > Jeremy Liu
>
> Hi Jeremy,
>
> As you've discovered, there have been a few attempts at getting some work
> started in this area. Dan Berrange outlined some of the possibilities in a
> previous mailing list post [1], though it's multi-faceted and there are a lot
> of different ways to break it down. If you dig into the details you will note
> that the support-libvirt-vcpu-topology blueprint in particular got a fair way
> along, but there were some concerns noted in the code reviews and on the list
> [2] around the design.
>
> It seems like this is an area with a decent amount of interest, and we should
> work on the list to flesh out a design proposal; ideally this would be
> presented for further discussion at the Juno design summit. What are your
> particular needs/desires from a NUMA-aware nova scheduler?
>
> Thanks,
>
> Steve
>
> [1] http://lists.openstack.org/pipermail/openstack-dev/2013-November/019715.html
> [2] http://lists.openstack.org/pipermail/openstack-dev/2013-December/022940.html
Re: [openstack-dev] [nova] Should we limit the disk IO bandwidth in copy_image while creating new instance?
Hi yunhong, I agree with you about taking I/O bandwidth as a resource, but it may not be so easy to implement. Your other concern, about the launch time, may not be so serious: only the first boot (while the image is not yet cached) is affected.
2014-02-17 Wangpan

From: yunhong jiang
Sent: 2014-02-15 08:21
Subject: Re: [openstack-dev] [nova] Should we limit the disk IO bandwidth in copy_image while creating new instance?
To: "OpenStack Development Mailing List (not for usage questions)"
Cc:

On Fri, 2014-02-14 at 10:22 +0100, Sylvain Bauza wrote:
> Instead of limiting the consumed bandwidth by proposing a
> configuration flag (yet another one, and which default value should be
> set?), I would propose to only decrease the niceness of the process
> itself, so that other processes would get the I/O access first.
> That's not perfect I assume, but that's a quick workaround limiting
> the frustration.
>
> -Sylvain

Decreasing niceness is good for the short term. Some small concerns are: will that cause a long launch time if the host is I/O intensive? And if launch time is billed as well, then it is not fair to the new instance either. I think the ideal solution is I/O QoS, e.g. through cgroups: take I/O bandwidth as a resource, and treat copy_image as a consumer of that I/O bandwidth resource.
Thanks --jyh

> 2014-02-14 4:52 GMT+01:00 Wangpan :
> Currently nova doesn't limit the disk IO bandwidth in the copy_image()
> method while creating a new instance, so the other instances on this host
> may be affected by this high disk-IO-consuming operation, and some
> time-sensitive business (e.g. an RDS instance with heartbeat) may be
> switched between master and slave.
>
> So can we use the `rsync --bwlimit=${bandwidth} src dst` command instead of
> `cp src dst` for copy_image in create_image() of the libvirt driver? The
> remote image copy operation can also be limited by
> `rsync --bwlimit=${bandwidth}` or `scp -l ${bandwidth}`. This parameter
> ${bandwidth} can be a new configuration option in nova.conf which allows the
> cloud admin to configure it; its default value is 0, which means no
> limitation, so the instances on this host will not be affected while a new
> instance with a not-yet-cached image is being created.
>
> the example codes:
> nova/virt/libvirt/utils.py:
>
> diff --git a/nova/virt/libvirt/utils.py b/nova/virt/libvirt/utils.py
> index e926d3d..5d7c935 100644
> --- a/nova/virt/libvirt/utils.py
> +++ b/nova/virt/libvirt/utils.py
> @@ -473,7 +473,10 @@ def copy_image(src, dest, host=None):
>          # sparse files. I.E. holes will not be written to DEST,
>          # rather recreated efficiently. In addition, since
>          # coreutils 8.11, holes can be read efficiently too.
> -        execute('cp', src, dest)
> +        if CONF.mbps_in_copy_image > 0:
> +            execute('rsync', '--bwlimit=%s' % (CONF.mbps_in_copy_image * 1024), src, dest)
> +        else:
> +            execute('cp', src, dest)
>      else:
>          dest = "%s:%s" % (host, dest)
>          # Try rsync first as that can compress and create sparse dest files.
> @@ -484,11 +487,22 @@ def copy_image(src, dest, host=None):
>              # Do a relatively light weight test first, so that we
>              # can fall back to scp, without having run out of space
>              # on the destination for example.
> -            execute('rsync', '--sparse', '--compress', '--dry-run', src, dest)
> +            if CONF.mbps_in_copy_image > 0:
> +                execute('rsync', '--sparse', '--compress', '--dry-run',
> +                        '--bwlimit=%s' % (CONF.mbps_in_copy_image * 1024), src, dest)
> +            else:
> +                execute('rsync', '--sparse', '--compress', '--dry-run', src, dest)
>          except processutils.ProcessExecutionError:
> -            execute('scp', src, dest)
> +            if CONF.mbps_in_copy_image > 0:
> +                execute('scp', '-l', '%s' % (CONF.mbps_in_copy_image * 1024 * 8), src, dest)
> +            else:
> +                execute('scp', src, dest)
>          else:
> -            execute('rsync', '--sparse', '--compress', src, dest)
> +            if CONF.mbps_in_copy_image > 0:
> +                execute('rsync', '--sparse', '--compress',
> +                        '--bwlimit=%s' % (CONF.mbps_in_copy_image * 1024), src, dest)
> +            else:
> +                execute('rsync', '--sparse', '--compress', src, dest)
>
> 2014-02-14 Wangpan
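To illustrate jyh's point about treating I/O bandwidth as a resource, one way an I/O QoS approach could look on Linux is to run the copy inside a blkio cgroup with a throttle on the backing device. The paths, group name and numbers below are assumptions for illustration only, not nova code.

import os
import subprocess

# Illustration only: cap the copy's disk bandwidth with the cgroup v1 blkio
# controller. CGROUP, the device number and the rate are assumed values.
CGROUP = "/sys/fs/cgroup/blkio/nova_image_copy"

def _enter_cgroup():
    # Runs in the child between fork() and exec(), so `cp` is already in the
    # throttled group before it starts doing I/O.
    with open(os.path.join(CGROUP, "tasks"), "w") as f:
        f.write(str(os.getpid()))

def copy_with_blkio_throttle(src, dest, dev="8:0", bps=50 * 1024 * 1024):
    """Copy src to dest with read/write bandwidth on block device `dev`
    (major:minor) capped at `bps` bytes per second."""
    if not os.path.isdir(CGROUP):
        os.mkdir(CGROUP)
    for knob in ("blkio.throttle.read_bps_device",
                 "blkio.throttle.write_bps_device"):
        with open(os.path.join(CGROUP, knob), "w") as f:
            f.write("%s %d" % (dev, bps))
    subprocess.check_call(["cp", src, dest], preexec_fn=_enter_cgroup)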
Re: [openstack-dev] [nova] Should we limit the disk IO bandwidth in copy_image while creating new instance?
Hi sahid, I have tested `scp -l xxx src dst` (a local scp copy) and believe the `-l` option has no effect in this situation; it seems `-l` is only valid for a remote copy.
2014-02-17 Wangpan

From: sahid
Sent: 2014-02-14 17:58
Subject: Re: [openstack-dev] [nova] Should we limit the disk IO bandwidth in copy_image while creating new instance?
To: "OpenStack Development Mailing List (not for usage questions)"
Cc:

It could be a good idea, but as Sylvain said, how to configure this? Then, what about using scp instead of rsync for a local copy?

- Original Message -
From: "Wangpan"
To: "OpenStack Development Mailing List"
Sent: Friday, February 14, 2014 4:52:20 AM
Subject: [openstack-dev] [nova] Should we limit the disk IO bandwidth in copy_image while creating new instance?

Currently nova doesn't limit the disk IO bandwidth in the copy_image() method while creating a new instance, so the other instances on this host may be affected by this high disk-IO-consuming operation, and some time-sensitive business (e.g. an RDS instance with heartbeat) may be switched between master and slave.

So can we use the `rsync --bwlimit=${bandwidth} src dst` command instead of `cp src dst` for copy_image in create_image() of the libvirt driver? The remote image copy operation can also be limited by `rsync --bwlimit=${bandwidth}` or `scp -l ${bandwidth}`. This parameter ${bandwidth} can be a new configuration option in nova.conf which allows the cloud admin to configure it; its default value is 0, which means no limitation, so the instances on this host will not be affected while a new instance with a not-yet-cached image is being created.

the example codes:
nova/virt/libvirt/utils.py:

diff --git a/nova/virt/libvirt/utils.py b/nova/virt/libvirt/utils.py
index e926d3d..5d7c935 100644
--- a/nova/virt/libvirt/utils.py
+++ b/nova/virt/libvirt/utils.py
@@ -473,7 +473,10 @@ def copy_image(src, dest, host=None):
         # sparse files. I.E. holes will not be written to DEST,
         # rather recreated efficiently. In addition, since
         # coreutils 8.11, holes can be read efficiently too.
-        execute('cp', src, dest)
+        if CONF.mbps_in_copy_image > 0:
+            execute('rsync', '--bwlimit=%s' % (CONF.mbps_in_copy_image * 1024), src, dest)
+        else:
+            execute('cp', src, dest)
     else:
         dest = "%s:%s" % (host, dest)
         # Try rsync first as that can compress and create sparse dest files.
@@ -484,11 +487,22 @@ def copy_image(src, dest, host=None):
             # Do a relatively light weight test first, so that we
             # can fall back to scp, without having run out of space
             # on the destination for example.
-            execute('rsync', '--sparse', '--compress', '--dry-run', src, dest)
+            if CONF.mbps_in_copy_image > 0:
+                execute('rsync', '--sparse', '--compress', '--dry-run',
+                        '--bwlimit=%s' % (CONF.mbps_in_copy_image * 1024), src, dest)
+            else:
+                execute('rsync', '--sparse', '--compress', '--dry-run', src, dest)
         except processutils.ProcessExecutionError:
-            execute('scp', src, dest)
+            if CONF.mbps_in_copy_image > 0:
+                execute('scp', '-l', '%s' % (CONF.mbps_in_copy_image * 1024 * 8), src, dest)
+            else:
+                execute('scp', src, dest)
         else:
-            execute('rsync', '--sparse', '--compress', src, dest)
+            if CONF.mbps_in_copy_image > 0:
+                execute('rsync', '--sparse', '--compress',
+                        '--bwlimit=%s' % (CONF.mbps_in_copy_image * 1024), src, dest)
+            else:
+                execute('rsync', '--sparse', '--compress', src, dest)

2014-02-14 Wangpan
Re: [openstack-dev] [nova] Should we limit the disk IO bandwidth in copy_image while creating new instance?
Hi Sylvain, The default value can be set to 0 or -1, which means no limitation. I think decreasing the niceness of nova-compute or of cp/rsync/scp would also need a configuration option, because we can't decrease it manually while copy_image is running. Another consideration of mine is that only decreasing the niceness of the process may not work well: I have tested with `nice -n 19 cp src dst` and also `ionice -c 2 cp src dst`, and the IO utilization and bandwidth consumption did not seem to decrease.
2014-02-17 Wangpan

From: Sylvain Bauza
Sent: 2014-02-14 17:22
Subject: Re: [openstack-dev] [nova] Should we limit the disk IO bandwidth in copy_image while creating new instance?
To: "OpenStack Development Mailing List (not for usage questions)"
Cc:

Instead of limiting the consumed bandwidth by proposing a configuration flag (yet another one, and which default value should be set?), I would propose to only decrease the niceness of the process itself, so that other processes would get the I/O access first. That's not perfect I assume, but that's a quick workaround limiting the frustration.

-Sylvain

2014-02-14 4:52 GMT+01:00 Wangpan :
Currently nova doesn't limit the disk IO bandwidth in the copy_image() method while creating a new instance, so the other instances on this host may be affected by this high disk-IO-consuming operation, and some time-sensitive business (e.g. an RDS instance with heartbeat) may be switched between master and slave.

So can we use the `rsync --bwlimit=${bandwidth} src dst` command instead of `cp src dst` for copy_image in create_image() of the libvirt driver? The remote image copy operation can also be limited by `rsync --bwlimit=${bandwidth}` or `scp -l ${bandwidth}`. This parameter ${bandwidth} can be a new configuration option in nova.conf which allows the cloud admin to configure it; its default value is 0, which means no limitation, so the instances on this host will not be affected while a new instance with a not-yet-cached image is being created.

the example codes:
nova/virt/libvirt/utils.py:

diff --git a/nova/virt/libvirt/utils.py b/nova/virt/libvirt/utils.py
index e926d3d..5d7c935 100644
--- a/nova/virt/libvirt/utils.py
+++ b/nova/virt/libvirt/utils.py
@@ -473,7 +473,10 @@ def copy_image(src, dest, host=None):
         # sparse files. I.E. holes will not be written to DEST,
         # rather recreated efficiently. In addition, since
         # coreutils 8.11, holes can be read efficiently too.
-        execute('cp', src, dest)
+        if CONF.mbps_in_copy_image > 0:
+            execute('rsync', '--bwlimit=%s' % (CONF.mbps_in_copy_image * 1024), src, dest)
+        else:
+            execute('cp', src, dest)
     else:
         dest = "%s:%s" % (host, dest)
         # Try rsync first as that can compress and create sparse dest files.
@@ -484,11 +487,22 @@ def copy_image(src, dest, host=None):
             # Do a relatively light weight test first, so that we
             # can fall back to scp, without having run out of space
             # on the destination for example.
-            execute('rsync', '--sparse', '--compress', '--dry-run', src, dest)
+            if CONF.mbps_in_copy_image > 0:
+                execute('rsync', '--sparse', '--compress', '--dry-run',
+                        '--bwlimit=%s' % (CONF.mbps_in_copy_image * 1024), src, dest)
+            else:
+                execute('rsync', '--sparse', '--compress', '--dry-run', src, dest)
         except processutils.ProcessExecutionError:
-            execute('scp', src, dest)
+            if CONF.mbps_in_copy_image > 0:
+                execute('scp', '-l', '%s' % (CONF.mbps_in_copy_image * 1024 * 8), src, dest)
+            else:
+                execute('scp', src, dest)
         else:
-            execute('rsync', '--sparse', '--compress', src, dest)
+            if CONF.mbps_in_copy_image > 0:
+                execute('rsync', '--sparse', '--compress',
+                        '--bwlimit=%s' % (CONF.mbps_in_copy_image * 1024), src, dest)
+            else:
+                execute('rsync', '--sparse', '--compress', src, dest)

2014-02-14 Wangpan
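As a point of comparison with the bandwidth-cap approach, a minimal sketch of the niceness-based workaround Sylvain suggests is below; whether it helps depends on the I/O scheduler in use (the idle class of ionice only has an effect with CFQ), and Wangpan reports above that it did not reduce the I/O load in his tests. This is an illustration, not nova code.

import subprocess

# Illustration only: run the image copy at the lowest CPU priority and in the
# idle I/O scheduling class instead of capping its bandwidth.
def copy_image_nice(src, dest):
    # nice -n 19  -> lowest CPU priority
    # ionice -c 3 -> idle I/O class (only effective with the CFQ scheduler)
    subprocess.check_call(
        ['nice', '-n', '19', 'ionice', '-c', '3', 'cp', src, dest])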
[openstack-dev] [nova] Should we limit the disk IO bandwidth in copy_image while creating new instance?
Currently nova doesn't limit the disk IO bandwidth in the copy_image() method while creating a new instance, so the other instances on this host may be affected by this high disk-IO-consuming operation, and some time-sensitive business (e.g. an RDS instance with heartbeat) may be switched between master and slave.

So can we use the `rsync --bwlimit=${bandwidth} src dst` command instead of `cp src dst` for copy_image in create_image() of the libvirt driver? The remote image copy operation can also be limited by `rsync --bwlimit=${bandwidth}` or `scp -l ${bandwidth}`. This parameter ${bandwidth} can be a new configuration option in nova.conf which allows the cloud admin to configure it; its default value is 0, which means no limitation, so the instances on this host will not be affected while a new instance with a not-yet-cached image is being created.

the example codes:
nova/virt/libvirt/utils.py:

diff --git a/nova/virt/libvirt/utils.py b/nova/virt/libvirt/utils.py
index e926d3d..5d7c935 100644
--- a/nova/virt/libvirt/utils.py
+++ b/nova/virt/libvirt/utils.py
@@ -473,7 +473,10 @@ def copy_image(src, dest, host=None):
         # sparse files. I.E. holes will not be written to DEST,
         # rather recreated efficiently. In addition, since
         # coreutils 8.11, holes can be read efficiently too.
-        execute('cp', src, dest)
+        if CONF.mbps_in_copy_image > 0:
+            execute('rsync', '--bwlimit=%s' % (CONF.mbps_in_copy_image * 1024), src, dest)
+        else:
+            execute('cp', src, dest)
     else:
         dest = "%s:%s" % (host, dest)
         # Try rsync first as that can compress and create sparse dest files.
@@ -484,11 +487,22 @@ def copy_image(src, dest, host=None):
             # Do a relatively light weight test first, so that we
             # can fall back to scp, without having run out of space
             # on the destination for example.
-            execute('rsync', '--sparse', '--compress', '--dry-run', src, dest)
+            if CONF.mbps_in_copy_image > 0:
+                execute('rsync', '--sparse', '--compress', '--dry-run',
+                        '--bwlimit=%s' % (CONF.mbps_in_copy_image * 1024), src, dest)
+            else:
+                execute('rsync', '--sparse', '--compress', '--dry-run', src, dest)
         except processutils.ProcessExecutionError:
-            execute('scp', src, dest)
+            if CONF.mbps_in_copy_image > 0:
+                execute('scp', '-l', '%s' % (CONF.mbps_in_copy_image * 1024 * 8), src, dest)
+            else:
+                execute('scp', src, dest)
         else:
-            execute('rsync', '--sparse', '--compress', src, dest)
+            if CONF.mbps_in_copy_image > 0:
+                execute('rsync', '--sparse', '--compress',
+                        '--bwlimit=%s' % (CONF.mbps_in_copy_image * 1024), src, dest)
+            else:
+                execute('rsync', '--sparse', '--compress', src, dest)

2014-02-14 Wangpan
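If the `mbps_in_copy_image` option proposed above were added, it would also need to be registered like any other nova.conf option. A rough sketch of what that could look like with oslo.config follows; the default, help text and placement are assumptions, nothing here has been agreed or merged. The admin would then set e.g. `mbps_in_copy_image = 50` in the [DEFAULT] section of nova.conf.

from oslo.config import cfg

# Sketch only: registration of the proposed option, e.g. in
# nova/virt/libvirt/utils.py next to the code that would consume it.
copy_image_opts = [
    cfg.IntOpt('mbps_in_copy_image',
               default=0,
               help='Disk bandwidth cap in MB/s applied while copying '
                    'instance images; 0 means no limit.'),
]

CONF = cfg.CONF
CONF.register_opts(copy_image_opts)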
Re: [openstack-dev] [Nova] Blueprint: standard specification of guest CPU topology
Hi Chet, I have read this patch, which may have been committed by your workmate: https://review.openstack.org/#/c/63254 and I have a question to ask:
Case 1: A user wants to build an 8-vCPU instance; there may be seven flavors with 8 vCPUs which have different topology extra specs:
1s*4c*2t (s=sockets, c=cores, t=threads)
1s*8c*1t
2s*4c*1t
2s*2c*2t
4s*2c*1t
4s*1c*2t
8s*1c*1t
If the user is building this instance manually, e.g. via the CLI or Horizon, he knows the supported topology of the image and can choose the flavor/topology he really wants (e.g. 2s*4c*1t). But if the user is building the instance through the nova RESTful API from another service such as Heat, which flavor should be chosen, and how? A more serious problem is that even if the user is a real person, he may not know how to choose the flavor with the best topology. We should choose a `better` topology (maybe not the best one) for all users if they want it (by setting the topology in an image property); otherwise they get the default one (vcpu num = socket num).
2014-01-17 Wangpan

From: Chet Burgess
Sent: 2013-12-22 07:28
Subject: Re: [openstack-dev] [Nova] Blueprint: standard specification of guest CPU topology
To: "OpenStack Development Mailing List (not for usage questions)"
Cc:

After reading up on the proposed design I have some concerns, primarily around the use of image properties to represent the topology. While I see the relationship between images and CPU topology (as referenced in the wiki, Windows licenses and their restrictions on sockets being a prime example), it seems very confusing to be defining information about the CPU topology in 2 places. Flavors already define a maximal number of CPUs that can be allocated, and all scheduling decisions related to CPU today use the value of VCPU specified by the flavor.

I foresee the following operational issues with having these split:

Having CPU topology restrictions in the image may lead to the inability to resize VMs to take advantage of additional compute power. It's not uncommon in enterprise deployments for VMs to be resized as the need for the services running on the VM increases. If the image is defining a portion of the topology then resizing a VM may result in an incompatible topology or a sub-optimal topology. This could lead to resizes requiring a rebuild of the VM.

A single image may have a number of valid CPU topologies. Work would have to be done to allow the user to select which topology they wanted, or images would have to be duplicated multiple times just to specify alternate, valid CPU topologies.

The flavor should specify the CPU topology as well as the maximum VCPU count. This should allow resizes to work with minimal change, and it avoids the need for complex selection logic from multiple valid topologies, or duplication of images. Additionally, the path of least resistance is to simply represent this as extra_specs on the flavor. Finally, extra_specs has the benefit of already being fully supported by the CLI and Horizon.

Images would still need the ability to specify restrictions on the topology. It should be fairly easy to enhance the existing core filter of the scheduler to handle the basic compatibility checks required to validate that a given image and flavor are compatible (Note: I suspect this has to occur regardless of the implementation, as having the image specify the topology could still lead to incompatible combinations). Adding restrictions
--
Chet Burgess
Vice President, Engineering | Metacloud, Inc.
Email: c...@metacloud.com | Tel: 855-638-2256, Ext. 2428

On Nov 19, 2013, at 4:15 , Daniel P. Berrange wrote:

For attention of maintainers of Nova virt drivers

A while back there was a bug requesting the ability to set the CPU topology (sockets/cores/threads) for guests explicitly
https://bugs.launchpad.net/nova/+bug/1199019

I countered that setting explicit topology doesn't play well with booting images with a variety of flavours with differing vCPU counts. This led to the following change which used an image property to express maximum constraints on CPU topology (max-sockets/max-cores/max-threads) which the libvirt driver will use to figure out the actual topology (sockets/cores/threads)
https://review.openstack.org/#/c/56510/

I believe this is a prime example of something we must co-ordinate across virt drivers to maximise happiness of our users. There's a blueprint but I find the description rather hard to follow
https://blueprints.launchpad.net/nova/+spec/support-libvirt-vcpu-topology

So I've created a standalone wiki page which I hope describes the idea more clearly
https://wiki.openstack.org/wiki/VirtDriverGuestCPUTopology

Launchpad doesn't let me link the URL to the blueprint since I'm not the blueprint creator :-(

Anyway this mail is to solicit input on the proposed standard way to express this which is hypervisor portable and the addition of some shared code for doing the calculations which virt driver impls can just call into rather than re-inventing. I'm looking for buy-in to the idea from the maintainers of each virt driver that this conceptual approach works for them, before we go merging anything with the specific impl for libvirt.

Regards, Daniel
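To make the flavor-centric proposal above concrete, here is one way it might be expressed; the extra-spec keys and the image property name below are purely illustrative, no naming convention had been agreed at this point.

# Illustrative only: a flavor-defined CPU topology as extra specs, plus an
# image that merely carries a maximum constraint for the scheduler/driver to
# validate against. From the CLI this would be roughly:
#   nova flavor-key m1.xlarge set cpu_topology:sockets=2 cpu_topology:cores=4 cpu_topology:threads=1
flavor_extra_specs = {
    'cpu_topology:sockets': '2',
    'cpu_topology:cores': '4',
    'cpu_topology:threads': '1',
}

# e.g. a Windows image whose license only allows 2 sockets
image_properties = {
    'hw_cpu_max_sockets': '2',
}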
Re: [openstack-dev] [Gating-Failures] Docs creation is failing
+1 http://logs.openstack.org/10/61310/2/check/gate-nova-docs/e4ca63f/console.html
2013-12-11 Wangpan

From: Gary Kotton
Sent: 2013-12-11 15:22
Subject: [openstack-dev] [Gating-Failures] Docs creation is failing
To: "OpenStack Development Mailing List (not for usage questions)"
Cc:

Hi,
An example for this is:
http://logs.openstack.org/94/59994/10/check/gate-nova-docs/b0f3910/console.html
Any ideas?
Thanks
Gary
Re: [openstack-dev] [Nova] Blueprint: standard specification of guest CPU topology
Hi Daniel, Thanks in advance for your help. I have read your wiki page and it explains this issue very clearly. But I have a question about the 'technical design': you give us a prototype method as below:
def get_guest_cpu_topology(self, inst_type, image, preferred_topology, mandatory_topology):
My question is: how/where can we get the two parameters 'preferred_topology' and 'mandatory_topology'? From the nova config file, or from the hypervisor? Thanks again.
2013-11-20 Wangpan

From: "Daniel P. Berrange"
Sent: 2013-11-19 20:15
Subject: [openstack-dev] [Nova] Blueprint: standard specification of guest CPU topology
To: "openstack-dev"
Cc:

For attention of maintainers of Nova virt drivers

A while back there was a bug requesting the ability to set the CPU topology (sockets/cores/threads) for guests explicitly
https://bugs.launchpad.net/nova/+bug/1199019

I countered that setting explicit topology doesn't play well with booting images with a variety of flavours with differing vCPU counts. This led to the following change which used an image property to express maximum constraints on CPU topology (max-sockets/max-cores/max-threads) which the libvirt driver will use to figure out the actual topology (sockets/cores/threads)
https://review.openstack.org/#/c/56510/

I believe this is a prime example of something we must co-ordinate across virt drivers to maximise happiness of our users. There's a blueprint but I find the description rather hard to follow
https://blueprints.launchpad.net/nova/+spec/support-libvirt-vcpu-topology

So I've created a standalone wiki page which I hope describes the idea more clearly
https://wiki.openstack.org/wiki/VirtDriverGuestCPUTopology

Launchpad doesn't let me link the URL to the blueprint since I'm not the blueprint creator :-(

Anyway this mail is to solicit input on the proposed standard way to express this which is hypervisor portable and the addition of some shared code for doing the calculations which virt driver impls can just call into rather than re-inventing. I'm looking for buy-in to the idea from the maintainers of each virt driver that this conceptual approach works for them, before we go merging anything with the specific impl for libvirt.

Regards, Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
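As a rough illustration of where a preferred topology could come from, the sketch below enumerates the (sockets, cores, threads) splits of the flavor's vCPU count, drops the ones that violate the image's mandatory maximums, and picks one by a simple preference (more sockets first). This is only a guess at the shape of the shared helper Daniel describes, not the blueprint's code, and the preference order is an assumption.

import itertools

# Sketch only: one plausible implementation shape for choosing a guest CPU
# topology from the flavor vCPU count and image-provided maximum constraints.
def possible_topologies(vcpus):
    """Yield all (sockets, cores, threads) triples whose product == vcpus."""
    for sockets, cores in itertools.product(range(1, vcpus + 1), repeat=2):
        if vcpus % (sockets * cores) == 0:
            yield sockets, cores, vcpus // (sockets * cores)

def get_guest_cpu_topology(vcpus, max_sockets=65536, max_cores=65536,
                           max_threads=65536):
    """Pick a topology within the mandatory maximums, preferring sockets
    over cores over threads (the preference order is an assumption)."""
    valid = [t for t in possible_topologies(vcpus)
             if t[0] <= max_sockets and t[1] <= max_cores and t[2] <= max_threads]
    if not valid:
        raise ValueError("no CPU topology satisfies the image constraints")
    return max(valid, key=lambda t: (t[0], t[1]))

# e.g. an 8 vCPU flavor with an image limited to 2 sockets:
#   get_guest_cpu_topology(8, max_sockets=2) -> (2, 4, 1)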
Re: [openstack-dev] [nova] How to fix the race condition issue between deleting and soft reboot?
Hi John, I agree with you that 'terminate should be able to happen at any time.' I have checked the terminate_instance and reboot_instance methods in the compute manager, but they just log a warning when catching the InstanceNotFound exception, nothing else. And last but not least, I believe the InstanceNotFound exception will not be raised when this race condition occurs, because the bug https://bugs.launchpad.net/nova/+bug/1246181 just results in an instance becoming a 'running deleted' one after termination.

I want to re-explain the race condition here:
1. soft reboot an instance
2. the 'soft reboot' thread waits for the instance to reach the 'shutdown' state (this may take a long time if the instance doesn't have the acpid service installed)
3. terminate the instance during step #2
4. the 'terminate' thread also waits for the instance to reach the 'shutdown' state, through an endless looping call '_wait_for_destroy'
5. if the 'soft reboot' thread finds the instance in the 'shutdown' state first, and re-creates/restarts it before the 'terminate' thread, then the instance will be stuck in the 'deleting' status and cannot be deleted again, because the 'terminate' thread is stuck in the endless looping call '_wait_for_destroy' (the instance will never reach the 'shutdown' state again) while holding the lock in 'terminate_instance'; this is the bug https://bugs.launchpad.net/nova/+bug/213 which has already been fixed.
6. on the other hand, if the 'terminate' thread finds the instance in the 'shutdown' state first in the looping call '_wait_for_destroy', and the 'soft reboot' thread re-creates/restarts it just before the 'terminate' thread deletes the files in the instance dir (disk, disk.local, libvirt.xml, console.log and so on), then the 'terminate' thread finishes successfully and the instance is deleted in the nova db, but it is still running on the hypervisor; this is the bug I want to fix this time, https://bugs.launchpad.net/nova/+bug/1246181. You can find my FIXME comment here: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L924
2013-11-12 Wangpan

From: John Garbutt
Sent: 2013-11-11 20:11
Subject: Re: [openstack-dev] [nova] How to fix the race condition issue between deleting and soft reboot?
To: "OpenStack Development Mailing List (not for usage questions)"
Cc:

It seems we still agreed that terminate should be able to happen at any time. I thought I remembered some code in the manager that treats InstanceNotFound errors differently. I would rather we ensure InstanceNotFound is raised to indicate we have hit this race condition, and let the compute manager unify how we deal with that across all sorts of operations.
John

On 11 November 2013 02:57, Wangpan wrote:
> Hi all,
>
> I want to re-ask this problem after the Hong Kong summit; you may have time
> to discuss this issue now.
> Thanks a lot!
>
> 2013-11-11
>
> Wangpan
>
> From: "Wangpan"
> Sent: 2013-11-04 12:08
> Subject: [openstack-dev] [nova] How to fix the race condition issue between
> deleting and soft reboot?
> To: "OpenStack Development Mailing List (not for usage questions)"
> Cc:
>
> Hi all,
>
> I have a question about fixing a race condition issue between deleting and
> soft reboot; the issue is:
> 1. If we soft reboot an instance and then delete it, the instance may not be
> deleted and may stay in the 'deleting' task state; this is because of the
> bug below, https://bugs.launchpad.net/nova/+bug/213
> and I already fixed this bug several months ago (just for the libvirt driver).
> 2. The other issue is, if the instance is rebooted just before deleting the
> files under the instance dir, then it may become a 'running deleted' one;
> this bug is here:
> https://bugs.launchpad.net/nova/+bug/1246181
> I want to fix it now, and I need your advice.
> The commit is here: https://review.openstack.org/#/c/54477/ ; you can post
> your advice on gerrit or mail me.
>
> The ways to fix bug #2 may be these (just for the libvirt driver, in my mind):
> 1. Add a lock to the reboot operation like the delete operation, so the
> reboot operation and the delete operation will be done in sequence.
> But on the other hand, the soft reboot operation may cost 120s if the
> instance doesn't support graceful shutdown; I think that is too long for a
> user to wait to delete an instance, so this may not be the best way.
> 2. Check the instance state at the end of the _cleanup method in the libvirt
> driver, and if it is still running, destroy it again.
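For reference, a minimal sketch of option 1 from the original mail (take the same per-instance lock for reboot that terminate_instance already takes). The placement and the `_reboot_instance` helper below are assumptions, and it keeps the drawback noted above: a delete issued during a soft reboot may then wait for the full ~120s soft-reboot timeout.

from nova import utils

# Sketch only, not the actual fix: serialize reboot behind the per-instance
# lock used by the delete path in the compute manager.
def reboot_instance(self, context, instance, block_device_info, reboot_type):

    @utils.synchronized(instance['uuid'])
    def do_reboot_instance():
        # the existing reboot_instance body would run here, unchanged;
        # _reboot_instance is a hypothetical stand-in for that body
        self._reboot_instance(context, instance, block_device_info, reboot_type)

    do_reboot_instance()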
Re: [openstack-dev] [nova] How to fix the race condition issue between deleting and soft reboot?
Hi all, I want to re-ask this problem after the Hong Kong summit; you may have time to discuss this issue now. Thanks a lot!
2013-11-11 Wangpan

From: "Wangpan"
Sent: 2013-11-04 12:08
Subject: [openstack-dev] [nova] How to fix the race condition issue between deleting and soft reboot?
To: "OpenStack Development Mailing List (not for usage questions)"
Cc:

Hi all,
I have a question about fixing a race condition issue between deleting and soft reboot; the issue is:
1. If we soft reboot an instance and then delete it, the instance may not be deleted and may stay in the 'deleting' task state; this is because of the bug below, https://bugs.launchpad.net/nova/+bug/213 and I already fixed this bug several months ago (just for the libvirt driver).
2. The other issue is, if the instance is rebooted just before deleting the files under the instance dir, then it may become a 'running deleted' one; this bug is here: https://bugs.launchpad.net/nova/+bug/1246181
I want to fix it now, and I need your advice. The commit is here: https://review.openstack.org/#/c/54477/ ; you can post your advice on gerrit or mail me.

The ways to fix bug #2 may be these (just for the libvirt driver, in my mind):
1. Add a lock to the reboot operation like the delete operation, so the reboot operation and the delete operation will be done in sequence. But on the other hand, the soft reboot operation may cost 120s if the instance doesn't support graceful shutdown; I think that is too long for a user to wait to delete an instance, so this may not be the best way.
2. Check the instance state at the end of the _cleanup method in the libvirt driver, and if it is still running, destroy it again. This way is usable, but both Nikola Dipanov and I dislike this 'ugly' way.
3. Check the instance vm state and task state in the nova db before booting in reboot; if it is deleted/deleting, stop the reboot process. This accesses the db at the driver level, so it is an 'ugly' way, too.
Nikola suggests that 'maybe we can leverage task/vm states and refactor how reboot is done so we can back out of a reboot on a delete', but I think we should let users delete an instance at any time and in any state, so the delete operation during 'soft reboot' should not be forbidden.

Thanks and waiting for your voice!
2013-11-04 Wangpan
[openstack-dev] [nova] How to fix the race condition issue between deleting and soft reboot?
Hi all,
I have a question about fixing a race condition issue between deleting and soft reboot; the issue is:
1. If we soft reboot an instance and then delete it, the instance may not be deleted and may stay in the 'deleting' task state; this is because of the bug below, https://bugs.launchpad.net/nova/+bug/213 and I already fixed this bug several months ago (just for the libvirt driver).
2. The other issue is, if the instance is rebooted just before deleting the files under the instance dir, then it may become a 'running deleted' one; this bug is here: https://bugs.launchpad.net/nova/+bug/1246181
I want to fix it now, and I need your advice. The commit is here: https://review.openstack.org/#/c/54477/ ; you can post your advice on gerrit or mail me.

The ways to fix bug #2 may be these (just for the libvirt driver, in my mind):
1. Add a lock to the reboot operation like the delete operation, so the reboot operation and the delete operation will be done in sequence. But on the other hand, the soft reboot operation may cost 120s if the instance doesn't support graceful shutdown; I think that is too long for a user to wait to delete an instance, so this may not be the best way.
2. Check the instance state at the end of the _cleanup method in the libvirt driver, and if it is still running, destroy it again. This way is usable, but both Nikola Dipanov and I dislike this 'ugly' way.
3. Check the instance vm state and task state in the nova db before booting in reboot; if it is deleted/deleting, stop the reboot process. This accesses the db at the driver level, so it is an 'ugly' way, too.
Nikola suggests that 'maybe we can leverage task/vm states and refactor how reboot is done so we can back out of a reboot on a delete', but I think we should let users delete an instance at any time and in any state, so the delete operation during 'soft reboot' should not be forbidden.

Thanks and waiting for your voice!
2013-11-04 Wangpan
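A rough sketch of what option 2 above could look like as a standalone check (illustration only, not the patch under review): after cleanup, look the domain up again and destroy it if a racing reboot brought it back.

import libvirt

def destroy_if_resurrected(conn, instance_name):
    """Destroy the guest again if a concurrent soft reboot re-created it
    after the delete path tore it down (sketch only)."""
    try:
        dom = conn.lookupByName(instance_name)
    except libvirt.libvirtError:
        return  # domain is gone, the delete already won the race
    if dom.info()[0] == libvirt.VIR_DOMAIN_RUNNING:
        dom.destroy()

Something like this could be called as the last step of the libvirt driver's _cleanup().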
Re: [openstack-dev] [Nova] [Libvirt] Virtio-Serial support for Nova libvirt driver
Hi all, I'm the owner of this bp https://blueprints.launchpad.net/nova/+spec/qemu-guest-agent-support and Daniel Berrange gave me lots of help with implementing it, and my original idea is the same as yours. So I think Daniel's opinion will be very useful.
2013-09-25 Wangpan

From: balaji patnala
Sent: 2013-09-25 22:36
Subject: Re: [openstack-dev] [Nova] [Libvirt] Virtio-Serial support for Nova libvirt driver
To: "OpenStack Development Mailing List"
Cc:

Hi Haomai, Thanks for your interest in this. The code check-ins done against the below bp are more specific to the QEMU guest agent.
https://blueprints.launchpad.net/nova/+spec/qemu-guest-agent-support
Our requirement is to enable a virtio-serial interface for the applications running in the VM. Do you have the same requirement? We will share the draft BP on this. Any comments on this approach will be helpful.
Regards, Balaji.P

On Tue, Sep 24, 2013 at 8:10 PM, Haomai Wang wrote:
On Sep 24, 2013, at 6:40 PM, P Balaji-B37839 wrote:
> Hi,
>
> Virtio-serial interface support for the Nova libvirt driver is not available
> now. Some VMs that want to access the host may need it, e.g. to run
> qemu-guest-agent or any proprietary software that wants to use this mode of
> communication with the host.
>
> Qemu-GA uses virtio-serial communication.
>
> We want to propose a blueprint on this for the Icehouse release.
>
> Anybody interested in this?

Great! We have a common interest and I hope we can promote it for Icehouse. BTW, do you have an initial plan or description of it? And I think this bp may be relevant:
https://blueprints.launchpad.net/nova/+spec/qemu-guest-agent-support

> Regards,
> Balaji.P

Best regards, Haomai Wang, UnitedStack Inc.
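For readers unfamiliar with the mechanism: a virtio-serial channel appears in the libvirt guest XML roughly as built below (the socket path and channel name are illustrative, and this is not nova's config-generation code); with virtio-serial guest drivers the channel typically shows up inside the guest under /dev/virtio-ports/<name>.

from xml.etree import ElementTree as ET

# Illustrative only: the <channel> element libvirt expects for a
# virtio-serial channel backed by a UNIX socket on the host.
def build_virtio_serial_channel(socket_path, name):
    channel = ET.Element('channel', type='unix')
    ET.SubElement(channel, 'source', mode='bind', path=socket_path)
    ET.SubElement(channel, 'target', type='virtio', name=name)
    return ET.tostring(channel)

# Example (paths/names assumed):
#   build_virtio_serial_channel(
#       '/var/lib/libvirt/qemu/org.qemu.guest_agent.0.instance-0001.sock',
#       'org.qemu.guest_agent.0')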
[openstack-dev] [nova] FFE Request: Pass image meta to hard reboot for generating xml
Hi, I have a patch that needs to be reviewed, which fixes the bug https://bugs.launchpad.net/nova/+bug/1206809 and this is the commit: https://review.openstack.org/#/c/39668/
I think this is a serious bug because we may lose some feature support enabled by image properties after a hard reboot.
The description of the bug is below:
Currently the libvirt XML of an instance is regenerated on every hard reboot by the get_guest_config and to_xml methods in the libvirt driver, but image_meta is not passed into the get_guest_config method as it is in spawn(), so some configuration related to image_meta will be lost (compared with the spawn process). We need to get the image metadata and pass it to the to_xml method during hard reboot.
2013-09-09 Wangpan
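The shape of the fix described above is roughly the following; this is only a sketch, not the patch under review, and the metadata-fetching helper shown is hypothetical.

# Sketch only: during a hard reboot the driver rebuilds the guest XML, so the
# idea of the fix is to look the image metadata up again and hand it to
# to_xml(), as spawn() already does. _get_image_meta() is a hypothetical
# stand-in for however the metadata is actually fetched.
def _hard_reboot(self, context, instance, network_info, block_device_info=None):
    self._destroy(instance)
    disk_info = blockinfo.get_disk_info(CONF.libvirt_type, instance,
                                        block_device_info)
    image_meta = self._get_image_meta(context, instance)  # hypothetical helper
    xml = self.to_xml(context, instance, network_info, disk_info,
                      image_meta=image_meta,
                      block_device_info=block_device_info,
                      write_to_disk=True)
    self._create_domain_and_network(xml, instance, network_info,
                                    block_device_info)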
[openstack-dev] [openstack][nova] an unit test problem
Hi experts, I have an odd unit test issue in the commit https://review.openstack.org/#/c/44639/
the test results are here: http://logs.openstack.org/39/44639/7/check/gate-nova-python27/4ddc671/testr_results.html.gz
the failing test is: nova.tests.compute.test_compute_api.ComputeCellsAPIUnitTestCase.test_delete_in_resized
I have two questions about this issue:
1) why does it pass when I run it with 'testr run nova.tests.compute.test_compute_api.ComputeCellsAPIUnitTestCase.test_delete_in_resized' and also with 'nosetests' in my local venv?
2) why does the other test nova.tests.compute.test_compute_api.ComputeAPIUnitTestCase.test_delete_in_resized pass, even though it also inherits from the class '_ComputeAPIUnitTestMixIn'?
Because everything is OK in my local venv, I have no idea how to fix it; can anybody give me some advice? Thanks a lot!
2013-09-04 Wangpan
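A failure that only shows up in the gate's full parallel run while the same test passes in isolation often points at state leaking between tests (module-level stubs or mocks that are not cleaned up, or test-order dependence under testr's scheduling). A generic illustration of the usual remedy, not taken from the patch above, is to scope every patch to the test and register its cleanup:

import mock
import testtools

class ExampleIsolationTestCase(testtools.TestCase):
    def test_patched_helper_is_always_restored(self):
        patcher = mock.patch('time.sleep')       # patch anything global
        mock_sleep = patcher.start()
        self.addCleanup(patcher.stop)            # always undone, pass or fail

        import time
        time.sleep(10)                           # returns immediately
        mock_sleep.assert_called_once_with(10)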