Re: [openstack-dev] [nova] "correct" API for getting image metadata for an instance ?

2015-05-28 Thread Vladik Romanovsky
> As part of the work to object-ify the image metadata dicts, I'm looking
> at the current way the libvirt driver fetches image metadata for an
> instance, in cases where the compute manager hasn't already passed it
> into the virt driver API. I see 2 methods that libvirt uses to get the
> image metadata
> 
>  - nova.utils.get_image_from_system_metadata(instance.system_metadata)
> 
>  It takes the system metadata stored against the instance
>  and turns it into image  metadata.
> 
> 
> - nova.compute.utils.get_image_metadata(context, image_api,
>  instance.image_ref, instance)
> 
>  This tries to get metadata from the image api and turns
>  this into system metadata
> 
>  It then gets system metadata from the instance and merges
>  it with the data from the image
> 
>  It then calls nova.utils.get_image_from_system_metadata()
> 
>  IIUC, any changes against the image will override what
>  is stored against the instance
> 
> 
> 
> IIUC, when an instance is booted, the image metadata should be
> saved against the instance. So I'm wondering why we need to have
> code in compute.utils that merges back in the image metadata each
> time ?
> 
> Is this intentional so that we pull in latest changes from the
> image, to override what's previously saved on the instance ? If
> so, then it seems that we should have been consistent in using
> the compute_utils get_image_metadata() API everywhere.
> 
> It seems wrong though to pull in the latest metadata from the
> image. The libvirt driver makes various decisions at boot time
> about how to configure the guest based on the metadata. When we
> later do changes to that guest (snapshot, hotplug, etc, etc)
> we *must* use exactly the same image metadata we had at boot
> time, otherwise decisions we make will be inconsistent with how
> the guest is currently configured.
> 
> eg if you set  hw_disk_bus=virtio at boot time, and then later
> change the image to use hw_disk_bus=scsi, and then try to hotplug
> a new drive on the guest, we *must* operate wrt hw_disk_bus=virtio
> because the guest will not have any scsi bus present.

I agree, as well, that we should use the system_metadata instead of
getting the latest from Glance.

I just wish there were an easy way to edit it, in order to update
some keys such as the video driver, watchdog action, NIC driver, etc.,
so the change would be picked up on a hard reboot, for example.
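
Something along these lines is what I have in mind (a rough sketch only;
the object calls and the 'image_'-prefixed keys are assumptions about how
image properties land in system_metadata, not an existing API for this):

    from nova import context
    from nova import objects

    ctxt = context.get_admin_context()
    instance = objects.Instance.get_by_uuid(
        ctxt, instance_uuid, expected_attrs=['system_metadata'])
    # image properties are stored with an 'image_' prefix
    instance.system_metadata['image_hw_video_model'] = 'qxl'
    instance.system_metadata['image_hw_watchdog_action'] = 'reset'
    instance.save()  # picked up on the next hard reboot / rebuild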


> 
> This says to me we should /never/ use the compute_utils
> get_image_metadata() API once the guest is running, and so we
> should convert libvirt to use nova.utils.get_image_from_system_metadata()
> exclusively.
> 
> It also makes me wonder how nova/compute/manager.py is obtaining image
> meta in cases where it passes it into the API and whether that needs
> changing at all.
> 
> 
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][libvirt] RFC: ensuring live migration ends

2015-02-02 Thread Vladik Romanovsky


- Original Message -
> From: "Daniel P. Berrange" 
> To: "Robert Collins" 
> Cc: "OpenStack Development Mailing List (not for usage questions)" 
> ,
> openstack-operat...@lists.openstack.org
> Sent: Monday, 2 February, 2015 5:56:56 AM
> Subject: Re: [openstack-dev] [nova][libvirt] RFC: ensuring live migration 
> ends
> 
> On Mon, Feb 02, 2015 at 08:24:20AM +1300, Robert Collins wrote:
> > On 31 January 2015 at 05:47, Daniel P. Berrange 
> > wrote:
> > > In working on a recent Nova migration bug
> > >
> > >   https://bugs.launchpad.net/nova/+bug/1414065
> > >
> > > I had cause to refactor the way the nova libvirt driver monitors live
> > > migration completion/failure/progress. This refactor has opened the
> > > door for doing more intelligent active management of the live migration
> > > process.
> > ...
> > > What kind of things would be the biggest win from Operators' or tenants'
> > > POV ?
> > 
> > Awesome. Couple thoughts from my perspective. Firstly, there's a bunch
> > of situation dependent tuning. One thing Crowbar does really nicely is
> > that you specify the host layout in broad abstract terms - e.g. 'first
> > 10G network link' and so on : some of your settings above like whether
> > to compress pages are going to be heavily dependent on the bandwidth
> > available (I doubt that compression is a win on a 100G link for
> > instance, and would be suspect at 10G even). So it would be nice if
> > there was a single dial or two to set and Nova would auto-calculate
> > good defaults from that (with appropriate overrides being available).
> 
> I wonder how such an idea would fit into Nova, since it doesn't really
> have that kind of knowledge about the network deployment characteristics.
> 
> > Operationally avoiding trouble is better than being able to fix it, so
> > I quite like the idea of defaulting the auto-converge option on, or
> > perhaps making it controllable via flavours, so that operators can
> > offer (and identify!) those particularly performance sensitive
> > workloads rather than having to guess which instances are special and
> > which aren't.
> 
> I'll investigate the auto-converge further to find out what the
> potential downsides of it are. If we can unconditionally enable
> it, it would be simpler than adding yet more tunables.
> 
> > Being able to cancel the migration would be good. Relatedly being able
> > to restart nova-compute while a migration is going on would be good
> > (or put differently, a migration happening shouldn't prevent a deploy
> > of Nova code: interlocks like that make continuous deployment much
> > harder).
> > 
> > If we can't already, I'd like as a user to be able to see that the
> > migration is happening (allows diagnosis of transient issues during
> > the migration). Some ops folk may want to hide that of course.
> > 
> > I'm not sure that automatically rolling back after N minutes makes
> > sense : if the impact on the cluster is significant then 1 minute vs
> > 10 doesn't intrinsically matter: what matters more is preventing too
> > many concurrent migrations, so that would be another feature that I
> > don't think we have yet: don't allow more than some N inbound and M
> > outbound live migrations to a compute host at any time, to prevent IO
> > storms. We may want to log with NOTIFICATION migrations that are still
> > progressing but appear to be having trouble completing. And of course
> > an admin API to query all migrations in progress to allow API driven
> > health checks by monitoring tools - which gives the power to manage
> > things to admins without us having to write a probably-too-simple
> > config interface.
> 
> Interesting, the point about concurrent migrations hadn't occurred to
> me before, but it does of course make sense since migration is
> primarily network bandwidth limited, though disk bandwidth is relevant
> too if doing block migration.

Indeed, a lot of time was spent investigating this topic (in oVirt, again),
and eventually it was decided to expose a config option and allow 3 concurrent
migrations by default.

https://github.com/oVirt/vdsm/blob/master/lib/vdsm/config.py.in#L126
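
Something as simple as the following is what I mean (just a sketch; the
option value and the semaphore-per-host approach are illustrative, not an
existing Nova implementation):

    import threading

    max_concurrent_live_migrations = 3  # would come from nova.conf
    _migration_slots = threading.Semaphore(max_concurrent_live_migrations)

    def live_migrate(instance, dest):
        if not _migration_slots.acquire(blocking=False):
            raise Exception('too many concurrent live migrations on this host')
        try:
            _do_live_migration(instance, dest)  # hypothetical helper
        finally:
            _migration_slots.release()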

> 
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [nova][libvirt] RFC: ensuring live migration ends

2015-01-30 Thread Vladik Romanovsky


- Original Message -
> From: "Daniel P. Berrange" 
> To: openstack-dev@lists.openstack.org, openstack-operat...@lists.openstack.org
> Sent: Friday, 30 January, 2015 11:47:16 AM
> Subject: [openstack-dev] [nova][libvirt] RFC: ensuring live migration ends
> 
> In working on a recent Nova migration bug
> 
>   https://bugs.launchpad.net/nova/+bug/1414065
> 
> I had cause to refactor the way the nova libvirt driver monitors live
> migration completion/failure/progress. This refactor has opened the
> door for doing more intelligent active management of the live migration
> process.
> 
> As it stands today, we launch live migration, with a possible bandwidth
> limit applied and just pray that it succeeds eventually. It might take
> until the end of the universe and we'll happily wait that long. This is
> pretty dumb really and I think we really ought to do better. The problem
> is that I'm not really sure what "better" should mean, except for ensuring
> it doesn't run forever.
> 
> As a demo, I pushed a quick proof of concept showing how we could easily
> just abort live migration after say 10 minutes
> 
>   https://review.openstack.org/#/c/151665/
> 
> There are a number of possible things to consider though...
> 
> First, how to detect when live migration isn't going to succeed.
> 
>  - Could do a crude timeout, eg allow 10 minutes to succeed or else.
> 
>  - Look at data transfer stats (memory transferred, memory remaining to
>transfer, disk transferred, disk remaining to transfer) to determine
>if it is making forward progress.

I think this is a better option. We could define a timeout for the progress
and cancel if there is no progress. IIRC there were similar debates about it
in oVirt; we could do something similar:
https://github.com/oVirt/vdsm/blob/master/vdsm/virt/migration.py#L430
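
The rough shape of what I mean (illustrative only, names made up): abort if
the amount of data remaining has not decreased within some progress timeout.

    import time

    def monitor_migration(get_remaining_bytes, cancel, progress_timeout=150):
        lowest = None
        last_progress = time.time()
        while True:
            remaining = get_remaining_bytes()  # memory + disk left to transfer
            if remaining == 0:
                return  # migration completed
            if lowest is None or remaining < lowest:
                lowest = remaining
                last_progress = time.time()
            elif time.time() - last_progress > progress_timeout:
                cancel()  # no forward progress for too long, give up
                return
            time.sleep(5)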

> 
>  - Leave it up to the admin / user to decide if it has gone long enough
> 
> The first is easy, while the second is harder but probably more reliable
> and useful for users.
> 
> Second is a question of what to do when it looks to be failing
> 
>  - Cancel the migration - leave it running on source. Not good if the
>admin is trying to evacuate a host.
> 
>  - Pause the VM - make it complete as non-live migration. Not good if
>the guest workload doesn't like being paused
> 
>  - Increase the bandwidth permitted. There is a built-in rate limit in
>QEMU overridable via nova.conf. Could argue that the admin should just
>set their desired limit in nova.conf and be done with it, but perhaps
>there's a case for increasing it in special circumstances, eg an emergency
>evacuation of a host, where it is better to waste bandwidth & complete the job,
>but for non-urgent scenarios better to limit bandwidth & accept failure ?
> 
>  - Increase the maximum downtime permitted. This is the small time window
>when the guest switches from source to dest. Too small and it'll never
>switch, too large and it'll suffer unacceptable interruption.
> 

In my opinion, it would be great if we could play with bandwidth and downtime
before cancelling the migration or pausing.
However, it makes sense only if there is some kind of progress in the transfer
stats and not a complete disconnect; on a complete disconnect we should just cancel.
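
Roughly (a sketch only; the doubling steps are made up, and the libvirt calls
are the standard python-libvirt ones as far as I remember):

    def handle_slow_migration(dom, making_progress, bandwidth, downtime):
        if not making_progress:
            dom.abortJob()  # stalled completely, cancel the migration
            return
        dom.migrateSetMaxSpeed(bandwidth * 2)    # give it more bandwidth
        dom.migrateSetMaxDowntime(downtime * 2)  # and a larger switchover window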

> We could do some of these things automatically based on some policy
> or leave them up to the cloud admin/tenant user via new APIs
> 
> Third there's question of other QEMU features we could make use of to
> stop problems in the first place
> 
>  - Auto-converge flag - if you set this QEMU throttles back the CPUs
>so the guest cannot dirty ram pages as quickly. This is nicer than
>pausing CPUs altogether, but could still be an issue for guests
>which have strong performance requirements
> 
>  - Page compression flag - if you set this QEMU does compression of
>pages to reduce data that has to be sent. This is basically trading
>off network bandwidth vs CPU burn. Probably a win unless you are
>already highly overcommitted on CPU on the host
> 
> Fourth there's a question of whether we should give the tenant user or
> cloud admin further APIs for influencing migration
> 
>  - Add an explicit API for cancelling migration ?
> 
>  - Add APIs for setting tunables like downtime, bandwidth on the fly ?
> 
>  - Or drive some of the tunables like downtime, bandwidth, or policies
>like cancel vs paused from flavour or image metadata properties ?
> 
>  - Allow operations like evacuate to specify a live migration policy
>eg switch non-live migrate after 5 minutes ?
> 
IMHO, an explicit API for cancelling migration is very much needed.
I remember cases when migrations took 8 hours or more, leaving the
admins helpless :)

Also, I very much like the idea of having tunables and policies to set
in the flavours and image properties:
allowing the administrators to set these as a "template" in the flavour
and also letting the users update/override or "request" these options
as t
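
Purely as an illustration of what such "templates" could look like (none of
these property names exist today, they are made up for the example):

    flavor_extra_specs = {
        'live_migration:policy': 'pause_after_timeout',
        'live_migration:max_downtime_ms': '500',
    }
    image_properties = {
        'live_migration_policy': 'cancel_after_timeout',
    }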

Re: [openstack-dev] [nova][NFV][qa] Testing NUMA, CPU pinning and large pages

2015-01-28 Thread Vladik Romanovsky


- Original Message -
> From: "Steve Gordon" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Tuesday, 27 January, 2015 9:46:44 AM
> Subject: Re: [openstack-dev] [nova][NFV][qa] Testing NUMA, CPU pinning and 
> large pages
> 
> ----- Original Message -
> > From: "Vladik Romanovsky" 
> > To: openstack-dev@lists.openstack.org
> > 
> > Hi everyone,
> > 
> > Following Steve Gordon's email [1], regarding CI for NUMA, SR-IOV, and
> > other
> > features, I'd like to start a discussion about the NUMA testing in
> > particular.
> > 
> > Recently we have started work to test some of these features.
> > The current plan is to use the functional tests, in the Nova tree, to
> > exercise
> > the code paths for NFV use cases. In general, these will contain tests
> > to cover various scenarios regarding NUMA, CPU pinning, large pages and
> > validate a correct placement/scheduling.
> 
> Hi Vladik,
> 
> There was some discussion of the above at the Nova mid-cycle yesterday, are
> you able to give a quick update on any progress with regards to creation of
> the above functional tests?
> 

I have made some progress; however, I currently have some challenges with
validating the scheduler filters' outcome. I'll try to post some of it in the
coming days.
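
The kind of check I am aiming for looks roughly like this (helper names such
as _boot_server_with_extra_specs are placeholders, not existing fixtures):

    def test_boot_server_with_one_numa_node(self):
        extra_specs = {'hw:numa_nodes': '1'}
        server = self._boot_server_with_extra_specs(extra_specs)
        instance = objects.Instance.get_by_uuid(
            self.context, server['id'], expected_attrs=['numa_topology'])
        self.assertIsNotNone(instance.numa_topology)
        self.assertEqual(1, len(instance.numa_topology.cells))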

> > In addition to the functional tests in Nova, we have also proposed two
> > basic
> > scenarios in Tempest [2][3]. One to make sure that an instance can boot
> > with a
> > minimal NUMA configuration (a topology that every host should have) and
> > one that would request an "impossible" topology and fail with an expected
> > exception.
> 
> We also discussed the above tempest changes and they will likely receive some
> more review cycles as a result of this discussion but it looks like there is
> already some feedback from Nikola that needs to be addressed. More broadly
> for the list it looks like we need to determine whether adding a negative
> test in this case is a valid/desirable use of Tempest.

I have updated the tempest tests yesterday. The tests were waiting on a nova
patch to be merged: 
https://review.openstack.org/#/c/145312

However, unfortunately, I've discovered another bug in nova that prevents the
tests from passing; somehow I missed it in the previous attempt:
https://review.openstack.org/#/c/150694

Thanks,
Vladik

> 
> Thanks,
> 
> Steve
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] multi-queue virtio-net interface

2015-01-23 Thread Vladik Romanovsky
Unfortunately, I didn't get a feature freeze exception for this blueprint.
I will resubmit the spec in the next cycle.

I think the best way for you to contribute is to review the spec
when it's re-posted and +1 it if you agree with the design.

Thanks,
Vladik 

- Original Message -
> From: "Steve Gordon" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Cc: mfiuc...@akamai.com
> Sent: Wednesday, 21 January, 2015 4:43:37 PM
> Subject: Re: [openstack-dev] multi-queue virtio-net interface
> 
> - Original Message -
> > From: "Rajagopalan Sivaramakrishnan" 
> > To: openstack-dev@lists.openstack.org
> > 
> > Hello,
> > We are hitting a performance bottleneck in the Contrail network
> > virtualization solution due to the virtio interface having a single
> > queue in VMs spawned using Openstack. There seems to be a blueprint to
> > address this by enabling multi-queue virtio-net at
> > 
> > https://blueprints.launchpad.net/nova/+spec/libvirt-virtio-net-multiqueue
> > 
> > It is not clear what the current status of this project is. We would be
> > happy
> > to contribute towards this effort if required. Could somebody please let us
> > know what the next steps should be to get this into an upcoming release?
> > 
> > Thanks,
> > 
> > Raja
> 
> The specification is up for review here:
> 
> https://review.openstack.org/#/c/128825/
> 
> There is an associated Feature Freeze Exception (FFE) email for this proposal
> here which would need to be approved for this to be included in Kilo:
> 
> 
> http://lists.openstack.org/pipermail/openstack-dev/2015-January/054263.html
> 
> Thanks,
> 
> Steve
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] request spec freeze exception for virtio-net multiqueue

2015-01-12 Thread Vladik Romanovsky
Hello,

I'd like to request an exception for the virtio-net multiqueue feature [1].
This is an important feature that aims to increase the total network throughput
in guests and is not too hard to implement.

Thanks,
Vladik

[1] https://review.openstack.org/#/c/128825

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][NFV][qa] Testing NUMA, CPU pinning and large pages

2015-01-11 Thread Vladik Romanovsky

Hi everyone,

Following Steve Gordon's email [1], regarding CI for NUMA, SR-IOV, and other
features, I'd like to start a discussion about the NUMA testing in particular.

Recently we have started work to test some of these features.
The current plan is to use the functional tests, in the Nova tree, to exercise
the code paths for NFV use cases. In general, these will contain tests
to cover various scenarios regarding NUMA, CPU pinning, and large pages, and
validate correct placement/scheduling.

In addition to the functional tests in Nova, we have also proposed two basic
scenarios in Tempest [2][3]: one to make sure that an instance can boot with a
minimal NUMA configuration (a topology that every host should have) and
one that would request an "impossible" topology and fail with an expected
exception.

This work doesn't eliminate the need for testing on real hardware; however,
these tests should provide coverage for the features that are currently being
submitted upstream and hopefully be a good starting point for future testing.


Thoughts?

Vladik

[1] 
http://lists.openstack.org/pipermail/openstack-dev/2014-November/050306.html

[2] https://review.openstack.org/143540
[3] https://review.openstack.org/143541


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] nova host-update gives error 'Virt driver does not implement host disabled status'

2014-11-26 Thread Vladik Romanovsky


- Original Message -
> From: "Vineet Menon" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Wednesday, 26 November, 2014 5:14:09 AM
> Subject: Re: [openstack-dev] [nova] nova host-update gives error 'Virt driver 
> does not implement host disabled
> status'
> 
> Hi Kevin,
> 
> Oh. Yes. That could be the problem.
> Thanks for pointing that out.
> 
> 
> Regards,
> 
> Vineet Menon
> 
> 
> On 26 November 2014 at 02:02, Chen CH Ji < jiche...@cn.ibm.com > wrote:
> 
> 
> 
> 
> 
> are you using libvirt? it's not implemented;
> I guess your bug is talking about other hypervisors?
> 
> the message was printed here:
> http://git.openstack.org/cgit/openstack/nova/tree/nova/api/openstack/compute/contrib/hosts.py#n236
> 
> Best Regards!
> 
> Kevin (Chen) Ji 纪 晨
> 
> Engineer, zVM Development, CSTL
> Notes: Chen CH Ji/China/IBM@IBMCN Internet: jiche...@cn.ibm.com
> Phone: +86-10-82454158
> Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian District,
> Beijing 100193, PRC
> 
> Vineet Menon ---11/26/2014 12:10:39 AM---Hi, I'm trying to reproduce the bug
> https://bugs.launchpad.net/nova/+bug/1259535 .
> 
> From: Vineet Menon < mvineetme...@gmail.com >
> To: openstack-dev < openstack-dev@lists.openstack.org >
> Date: 11/26/2014 12:10 AM
> Subject: [openstack-dev] [nova] nova host-update gives error 'Virt driver
> does not implement host disabled status'
> 
> 
Hi Vineet,

There are two methods in the API for changing the service/host status:
nova host-update and nova service-update.

Currently, in order to disable the service one should use the "nova
service-update" command, which maps to the "service_update" method in the
manager class.

"nova host-update" maps to the set_host_enabled() method in the virt drivers,
which is not implemented in the libvirt driver.
I'm not sure what the purpose of this method is, but the libvirt driver
doesn't implement it.

For a short period of time, this method was implemented, for a wrong reason,
which was causing the bug in the title; however, it was fixed with
https://review.openstack.org/#/c/61016
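
So, to disable/enable scheduling to a host, use the service API, e.g. with
python-novaclient (if I recall the client interface correctly; the credential
values are placeholders):

    from novaclient import client

    # USER, PASSWORD, PROJECT and AUTH_URL are placeholders for your credentials
    nova = client.Client('2', USER, PASSWORD, PROJECT, AUTH_URL)
    nova.services.disable('compute-host-01', 'nova-compute')
    nova.services.enable('compute-host-01', 'nova-compute')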

Let me know if you have any questions.

Thanks,
Vladik



> 
> 
> Hi,
> 
> I'm trying to reproduce the bug https://bugs.launchpad.net/nova/+bug/1259535
> .
> While trying to issue the command, nova host-update --status disable
> machine1, an error is thrown saying,
> 
> 
> ERROR (HTTPNotImplemented): Virt driver does not implement host disabled
> status. (HTTP 501) (Request-ID: req-1f58feda-93af-42e0-b7b6-bcdd095f7d8c)
> 
> What is this error about?
> 
> Regards,
> Vineet Menon
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Vladik Romanovsky
+1 

I very much agree with Dan's proposal.

I am concerned about difficulties we will face with merging
patches that spread across various areas: manager, conductor, scheduler, etc.
However, I think this is a small price to pay for having more focused teams.

IMO, we will still have to pay it the moment the scheduler is separated out.

Regards,
Vladik

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] [feature freeze exception] FFE for libvirt-start-lxc-from-block-devices

2014-09-04 Thread Vladik Romanovsky
Hello,

I would like to ask for an extension for the libvirt-start-lxc-from-block-devices
feature. It was previously pushed from Icehouse to Juno.
The spec [1] has been approved. One of the patches is a bug fix. Another patch
has already been approved but failed in the gate.
All patches have a +2 from Daniel Berrange.

The list of the remaining patches is in [2].


[1] https://review.openstack.org/#/c/88062
[2] 
https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/libvirt-start-lxc-from-block-devices,n,z

Thank you,
Vladik

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno

2014-09-03 Thread Vladik Romanovsky
+1

I had several patches in the "start lxc from block device" series. The blueprint
had been waiting since Icehouse.
In Juno it was approved; however, besides Daniel Berrange, no one was looking at
these patches.
Now it's being pushed to Kilo, regardless of the fact that everything is +2ed.

Normally, I don't actively pursue people to get approvals, as I was getting
angry pushback from cores at the beginning of my time with OpenStack.

I don't understand what the proper way to get work done is.

Vladik 

- Original Message -
> From: "Solly Ross" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Wednesday, September 3, 2014 11:57:29 AM
> Subject: Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
> 
> > I will follow up with a more detailed email about what I believe we are
> > missing, once the FF settles and I have applied some soothing creme to
> > my burnout wounds, but currently my sentiment is:
> > 
> > Contributing features to Nova nowadays SUCKS!!1 (even as a core
> > reviewer) We _have_ to change that!
> 
> I think this is *very* important.
> 
> 
> For instance, I have/had two patch series
> up. One is of length 2 and is relatively small.  It's basically sitting there
> with one "+2" on each patch.  I will now most likely have to apply for a FFE
> to get it merged, not because there's more changes to be made before it can
> get merged
> (there was one small nit posted yesterday) or because it's a huge patch that
> needs a lot
> of time to review, but because it just took a while to get reviewed by cores,
> and still only appears to have been looked at by one core.
> 
> For the other patch series (which is admittedly much bigger), it was hard
> just to
> get reviews (and it was something where I actually *really* wanted several
> opinions,
> because the patch series touched a couple of things in a very significant
> way).
> 
> Now, this is not my first contribution to OpenStack, or to Nova, for that
> matter.  I
> know things don't always get in.  It's frustrating, however, when it seems
> like the
> reason something didn't get in wasn't because it was fundamentally flawed,
> but instead
> because it didn't get reviews until it was too late to actually take that
> feedback into
> account, or because it just didn't get much attention review-wise at all.  If
> I were a
> new contributor to Nova who had successfully gotten a major blueprint
> approved and
> the implemented, only to see it get rejected like this, I might get turned
> off of Nova,
> and go to work on one of the other OpenStack projects that seemed to move a
> bit faster.
> 
> 
> So, it's silly to rant without actually providing any ideas on how to fix it.
> One suggestion would be, for each approved blueprint, to have one or two
> cores
> explicitly marked as being responsible for providing at least some feedback
> on
> that patch.  This proposal has issues, since we have a lot of blueprints and
> only
> twenty cores, who also have their own stuff to work on.  However, I think the
> general idea of having "guaranteed" reviewers is not unsound by itself.
> Perhaps
> we should have a loose tier of reviewers between "core" and "everybody else".
> These reviewers would be known good reviewers who would follow the
> implementation
> of particular blueprints if a core did not have the time.  Then, when those
> reviewers
> gave the "+1" to all the patches in a series, they could ping a core, who
> could feel
> more comfortable giving a "+2" without doing a deep inspection of the code.
> 
> That's just one suggestion, though.  Whatever the solution may be, this is a
> problem that we need to fix.  While I enjoyed going through the blueprint
> process
> this cycle (not sarcastic -- I actually enjoyed the whole "structured
> feedback" thing),
> the follow up to that was not the most pleasant.
> 
> One final note: the specs referenced above didn't get approved until Spec
> Freeze, which
> seemed to leave me with less time to implement things.  In fact, it seemed
> that a lot
> of specs didn't get approved until spec freeze.  Perhaps if we had more
> staggered
> approval of specs, we'd have more staggered submission of patches, and thus
> less of a
> sudden influx of patches in the couple weeks before feature proposal freeze.
> 
> Best Regards,
> Solly Ross
> 
> - Original Message -
> > From: "Nikola Đipanov" 
> > To: openstack-dev@lists.openstack.org
> > Sent: Wednesday, September 3, 2014 5:50:09 AM
> > Subject: Re: [openstack-dev] [Nova] Feature Freeze Exception process for
> > Juno
> > 
> > On 09/02/2014 09:23 PM, Michael Still wrote:
> > > On Tue, Sep 2, 2014 at 1:40 PM, Nikola Đipanov 
> > > wrote:
> > >> On 09/02/2014 08:16 PM, Michael Still wrote:
> > >>> Hi.
> > >>>
> > >>> We're soon to hit feature freeze, as discussed in Thierry's recent
> > >>> email. I'd like to outline the process for requesting a freeze
> > >>> exception:
> > >>>
> > >>> * your code must already be up for re

[openstack-dev] [nova][libvirt][lxc] Attach volumes to LXC is broken

2014-06-19 Thread Vladik Romanovsky
Hello Everyone,

I've recently been working on bug/1269990, re: "attached volumes to LXC are
being lost after reboot/power on". After fixing it, I've realized that the
initial attach-volume-to-LXC operation is not functional at all (bug/1330981).

I've described the problem in detail in the bug. In essence, it all converges
to the fact that /dev/nbdX or /dev/loopX is being set as the root_device_name
when LXC is started. Later, while attaching a new volume, these device names
are not properly parsed, nor can a disk_bus be chosen for the new volume.
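
To illustrate the parsing problem (the regex below is illustrative, not the
exact logic nova uses): a root_device_name like /dev/nbd15 has no single
trailing drive letter, so computing the "next" device name fails.

    import re

    def next_device_name(root_device_name):
        m = re.match(r'^/dev/(?P<prefix>[a-z]+d)(?P<letter>[a-z])$',
                     root_device_name)
        if not m:
            raise ValueError('cannot parse %s' % root_device_name)
        return '/dev/%s%s' % (m.group('prefix'),
                              chr(ord(m.group('letter')) + 1))

    next_device_name('/dev/vda')    # -> /dev/vdb
    next_device_name('/dev/nbd15')  # -> ValueError, the failure we hit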

Saving the nbd device as root_device_name was introduced by patch
6277f8aa9 - Change-Id: I063fd3a9856bba089bcde5cdefd2576e2eb2b0e9,
to fix a problem where nbd and loop devices were not properly disconnected
when the LXC instance was terminated.
These devices were leaking because, while starting LXC, we unmount the lxc
rootfs and thus clean up the LXC space in disk.clean_lxc_namespace() -
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3570
This prevents the disk api from finding a relevant nbd/loop device to
disconnect while terminating a LXC instance:
https://github.com/openstack/nova/blob/master/nova/virt/disk/api.py#L250

The possible solutions to these problems that I could come up with:
1. Stop saving nbd/loop devices as root_device_name, as well as stop calling
   disk.clean_lxc_namespace(), letting terminate_instance()
   unmount the LXC rootfs and disconnect the relevant devices
   (relying on an existing mechanism).
   This will also allow attach_volume() to succeed.

2. Adjust the get_next_device_name() method to explicitly handle nbd and loop
   devices in
   https://github.com/openstack/nova/blob/master/nova/compute/utils.py#L129

3. Add an additional field to the instance model, other than root_device_name,
   to save the nbd/loop devices that should be disconnected on instance
   termination.

Not sure which option is better. Also, it is not entirely clear to me why
clean_lxc_namespace was/is needed.

I'd like to get your opinion and feedback on whether I'm missing anything or
whether the explanation was too confusing :)

Thanks,
Vladik 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] unittests are failing after change I5e4cb4a8

2014-01-06 Thread Vladik Romanovsky
Hello everyone,

I'm just wondering, is there anyone else who is affected by this bug: 
https://bugs.launchpad.net/nova/+bug/1266534 ?
It looks to me that everyone should be affected by it (after change 
https://review.openstack.org/#/c/61310 has been merged),
but I also see many tests in Jenkins which are passing.

I'm not sure if I am missing anything.

Thanks,
Vladik

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-19 Thread Vladik Romanovsky
Ah, I think I've responded too fast, sorry.

meter-list provides a list of the various measurements that are being done per
resource.
sample-list provides a list of samples for every meter:

    ceilometer sample-list --meter cpu_util -q resource_id=vm_uuid

These samples can be aggregated over a period of time per meter and resource:

    ceilometer statistics -m cpu_util -q 'timestamp>START;timestamp<=END;resource_id=vm_uuid' --period 3600

Vladik



- Original Message -
> From: "Daniel P. Berrange" 
> To: "Vladik Romanovsky" 
> Cc: "OpenStack Development Mailing List (not for usage questions)" 
> , "John
> Garbutt" 
> Sent: Thursday, 19 December, 2013 10:37:27 AM
> Subject: Re: [openstack-dev] [nova] VM diagnostics - V3 proposal
> 
> On Thu, Dec 19, 2013 at 03:47:30PM +0100, Vladik Romanovsky wrote:
> > I think it was:
> > 
> > ceilometer sample-list -m cpu_util -q 'resource_id=vm_uuid'
> 
> Hmm, a standard devstack deployment of ceilometer doesn't seem to
> record any performance stats at all - just shows me the static
> configuration parameters :-(
> 
>  ceilometer meter-list  -q 'resource_id=296b22c6-2a4d-4a8d-a7cd-2d73339f9c70'
> +---------------------+-------+----------+--------------------------------------+----------------------------------+----------------------------------+
> | Name                | Type  | Unit     | Resource ID                          | User ID                          | Project ID                       |
> +---------------------+-------+----------+--------------------------------------+----------------------------------+----------------------------------+
> | disk.ephemeral.size | gauge | GB       | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
> | disk.root.size      | gauge | GB       | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
> | instance            | gauge | instance | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
> | instance:m1.small   | gauge | instance | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
> | memory              | gauge | MB       | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
> | vcpus               | gauge | vcpu     | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
> +---------------------+-------+----------+--------------------------------------+----------------------------------+----------------------------------+
> 
> 
> If the admin user can't rely on ceilometer guaranteeing availability of
> the performance stats at all, then I think having an API in nova to report
> them is in fact justifiable. In fact it is probably justifiable no matter
> what as a fallback way to check what VMs are doing in the face of failure
> of ceilometer / part of the cloud infrastructure.
> 
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
> 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-19 Thread Vladik Romanovsky
Or

ceilometer meter-list -q resource_id='vm_uuid'

- Original Message -
> From: "Daniel P. Berrange" 
> To: "John Garbutt" 
> Cc: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Thursday, 19 December, 2013 9:34:02 AM
> Subject: Re: [openstack-dev] [nova] VM diagnostics - V3 proposal
> 
> On Thu, Dec 19, 2013 at 02:27:40PM +, John Garbutt wrote:
> > On 16 December 2013 15:50, Daniel P. Berrange  wrote:
> > > On Mon, Dec 16, 2013 at 03:37:39PM +, John Garbutt wrote:
> > >> On 16 December 2013 15:25, Daniel P. Berrange 
> > >> wrote:
> > >> > On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:
> > >> >> I'd like to propose the following for the V3 API (we will not touch
> > >> >> V2
> > >> >> in case operators have applications that are written against this –
> > >> >> this
> > >> >> may be the case for libvirt or xen. The VMware API support was added
> > >> >> in I1):
> > >> >>
> > >> >>  1.  We formalize the data that is returned by the API [1]
> > >> >
> > >> > Before we debate what standard data should be returned we need
> > >> > detail of exactly what info the current 3 virt drivers return.
> > >> > IMHO it would be better if we did this all in the existing wiki
> > >> > page associated with the blueprint, rather than etherpad, so it
> > >> > serves as a permanent historical record for the blueprint design.
> > >>
> > >> +1
> > >>
> > >> > While we're doing this I think we should also consider whether
> > >> > the 'get_diagnostics' API is fit for purpose more generally.
> > >> > eg currently it is restricted to administrators. Some, if
> > >> > not all, of the data libvirt returns is relevant to the owner
> > >> > of the VM but they can not get at it.
> > >>
> > >> Ceilometer covers that ground, we should ask them about this API.
> > >
> > > If we consider what is potentially in scope for ceilometer and
> > > subtract that from what the libvirt get_diagnostics impl currently
> > > returns, you pretty much end up with the empty set. This might cause
> > > us to question if 'get_diagnostics' should exist at all from the
> > > POV of the libvirt driver's impl. Perhaps vmware/xen return data
> > > that is out of scope for ceilometer ?
> > 
> > Hmm, a good point.
> 
> So perhaps I'm just being dumb, but I deployed ceilometer and could
> not figure out how to get it to print out the stats for a single
> VM from its CLI ? eg, can someone show me a command line invocation
> for ceilometer that displays CPU, memory, disk and network I/O stats
> in one go ?
> 
> 
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-19 Thread Vladik Romanovsky
I think it was:

ceilometer sample-list -m cpu_util -q 'resource_id=vm_uuid'

Vladik

- Original Message -
> From: "Daniel P. Berrange" 
> To: "John Garbutt" 
> Cc: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Thursday, 19 December, 2013 9:34:02 AM
> Subject: Re: [openstack-dev] [nova] VM diagnostics - V3 proposal
> 
> On Thu, Dec 19, 2013 at 02:27:40PM +, John Garbutt wrote:
> > On 16 December 2013 15:50, Daniel P. Berrange  wrote:
> > > On Mon, Dec 16, 2013 at 03:37:39PM +, John Garbutt wrote:
> > >> On 16 December 2013 15:25, Daniel P. Berrange 
> > >> wrote:
> > >> > On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:
> > >> >> I'd like to propose the following for the V3 API (we will not touch
> > >> >> V2
> > >> >> in case operators have applications that are written against this –
> > >> >> this
> > >> >> may be the case for libvirt or xen. The VMware API support was added
> > >> >> in I1):
> > >> >>
> > >> >>  1.  We formalize the data that is returned by the API [1]
> > >> >
> > >> > Before we debate what standard data should be returned we need
> > >> > detail of exactly what info the current 3 virt drivers return.
> > >> > IMHO it would be better if we did this all in the existing wiki
> > >> > page associated with the blueprint, rather than etherpad, so it
> > >> > serves as a permanent historical record for the blueprint design.
> > >>
> > >> +1
> > >>
> > >> > While we're doing this I think we should also consider whether
> > >> > the 'get_diagnostics' API is fit for purpose more generally.
> > >> > eg currently it is restricted to administrators. Some, if
> > >> > not all, of the data libvirt returns is relevant to the owner
> > >> > of the VM but they can not get at it.
> > >>
> > >> Ceilometer covers that ground, we should ask them about this API.
> > >
> > > If we consider what is potentially in scope for ceilometer and
> > > subtract that from what the libvirt get_diagnostics impl currently
> > > returns, you pretty much end up with the empty set. This might cause
> > > us to question if 'get_diagnostics' should exist at all from the
> > > POV of the libvirt driver's impl. Perhaps vmware/xen return data
> > > that is out of scope for ceilometer ?
> > 
> > Hmm, a good point.
> 
> So perhaps I'm just being dumb, but I deployed ceilometer and could
> not figure out how to get it to print out the stats for a single
> VM from its CLI ? eg, can someone show me a command line invocation
> for ceilometer that displays CPU, memory, disk and network I/O stats
> in one go ?
> 
> 
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][libvirt]when deleting instance which is in migrating state, instance files can be stay in destination node forever

2013-12-16 Thread Vladik Romanovsky
I would block it in the API or have the API cancel the migration first.
I don't see a reason to start an operation that is meant to fail, which
also has a complex chain of events following its failure.

Regardless of the above, I think that the suggested exception handling is 
needed in any case.


Vladik

- Original Message -
> From: "Loganathan Parthipan" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Monday, 16 December, 2013 8:25:09 AM
> Subject: Re: [openstack-dev] [Nova][libvirt]when deleting instance which is 
> in migrating state, instance files can be
> stay in destination node forever
> 
> 
> 
> Isn’t just handling the exception instance_not_found enough? By this time
> source would’ve been cleaned up. Destination VM resources will get cleaned
> up by the periodic task since the VM is not associated with this host. Am I
> missing something here?
> 
> 
> 
> 
> 
> 
> From: 王宏 [mailto:w.wangho...@gmail.com]
> Sent: 16 December 2013 11:32
> To: openstack-dev@lists.openstack.org
> Subject: [openstack-dev] [Nova][libvirt]when deleting instance which is in
> migrating state, instance files can be stay in destination node forever
> 
> Hi all.
> 
> When I try to fix a bug: https://bugs.launchpad.net/nova/+bug/1242961 ,
> I run into trouble.
> 
> Reproducing the bug is very easy. Live migrate a vm in block_migration mode,
> and then delete the vm immediately.
> 
> The reason for this bug is as follows:
> 1. Because live migration takes more time, the vm will be deleted successfully
> before the live migration completes. And then, we will get an exception while
> live migrating.
> 2. After the live migration fails, we start to roll back. But in the rollback
> method we get or modify the info of the vm from the db. Because the vm has
> already been deleted, we will get an instance_not_found exception and the
> rollback will fail too.
> 
> I have two ways to fix the bug:
> i) Add a check in nova-api. When trying to delete a vm, we return an error
> message if the vm_state is LIVE_MIGRATING. This way is very simple, but needs
> to be considered carefully. I have found a related discussion:
> http://lists.openstack.org/pipermail/openstack-dev/2013-October/017454.html ,
> but it reached no conclusion.
> ii) Before the live migration we get all the data needed by the rollback
> method, and add a new rollback method. The new method will clean up resources
> at the destination based on the above data (the resources at the source have
> already been cleaned up by the delete).
> 
> I have no idea which one I should choose. Or, any other ideas? :)
> 
> Regards,
> 
> wanghong
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-12 Thread Vladik Romanovsky
Dmitry,

I understand that :)
The only hypervisor dependency it has is how it communicates with the host,
but this can be extended and turned into a binding, so people could connect
to it in multiple ways.

The real value, as I see it, is which features this guest agent already 
implements and the fact that this is a mature code base.

Thanks,
Vladik 

- Original Message -
> From: "Dmitry Mescheryakov" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Thursday, 12 December, 2013 12:27:47 PM
> Subject: Re: [openstack-dev] Unified Guest Agent proposal
> 
> Vladik,
> 
> Thanks for the suggestion, but hypervisor-dependent solution is exactly what
> scares off people in the thread :-)
> 
> Thanks,
> 
> Dmitry
> 
> 
> 2013/12/11 Vladik Romanovsky < vladik.romanov...@enovance.com >
> 
> 
> 
> Maybe it will be useful to use Ovirt guest agent as a base.
> 
> http://www.ovirt.org/Guest_Agent
> https://github.com/oVirt/ovirt-guest-agent
> 
> It is already working well on linux and windows and has a lot of
> functionality.
> However, currently it is using virtio-serial for communication, but I think
> it can be extended for other bindings.
> 
> Vladik
> 
> - Original Message -
> > From: "Clint Byrum" < cl...@fewbar.com >
> > To: "openstack-dev" < openstack-dev@lists.openstack.org >
> > Sent: Tuesday, 10 December, 2013 4:02:41 PM
> > Subject: Re: [openstack-dev] Unified Guest Agent proposal
> > 
> > Excerpts from Dmitry Mescheryakov's message of 2013-12-10 12:37:37 -0800:
> > > >> What is the exact scenario you're trying to avoid?
> > > 
> > > It is DDoS attack on either transport (AMQP / ZeroMQ provider) or server
> > > (Salt / Our own self-written server). Looking at the design, it doesn't
> > > look like the attack could be somehow contained within a tenant it is
> > > coming from.
> > > 
> > 
> > We can push a tenant-specific route for the metadata server, and a tenant
> > specific endpoint for in-agent things. Still simpler than hypervisor-aware
> > guests. I haven't seen anybody ask for this yet, though I'm sure if they
> > run into these problems it will be the next logical step.
> > 
> > > In the current OpenStack design I see only one similarly vulnerable
> > > component - metadata server. Keeping that in mind, maybe I just
> > > overestimate the threat?
> > > 
> > 
> > Anything you expose to the users is "vulnerable". By using the localized
> > hypervisor scheme you're now making the compute node itself vulnerable.
> > Only now you're asking that an already complicated thing (nova-compute)
> > add another job, rate limiting.
> > 
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-10 Thread Vladik Romanovsky

Maybe it would be useful to use the oVirt guest agent as a base.

http://www.ovirt.org/Guest_Agent
https://github.com/oVirt/ovirt-guest-agent

It already works well on Linux and Windows and has a lot of functionality.
However, it currently uses virtio-serial for communication, but I think it
can be extended to other bindings.

Vladik

- Original Message -
> From: "Clint Byrum" 
> To: "openstack-dev" 
> Sent: Tuesday, 10 December, 2013 4:02:41 PM
> Subject: Re: [openstack-dev] Unified Guest Agent proposal
> 
> Excerpts from Dmitry Mescheryakov's message of 2013-12-10 12:37:37 -0800:
> > >> What is the exact scenario you're trying to avoid?
> > 
> > It is DDoS attack on either transport (AMQP / ZeroMQ provider) or server
> > (Salt / Our own self-written server). Looking at the design, it doesn't
> > look like the attack could be somehow contained within a tenant it is
> > coming from.
> > 
> 
> We can push a tenant-specific route for the metadata server, and a tenant
> specific endpoint for in-agent things. Still simpler than hypervisor-aware
> guests. I haven't seen anybody ask for this yet, though I'm sure if they
> run into these problems it will be the next logical step.
> 
> > In the current OpenStack design I see only one similarly vulnerable
> > component - metadata server. Keeping that in mind, maybe I just
> > overestimate the threat?
> > 
> 
> Anything you expose to the users is "vulnerable". By using the localized
> hypervisor scheme you're now making the compute node itself vulnerable.
> Only now you're asking that an already complicated thing (nova-compute)
> add another job, rate limiting.
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Recent change breaks manual control of service enabled / disabled status - suggest it is backed out and re-worked

2013-11-12 Thread Vladik Romanovsky
I'll work on Daniel's suggestion and will send a patch for review.
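
Roughly what I have in mind (a sketch only, not the final patch; the object
calls and constant are assumptions): only auto re-enable the service if it
was auto-disabled by this same code path.

    AUTO_DISABLE_REASON = 'AUTO: connection to libvirt lost'

    def _set_host_enabled(self, enabled):
        ctxt = nova_context.get_admin_context()
        service = objects.Service.get_by_compute_host(ctxt, CONF.host)
        if enabled and service.disabled_reason != AUTO_DISABLE_REASON:
            return  # an admin disabled it for another reason; leave it alone
        service.disabled = not enabled
        service.disabled_reason = None if enabled else AUTO_DISABLE_REASON
        service.save()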

Vladik

- Original Message -
> From: "Daniel P. Berrange" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Tuesday, 12 November, 2013 5:32:28 AM
> Subject: Re: [openstack-dev] [Nova] Recent change breaks manual control of 
> service enabled / disabled status -
> suggest it is backed out and re-worked
> 
> On Mon, Nov 11, 2013 at 11:34:10AM +, Day, Phil wrote:
> > Hi Folks,
> > 
> > I'd like to get some eyes on a bug I just filed:
> > https://bugs.launchpad.net/nova/+bug/1250049
> > 
> > A recent change (https://review.openstack.org/#/c/52189/9 ) introduced the
> > automatic disable / re-enable of nova-compute when connection to libvirt
> > is lost and recovered.   The problem is that it doesn't take any account
> > of the fact that a cloud administrator may have other reasons for
> > disabling a service, and always put nova-compute back into an enabled
> > state.
> > 
> > The impact of this is pretty big for us - at any point in time we have a
> > number of servers disabled for various operational reasons, and there are
> > times when we need to restart libvirt as part of a deployment.  With this
> > change in place all of those hosts are returned to an enabled state, and
> > the reason that they were disabled is lost.
> > 
> > While I like the concept that an error condition like this should disable
> > the host from a scheduling perspective, I think it needs to be implemented
> > as an additional form of disablement (i.e a separate value kept in the
> > ServiceGroup API), not an override of the current one.
> > 
> > I'd like to propose that the current change is reverted as a priority, and
> > a new approach then submitted as a second step that works alongside the
> > current enable /disable reason.
> > 
> > Sorry for not catching this in the review stage - I didn't notice this one
> > at all.
> 
> It seems like it would be pretty easy to just use an explicit
> 'disable_reason'
> string value, and then only automatically re-enable if the string matches the
> one we set when disabling it originally. I think that should be easy enough
> for
> someone to do without needing to revert the entire original change.
> 
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][Libvirt] Disabling nova-compute when a connection to libvirt is broken.

2013-10-13 Thread Vladik Romanovsky
Thanks for the feedback.

I'm already working on this. Will send the patch for review, very soon.

Thanks,
Vladik

- Original Message -
From: "Lingxian Kong" 
To: "OpenStack Development Mailing List" 
Sent: Saturday, October 12, 2013 5:42:15 AM
Subject: Re: [openstack-dev] [nova][Libvirt] Disabling nova-compute when a 
connection to libvirt is broken.

+1 for me. And I am willing to be a volunteer. 


2013/10/12 Joe Gordon < joe.gord...@gmail.com > 



On Thu, Oct 10, 2013 at 4:47 AM, Vladik Romanovsky < 
vladik.romanov...@enovance.com > wrote: 


Hello everyone, 

I have been recently working on a migration bug in nova (Bug #1233184). 

I noticed that compute service remains available, even if a connection to 
libvirt is broken. 
I thought that it might be better to disable the service (using 
conductor.manager.update_service()) and resume it once it's connected again. 
(maybe keep the host_stats periodic task running or create a dedicated one, 
once it succeed, the service will become available again). 
This way new vms wont be scheduled nor migrated to the disconnected host. 

Any thoughts on that? 

Sounds reasonable to me. If we can't reach libvirt there isn't much that 
nova-compute can / should do. 


Is anyone already working on that? 

Thank you, 
Vladik 

___ 
OpenStack-dev mailing list 
OpenStack-dev@lists.openstack.org 
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev 


___ 
OpenStack-dev mailing list 
OpenStack-dev@lists.openstack.org 
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev 




-- 
 
Lingxian Kong 
Huawei Technologies Co.,LTD. 
IT Product Line CloudOS PDU 
China, Xi'an 
Mobile: +86-18602962792 
Email: konglingx...@huawei.com ; anlin.k...@gmail.com 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][Libvirt] Disabling nova-compute when a connection to libvirt is broken.

2013-10-10 Thread Vladik Romanovsky
Hello everyone,

I have recently been working on a migration bug in nova (Bug #1233184).

I noticed that the compute service remains available even if the connection to
libvirt is broken.
I thought that it might be better to disable the service (using
conductor.manager.update_service()) and re-enable it once it's connected again
(maybe keep the host_stats periodic task running, or create a dedicated one;
once it succeeds, the service will become available again).
This way new VMs won't be scheduled or migrated to the disconnected host.
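
On the detection side, something like libvirt's close callback could be used
(a sketch; the actual wiring into nova-compute would differ, and
disable_compute_service() here is a hypothetical helper):

    import libvirt

    def _close_callback(conn, reason, opaque):
        # reason is one of the VIR_CONNECT_CLOSE_REASON_* constants
        disable_compute_service()

    conn = libvirt.open('qemu:///system')
    conn.registerCloseCallback(_close_callback, None)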

Any thoughts on that?
Is anyone already working on that?

Thank you,
Vladik

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev