[openstack-dev] [nova] State of NUMA live migration
> Yes, this is still happening. Mea culpa for not carrying the ball and
> maintaining visibility. There's work in nova to actually get it
> working, and in intel-nfv-ci to lay down the groundwork for eventual
> CI.
>
> In nova, the spec has been re-proposed for Stein [1]. There are some
> differences from the Rocky version, but based on what I've heard was
> discussed at Denver, it shouldn't be too controversial. There's a
> couple of nova patches up as well [2], but that's still pretty WIP
> given the changes in the spec. A bunch of patches from Rocky were
> abandoned because they're no longer applicable.
>
> In intel-nfv-ci, there's a whole stack of changes [3] that are mostly
> about technical debt and laying the groundwork to support multinode
> test environments, but there's also a WIP patch in there [4] that'll
> eventually actually test live migration.
>
> For now we have no upstream/public environment to run that on, so
> anyone who's involved will need their own env if they want to run the
> tests and/or play with the feature. Longer-term, I would like to have
> some form of upstream CI testing this, be it in the vanilla nodepool
> with nested virt and "fake" NUMA topologies, or a 3rd party CI with
> resources provided by an interested stakeholder.

Forgot the nova tag :(

> [1] https://review.openstack.org/#/c/599587/
> [2] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/numa-aware-live-migration
> [3] https://review.openstack.org/#/c/576602/
> [4] https://review.openstack.org/#/c/574871/6

--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] State of NUMA live migration
Yes, this is still happening. Mea culpa for not carrying the ball and maintaining visibility. There's work in nova to actually get it working, and in intel-nfv-ci to lay down the groundwork for eventual CI.

In nova, the spec has been re-proposed for Stein [1]. There are some differences from the Rocky version, but based on what I've heard was discussed at Denver, it shouldn't be too controversial. There's a couple of nova patches up as well [2], but that's still pretty WIP given the changes in the spec. A bunch of patches from Rocky were abandoned because they're no longer applicable.

In intel-nfv-ci, there's a whole stack of changes [3] that are mostly about technical debt and laying the groundwork to support multinode test environments, but there's also a WIP patch in there [4] that'll eventually actually test live migration.

For now we have no upstream/public environment to run that on, so anyone who's involved will need their own env if they want to run the tests and/or play with the feature. Longer-term, I would like to have some form of upstream CI testing this, be it in the vanilla nodepool with nested virt and "fake" NUMA topologies, or a 3rd party CI with resources provided by an interested stakeholder.

[1] https://review.openstack.org/#/c/599587/
[2] https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/numa-aware-live-migration
[3] https://review.openstack.org/#/c/576602/
[4] https://review.openstack.org/#/c/574871/6
Re: [openstack-dev] [placement] The "intended purpose" of traits
> So from a code perspective _placement_ is completely agnostic to
> whether a trait is "PCI_ADDRESS_01_AB_23_CD", "STORAGE_DISK_SSD", or
> "JAY_LIKES_CRUNCHIE_BARS".

Right, but words have meanings, and everyone is better off if that meaning is common amongst everyone doing the talking. So if placement understands traits as "a unitary piece of information that is either true or false" (ex: HAS_SSD), but nova understands it as "multiple pieces of information, all of which are either true or false" (ex: HAS_PCI_DE_AD_BE_EF), then that's asking for trouble. Can it work out? Probably, but it'll be more by accident than by design, sort of like French and Spanish sharing certain words, but then having some similar-sounding words mean something completely different.

> However, things which are using traits (e.g., nova, ironic) need to
> make their own decisions about how the value of traits are
> interpreted.

Well... if placement is saying "here's the primitives I can work with and can expose to my users", but nova is saying "well, we like this one primitive, but what we really need is this other primitive, and you don't have it, but we can totally hack this first primitive that you do have to do what we want"... That's ugly. From what I understand, Nova needs *resources* (not resource providers) to have *quantities* of things, and this is not something that placement can currently work with, which is why we're having this flamewar ;)

> I don't have a strong position on that except to say
> that _if_ we end up in a position of there being lots of traits
> willy nilly, people who have chosen to do that need to know that the
> contract presented by traits right now (present or not present, no
> value comprehension) is fixed.
>
> > I *do* see a problem with it, based on my experience in Nova where this kind
> > of thing leads to ugly, unmaintainable, and incomprehensible code as I have
> > pointed to in previous responses.
> I think there are many factors that have led to nova being
> incomprehensible and indeed bad representations is one of them, but
> I think reasonable people can disagree on which factors are the most
> important and with sufficient discussion come to some reasonable
> compromises. I personally feel that while the bad representations
> (encoding stuff in strings or json blobs) thing is a big deal,
> another major factor is a predilection to make new apis, new
> abstractions, and new representations rather than working with and
> adhering to the constraints of the existing ones. This leads to a
> lot of code that encodes business logic in itself (e.g., several
> different ways and layers of indirection to think about allocation
> ratios) rather than working within strong and constraining
> contracts.
>
> From my standpoint there isn't much to talk about here from a
> placement code standpoint. We should clearly document the functional
> contract (and stick to it) and we should come up with exemplars
> for how to make the best use of traits.
>
> I think this conversation could allow us to find those examples.
>
> I don't, however, want placement to be a traffic officer for how
> people do things. In the context of the orchestration between nova
> and ironic and how that interaction happens, nova has every right to
> set some guidelines if it needs to.
>
> --
> Chris Dent ٩◔̯◔۶ https://anticdent.org/
> freenode: cdent tw: @anticdent

--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG
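To make the traits contract being discussed concrete, here is a small sketch (trait names and the PCI-address encoding are illustrative, not placement's or nova's actual code). A trait is only present or absent on a provider, so consuming one is a set-membership test; smuggling a value into the trait name forces every consumer to agree on a string-parsing scheme that placement never promised:

```python
# Placement's contract: a trait is present or absent, nothing more.
provider_traits = {"STORAGE_DISK_SSD", "HW_CPU_X86_AVX2"}

def has_ssd(traits: set) -> bool:
    # Unitary boolean information: true or false, nothing to parse.
    return "STORAGE_DISK_SSD" in traits

# The anti-pattern under discussion: encoding a value (a PCI address)
# into the trait name, so consumers must parse strings to recover it.
def parse_pci_addresses(traits: set) -> list:
    addrs = []
    for t in traits:
        if t.startswith("PCI_ADDRESS_"):
            # "PCI_ADDRESS_01_AB_23_CD" -> "01:ab:23.cd" -- every consumer
            # must independently agree on this hypothetical encoding.
            parts = t[len("PCI_ADDRESS_"):].lower().split("_")
            addrs.append("{}:{}:{}.{}".format(*parts))
    return addrs

print(has_ssd(provider_traits))                        # True
print(parse_pci_addresses({"PCI_ADDRESS_01_AB_23_CD"}))  # ['01:ab:23.cd']
```

The first function works by accident of placement's design; the second only works as long as every consumer honors an out-of-band naming convention, which is exactly the fragility being argued about.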
Re: [openstack-dev] [infra][nova] Running NFV tests in CI
On Tue, Jul 24, 2018 at 12:30 PM, Clark Boylan wrote:
> On Tue, Jul 24, 2018, at 9:23 AM, Artom Lifshitz wrote:
>> Hey all,
>>
>> tl;dr Humbly requesting a handful of nodes to run NFV tests in CI
>>
>> Intel has their NFV tests tempest plugin [1] and manages a third party
>> CI for Nova. Two of the cores on that project (Stephen Finucane and
>> Sean Mooney) have now moved to Red Hat, but the point still stands
>> that there's a need and a use case for testing things like NUMA
>> topologies, CPU pinning and hugepages.
>>
>> At Red Hat, we also have a similar tempest plugin project [2] that we
>> use for downstream whitebox testing. The scope is a bit bigger than
>> just NFV, but the main use case is still testing NFV code in an
>> automated way.
>>
>> Given that there's a clear need for this sort of whitebox testing, I
>> would like to humbly request a handful of nodes (in the 3 to 5 range)
>> from infra to run an "official" Nova NFV CI. The code doing the
>> testing would initially be the current Intel plugin, but we could have
>> a separate discussion about keeping "Intel" in the name or forking
>> and/or renaming it to something more vendor-neutral.
>
> The way you request nodes from Infra is through your Zuul configuration. Add
> jobs to a project to run tests on the node labels that you want.

Aha, thanks, I'll look into that. I was coming from a place of complete ignorance about infra.

> I'm guessing this process doesn't work for NFV tests because you have
> specific hardware requirements that are not met by our current VM resources?
> If that is the case it would probably be best to start by documenting what is
> required and where the existing VM resources fall short.

Well, it should be possible to do most of what we'd like with nested virt and virtual NUMA topologies, though things like hugepages will need host configuration, specifically the kernel boot command [1]. Is that possible with the nodes we have?
> In general though we operate on top of donated cloud resources, and if those
> do not work we will have to identify a source of resources that would work.

Right, as always it comes down to resources and money. I believe historically Red Hat has been opposed to running an upstream third party CI (this is by no means an official Red Hat position, just remembering what I think I heard), but I can always see what I can do.

[1] https://docs.openstack.org/nova/latest/admin/huge-pages.html#enabling-huge-pages-on-the-host
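For reference, the host configuration mentioned above is a boot-time kernel setting. A minimal sketch following the linked nova admin guide (page size, page count, and grub paths are illustrative and vary by distro; this is a host-config fragment, not something a CI node can apply without a reboot):

```shell
# Reserve 1 GiB hugepages at boot by appending to the kernel command line,
# e.g. in /etc/default/grub:
#   GRUB_CMDLINE_LINUX="... hugepagesz=1G hugepages=8 ..."
# then regenerate the grub config and reboot:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot

# After reboot, verify the pages were actually reserved:
grep Huge /proc/meminfo
```

This is exactly why vanilla nodepool VMs are awkward for these tests: the kernel command line is baked into the image/boot configuration, so hugepage-backed guests need either purpose-built images or nodes the job is allowed to reconfigure and reboot.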
[openstack-dev] [infra][nova] Running NFV tests in CI
Hey all,

tl;dr Humbly requesting a handful of nodes to run NFV tests in CI

Intel has their NFV tests tempest plugin [1] and manages a third party CI for Nova. Two of the cores on that project (Stephen Finucane and Sean Mooney) have now moved to Red Hat, but the point still stands that there's a need and a use case for testing things like NUMA topologies, CPU pinning and hugepages.

At Red Hat, we also have a similar tempest plugin project [2] that we use for downstream whitebox testing. The scope is a bit bigger than just NFV, but the main use case is still testing NFV code in an automated way.

Given that there's a clear need for this sort of whitebox testing, I would like to humbly request a handful of nodes (in the 3 to 5 range) from infra to run an "official" Nova NFV CI. The code doing the testing would initially be the current Intel plugin, but we could have a separate discussion about keeping "Intel" in the name or forking and/or renaming it to something more vendor-neutral.

I won't be at PTG (conflict with personal travel), so I'm kindly asking Stephen and Sean to represent this idea in Denver. Cheers!

[1] https://github.com/openstack/intel-nfv-ci-tests
[2] https://review.rdoproject.org/r/#/admin/projects/openstack/whitebox-tempest-plugin
Re: [openstack-dev] [Nova][Cinder][Tempest] Help with tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment needed
I've proposed [1] to add extra logging on the Nova side. Let's see if that helps us catch the root cause of this.

[1] https://review.openstack.org/584032

On Thu, Jul 19, 2018 at 12:50 PM, Artom Lifshitz wrote:
> Because we're waiting for the volume to become available before we
> continue with the test [1], its tag still being present means Nova's
> not cleaning up the device tags on volume detach. This is most likely
> a bug. I'll look into it.
>
> [1] https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_device_tagging.py#L378
>
> On Thu, Jul 19, 2018 at 7:09 AM, Slawomir Kaplonski wrote:
>> Hi,
>>
>> For some time we have seen that the test
>> tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment
>> fails sometimes.
>> A bug about that is currently reported against Tempest [1], but after the small patch
>> [2] was merged I was able today to check what causes this issue.
>>
>> The failing test is in [3], and it looks like everything goes fine
>> up to the last line of the test. So volume and port are created, attached,
>> tags are set properly, both devices are also detached properly, and at the
>> end the test fails because http://169.254.169.254/openstack/latest/meta_data.json
>> still has some device inside.
>> And it now looks from [4] like it is the volume that isn't removed from
>> meta_data.json.
>> So I think it would be good if people from the Nova and Cinder teams could
>> look at it and try to figure out what is going on there and how it can be
>> fixed.
>>
>> Thanks in advance for help.
>> [1] https://bugs.launchpad.net/tempest/+bug/1775947
>> [2] https://review.openstack.org/#/c/578765/
>> [3] https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_device_tagging.py#L330
>> [4] http://logs.openstack.org/69/567369/15/check/tempest-full/528bc75/job-output.txt.gz#_2018-07-19_10_06_09_273919
>>
>> —
>> Slawek Kaplonski
>> Senior software engineer
>> Red Hat

--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG
Re: [openstack-dev] [Nova][Cinder][Tempest] Help with tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment needed
Because we're waiting for the volume to become available before we continue with the test [1], its tag still being present means Nova's not cleaning up the device tags on volume detach. This is most likely a bug. I'll look into it.

[1] https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_device_tagging.py#L378

On Thu, Jul 19, 2018 at 7:09 AM, Slawomir Kaplonski wrote:
> Hi,
>
> For some time we have seen that the test
> tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment
> fails sometimes.
> A bug about that is currently reported against Tempest [1], but after the small patch
> [2] was merged I was able today to check what causes this issue.
>
> The failing test is in [3], and it looks like everything goes fine
> up to the last line of the test. So volume and port are created, attached,
> tags are set properly, both devices are also detached properly, and at the end
> the test fails because http://169.254.169.254/openstack/latest/meta_data.json
> still has some device inside.
> And it now looks from [4] like it is the volume that isn't removed from
> meta_data.json.
> So I think it would be good if people from the Nova and Cinder teams could
> look at it and try to figure out what is going on there and how it can be
> fixed.
>
> Thanks in advance for help.
> [1] https://bugs.launchpad.net/tempest/+bug/1775947
> [2] https://review.openstack.org/#/c/578765/
> [3] https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_device_tagging.py#L330
> [4] http://logs.openstack.org/69/567369/15/check/tempest-full/528bc75/job-output.txt.gz#_2018-07-19_10_06_09_273919
>
> —
> Slawek Kaplonski
> Senior software engineer
> Red Hat

--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG
Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard
> Side question... does either approach touch PCI device management during
> live migration?

Nope. I'd need to do some research to see what, if anything, is needed at the lower levels (kernel, libvirt) to enable this.

> I ask because the only workloads I've ever seen that pin guest vCPU threads
> to specific host processors -- or make use of huge pages consumed from a
> specific host NUMA node -- have also made use of SR-IOV and/or PCI
> passthrough. [1]
>
> If workloads that use PCI passthrough or SR-IOV VFs cannot be live migrated
> (due to existing complications in the lower-level virt layers) I don't see
> much of a point spending lots of developer resources trying to "fix" this
> situation when in the real world, only a mythical workload that uses CPU
> pinning or huge pages but *doesn't* use PCI passthrough or SR-IOV VFs would
> be helped by it.

It's definitely a pain point for at least some of our customers - I don't know their use cases exactly, but live migration with CPU pinning but no other "high performance" features has come up a few times in our downstream bug tracker. In any case, incremental progress is better than no progress at all, so if we can improve how NUMA live migration works, we'll be in a better position to make it work with PCI devices down the road.

> [Mooney, Sean K] I would generally agree, but with the extension of including
> dpdk-based vswitches like ovs-dpdk or vpp.
> CPU-pinned or hugepage-backed guests generally also have some kind of high
> performance networking solution, or use a hardware
> accelerator like a GPU, to justify the performance assertion that pinning of
> cores or RAM is required.
> A dpdk networking stack would, however, not require the PCI remapping to be
> addressed, though I believe that is planned to be added in Stein.
I think Stephen Finucane's NUMA-aware vswitches work depends on mine to work with live migration - i.e., it'll work just fine on its own, but to live migrate an instance with a NUMA vswitch (I know I'm abusing language here, apologies) this spec will need to be implemented first.
Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard
> As I understand it, Artom is proposing to have a larger race window, essentially
> from when the scheduler selects a node until the resource audit runs on
> that node.

Exactly. When writing the spec I thought we could just call the resource tracker to claim the resources when the migration was done. However, when I started looking at the code in reaction to Sahid's feedback, I noticed that there's no way to do it without the MoveClaim context (right?)

Keep in mind, we're not making any race windows worse - I'm proposing keeping the status quo and fixing it later with NUMA in placement (or the resource tracker if we can swing it).

The resource tracker stuff is just so... opaque. For instance, the original patch [1] uses a mutated_migration_context around the pre_live_migration call to the libvirt driver. Would I still need to do that? Why or why not?

At this point we need to commit to something and roll with it, so I'm sticking to the "easy way". If it gets shut down in code review, at least we'll have certainty on how to approach this next cycle.

[1] https://review.openstack.org/#/c/244489/
Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard
> Adding claims support later on wouldn't change any on-the-wire messaging, it would
> just make things work more robustly.

I'm not even sure about that. Assuming [1] has at least the right idea, it looks like it's an either-or kind of thing: either we use resource tracker claims and get the new instance NUMA topology that way, or do what was in the spec and have the dest send it to the source.

That being said, I'm still in favor of choosing the "easy" way out. For instance, [2] should fail because we can't access the api db from the compute node. So unless there's a simpler way, using RT claims would involve changing the RPC to add parameters to check_can_live_migration_destination, which, while not necessarily bad, seems like useless complexity for a thing we know will get ripped out.

[1] https://review.openstack.org/#/c/576222/
[2] https://review.openstack.org/#/c/576222/3/nova/compute/manager.py@5897
Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard
> For what it's worth, I think the previous patch languished for a number of
> reasons other than the complexity of the code... the original author left,
> the coding style was a bit odd, there was an attempt to make it work even if
> the source was an earlier version, etc. I think a fresh implementation
> would be less complicated to review.

I'm afraid of unknowns in the resource tracker and claims mechanism. For snips and giggles, I submitted a quick patch that attempts to use a claim [1] when live migrating instances. Assuming it somehow passes CI, I have no idea if I've just opened a rabbit hole of people telling me "oh but you need to do this other thing in this other place." Who knows the claims code well, anyway?

[1] https://review.openstack.org/576222
[openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard
Hey all,

For Rocky I'm trying to get live migration to work properly for instances that have a NUMA topology [1]. A question that came up on one of the patches [2] is how to handle resource claims on the destination, or indeed whether to handle that at all.

The previous attempt's approach [3] (call it A) was to use the resource tracker. This is race-free and the "correct" way to do it, but the code is pretty opaque and not easily reviewable, as evidenced by [3] sitting in review purgatory for literally years.

A simpler approach (call it B) is to ignore resource claims entirely for now and wait for NUMA in placement to land in order to handle it that way. This is obviously race-prone and not the "correct" way of doing it, but the code would be relatively easy to review.

For the longest time, live migration did not keep track of resources (until it started updating placement allocations). The message to operators was essentially "we're giving you this massive hammer, don't break your fingers." Continuing to ignore resource claims for now is just maintaining the status quo. In addition, there is value in improving NUMA live migration *now*, even if the improvement is incomplete because it's missing resource claims. "Best is the enemy of good" and all that. Finally, making use of the resource tracker is just work that we know will get thrown out once we start using placement for NUMA resources.

For all those reasons, I would favor approach B, but I wanted to ask the community for their thoughts. Thanks!

[1] https://review.openstack.org/#/q/topic:bp/numa-aware-live-migration+(status:open+OR+status:merged)
[2] https://review.openstack.org/#/c/567242/
[3] https://review.openstack.org/#/c/244489/
Re: [openstack-dev] [nova][cinder] Update (swap) of multiattach volume should not be allowed
volume.

> Yes, that is the commit message in its entirety. Of course, the commit had
> no documentation at all in it, so there's no ability to understand what the
> original use case really was here.
>
> https://review.openstack.org/#/c/28995/
>
> If the use case was really "that a user needs to move volume data for
> attached volumes", why not just pause the VM, detach the volume, do an
> openstack volume migrate to the new destination, reattach the volume and
> start the VM? That would mean no libvirt/QEMU-specific implementation
> behaviour leaking out of the public HTTP API and allow the volume service
> (Cinder) to do its job properly.
>
>> With single attach that's exactly what they get: the end
>> user should never notice. With multi-attach they don't get that. We're
>> basically forking the shared volume at a point in time, with the
>> instance which did the swap writing to the new location while all
>> others continue writing to the old location. Except that even the fork
>> is broken, because they'll get a corrupt, inconsistent copy rather
>> than point in time. I can't think of a use case for this behaviour,
>> and it certainly doesn't meet the original design intent.
>>
>> What they really want is for the multi-attached volume to be copied
>> from location a to location b and for all attachments to be updated.
>> Unfortunately I don't think we're going to be in a position to do that
>> any time soon, but I also think users will be unhappy if they're no
>> longer able to move data at all because it's multi-attach. We can
>> compromise, though, if we allow a multiattach volume to be moved as
>> long as it only has a single attachment. This means the operator can't
>> move this data without disruption to users, but at least it's not
>> fundamentally immovable.
>>
>> This would require some cooperation with cinder to achieve, as we need
>> to be able to temporarily prevent cinder from allowing new
>> attachments.
>> A natural way to achieve this would be to allow a
>> multi-attach volume with only a single attachment to be redesignated
>> not multiattach, but there might be others. The flow would then be:
>>
>> Detach volume from server 2
>> Set multiattach=False on volume
>> Migrate volume on server 1
>> Set multiattach=True on volume
>> Attach volume to server 2
>>
>> Combined with a patch to nova to disallow swap_volume on any
>> multiattach volume, this would then be possible if inconvenient.
>>
>> Regardless of any other changes, though, I think it's urgent that we
>> disable the ability to swap_volume a multiattach volume because we
>> don't want users to start using this relatively new, but broken,
>> feature.

> Or we could deprecate the swap_volume Compute API operation and use Cinder
> for all of this.
>
> But sure, we could also add more cruft to the Compute API and add more
> conditional "it works but only when X" docs to the API reference.
>
> Just my two cents,
> -jay

--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG
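The proposed flow could look roughly like this from an operator's shell. This is a sketch against a hypothetical cloud (server and volume names are made up); steps 2 and 4 correspond to the "redesignate" capability being *proposed* above, which does not exist today, so they are shown commented out as placeholders rather than real commands:

```shell
# 1. Detach the volume from server 2, leaving a single attachment (server 1).
openstack server remove volume server-2 shared-vol

# 2. Redesignate the volume as not multiattach (proposed; no such command yet).
# openstack volume set --no-multiattach shared-vol

# 3. Migrate the volume's data while it is attached only to server 1.
openstack volume migrate --host dest-host@backend#pool shared-vol

# 4. Redesignate the volume as multiattach again (proposed; no such command yet).
# openstack volume set --multiattach shared-vol

# 5. Reattach the volume to server 2.
openstack server add volume server-2 shared-vol
```

Between steps 1 and 5 the workload on server 2 is disrupted, which is exactly the compromise acknowledged above: the data becomes movable, but not transparently.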
Re: [openstack-dev] [tc][all] A culture change (nitpicking)
> On Tue, May 29, 2018 at 10:52:04AM -0400, Mohammed Naser wrote:
> :On Tue, May 29, 2018 at 10:43 AM, Artom Lifshitz wrote:
> :> One idea would be that, once the meat of the patch
> :> has passed multiple rounds of reviews and looks good, and what remains
> :> is only nits, the reviewer themselves take on the responsibility of
> :> pushing a new patch that fixes the nits that they found.
>
> Doesn't the above suggestion sufficiently address the concern below?
>
> :I'd just like to point out that what you perceive as a 'finished
> :product that looks unprofessional' might be already hard enough for a
> :contributor to achieve. We have a lot of new contributors coming from
> :all over the world and it is very discouraging for them to have their
> :technical knowledge and work be categorized as 'unprofessional'
> :because of the language barrier.
> :
> :git-nit and a few minutes of your time will go a long way, IMHO.
>
> As a very intermittent contributor and native English speaker with
> relatively poor spelling and typing, I'd be much happier with a
> reviewer pushing a patch that fixes nits rather than having a ton of
> inline comments that point them out.
>
> maybe we're all saying the same thing here?

Yeah, I feel like we're all essentially in agreement that nits (of the English-mistake or typo type) do need to get fixed, but sometimes (often?) putting the burden of fixing them on the original patch contributor is neither fair nor constructive.
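The "reviewer pushes the fix themselves" workflow agreed on above is already possible with standard Gerrit tooling. A sketch using git-review (the change number is illustrative; this assumes a checked-out clone of the project and a configured Gerrit remote, so it is a workflow fragment rather than something runnable standalone):

```shell
# Download the contributor's change into a local branch
# (change number is made up for illustration).
git review -d 570940

# Fix the nits (typos, wording) directly in the working tree, then amend
# the existing commit. Amending preserves the Change-Id footer, so Gerrit
# records this as a new patch set on the same change, keeping the original
# author's attribution.
git commit --amend --all --no-edit

# Push the new patch set back for the original author and other reviewers.
git review
```

Because the Change-Id is preserved, the original contributor stays the author of the change; the reviewer's nit-fixing appears as just another patch set rather than a competing change.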
Re: [openstack-dev] [tc][all] A culture change (nitpicking)
I dunno, there's a fine line to be drawn between getting a finished product that looks unprofessional (because of typos, English mistakes, etc), and nitpicking to the point of smothering and being counter-productive. One idea would be that, once the meat of the patch has passed multiple rounds of reviews and looks good, and what remains is only nits, the reviewer themselves take on the responsibility of pushing a new patch that fixes the nits that they found.

On Tue, May 29, 2018 at 9:55 AM, Julia Kreger wrote:
> During the Forum, the topic of review culture came up in session after
> session. During these discussions, the subject of our use of nitpicks
> were often raised as a point of contention and frustration, especially
> by community members that have left the community and that were
> attempting to re-engage the community. Contributors raised the point
> of review feedback requiring for extremely precise English, or
> compliance to a particular core reviewer's style preferences, which
> may not be the same as another core reviewer.
>
> These things are not just frustrating, but also very inhibiting for
> part time contributors such as students who may also be time limited.
> Or an operator who noticed something that was clearly a bug and that
> put forth a very minor fix and doesn't have the time to revise it over
> and over.
>
> While nitpicks do help guide and teach, the consensus seemed to be
> that we do need to shift the culture a little bit. As such, I've
> proposed a change to our principles[1] in governance that attempts to
> capture the essence and spirit of the nitpicking topic as a first
> step.
> > -Julia > - > [1]: https://review.openstack.org/570940 -- Artom Lifshitz Software Engineer, OpenStack Compute DFG
Re: [openstack-dev] [nova] Default scheduler filters survey
Thanks everyone for your input! I wrote a small Python script [1] to present all your responses in an understandable format. Here's the output:

Filters common to all deployments: {'ComputeFilter', 'ServerGroupAntiAffinityFilter'}

Filter counts (out of 9 deployments):
ServerGroupAntiAffinityFilter      9
ComputeFilter                      9
AvailabilityZoneFilter             8
ServerGroupAffinityFilter          8
AggregateInstanceExtraSpecsFilter  8
ImagePropertiesFilter              8
RetryFilter                        7
ComputeCapabilitiesFilter          5
AggregateCoreFilter                4
RamFilter                          4
PciPassthroughFilter               3
AggregateRamFilter                 3
CoreFilter                         2
DiskFilter                         2
AggregateImagePropertiesIsolation  2
SameHostFilter                     2
AggregateMultiTenancyIsolation     1
NUMATopologyFilter                 1
AggregateDiskFilter                1
DifferentHostFilter                1

Based on that, we can definitely say that SameHostFilter and DifferentHostFilter do *not* belong in the defaults. In fact, we got our defaults pretty spot on, based on this admittedly very limited dataset. The only frequently occurring filter that's not in our defaults is AggregateInstanceExtraSpecsFilter. [1] https://gist.github.com/notartom/0819df7c3cb9d02315bfabe5630385c9 On Fri, Apr 27, 2018 at 8:10 PM, Lingxian Kong wrote: > At Catalyst Cloud: > > RetryFilter > AvailabilityZoneFilter > RamFilter > ComputeFilter > AggregateCoreFilter > DiskFilter > AggregateInstanceExtraSpecsFilter > ImagePropertiesFilter > ServerGroupAntiAffinityFilter > SameHostFilter > > Cheers, > Lingxian Kong > > > On Sat, Apr 28, 2018 at 3:04 AM Jim Rollenhagen > wrote: >> >> On Wed, Apr 18, 2018 at 11:17 AM, Artom Lifshitz >> wrote: >>> >>> Hi all, >>> >>> A CI issue [1] caused by tempest thinking some filters are enabled >>> when they're really not, and a proposed patch [2] to add >>> (Same|Different)HostFilter to the default filters as a workaround, has >>> led to a discussion about what filters should be enabled by default in >>> nova. >>> >>> The default filters should make sense for a majority of real world >>> deployments. 
Adding some filters to the defaults because CI needs them >>> is faulty logic, because the needs of CI are different to the needs of >>> operators/users, and the latter takes priority (though it's my >>> understanding that a good chunk of operators run tempest on their >>> clouds post-deployment as a way to validate that the cloud is working >>> properly, so maybe CI's and users' needs aren't that different after >>> all). >>> >>> To that end, we'd like to know what filters operators are enabling in >>> their deployment. If you can, please reply to this email with your >>> [filter_scheduler]/enabled_filters (or >>> [DEFAULT]/scheduler_default_filters if you're using an older version) >>> option from nova.conf. Any other comments are welcome as well :) >> >> >> At Oath: >> >> AggregateImagePropertiesIsolation >> ComputeFilter >> CoreFilter >> DifferentHostFilter >> SameHostFilter >> ServerGroupAntiAffinityFilter >> ServerGroupAffinityFilter >> AvailabilityZoneFilter >> AggregateInstanceExtraSpecsFilter >> >> // jim >> >> __ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > -- -- Artom Lifshitz Software Engineer, OpenStack Compute DFG __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
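A tally like the one described above can be reproduced in a few lines. The sketch below is a hypothetical reconstruction, not the linked gist, and it seeds only the two deployments quoted in this thread, so its output differs from the full nine-deployment results:

```python
from collections import Counter

# Each entry is one deployment's enabled_filters list, as reported in
# the survey replies quoted above.
deployments = [
    # Catalyst Cloud (from Lingxian Kong's reply)
    ["RetryFilter", "AvailabilityZoneFilter", "RamFilter", "ComputeFilter",
     "AggregateCoreFilter", "DiskFilter", "AggregateInstanceExtraSpecsFilter",
     "ImagePropertiesFilter", "ServerGroupAntiAffinityFilter", "SameHostFilter"],
    # Oath (from Jim Rollenhagen's reply)
    ["AggregateImagePropertiesIsolation", "ComputeFilter", "CoreFilter",
     "DifferentHostFilter", "SameHostFilter", "ServerGroupAntiAffinityFilter",
     "ServerGroupAffinityFilter", "AvailabilityZoneFilter",
     "AggregateInstanceExtraSpecsFilter"],
]

# Filters enabled everywhere: intersect all the per-deployment sets.
common = set.intersection(*(set(d) for d in deployments))

# Per-filter counts, most common first.
counts = Counter(f for d in deployments for f in d)
for name, count in counts.most_common():
    print("%-40s %d" % (name, count))
```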
[openstack-dev] [nova] Default scheduler filters survey
Hi all, A CI issue [1] caused by tempest thinking some filters are enabled when they're really not, and a proposed patch [2] to add (Same|Different)HostFilter to the default filters as a workaround, has led to a discussion about what filters should be enabled by default in nova. The default filters should make sense for a majority of real world deployments. Adding some filters to the defaults because CI needs them is faulty logic, because the needs of CI are different to the needs of operators/users, and the latter takes priority (though it's my understanding that a good chunk of operators run tempest on their clouds post-deployment as a way to validate that the cloud is working properly, so maybe CI's and users' needs aren't that different after all). To that end, we'd like to know what filters operators are enabling in their deployment. If you can, please reply to this email with your [filter_scheduler]/enabled_filters (or [DEFAULT]/scheduler_default_filters if you're using an older version) option from nova.conf. Any other comments are welcome as well :) Cheers! [1] https://bugs.launchpad.net/tempest/+bug/1628443 [2] https://review.openstack.org/#/c/561651/ __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
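For operators unfamiliar with the option being surveyed, it lives in nova.conf and looks roughly like this. The filter list below is an illustrative example assembled for this sketch, not a recommendation and not necessarily the shipped default for any release:

```ini
[filter_scheduler]
# Illustrative example only -- check your release's defaults before copying.
enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
```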
Re: [openstack-dev] [nova] EC2 cleanup ?
> That is easier said than done. There have been a couple of related attempts > in the past: > > https://review.openstack.org/#/c/266425/ > > https://review.openstack.org/#/c/282872/ > > I don't remember exactly where those fell down, but it's worth looking at > this first before trying to do this again. Interesting. [1] exists, and I'm pretty sure that we ship it as part of Red Hat OpenStack (but I'm not a PM and this is not an official Red Hat stance, just me and my memory), so it works well enough. If we have things that depend on our in-tree ec2 api, maybe we need to get them moved over to [1]? [1] https://github.com/openstack/ec2-api __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
> - virtio-vsock - think of this as UNIX domain sockets between the host and >guest. This is to deal with the valid use case of people wanting to use >a network protocol, but not wanting a real NIC exposed to the guest/host >for security concerns. As such I think it'd be useful to run the metadata >service over virtio-vsock as an option. It'd likely address at least some >people's security concerns wrt metadata service. It would also fix the >ability to use the metadata service in IPv6-only environments, as we would >not be using IP at all :-) Is this currently exposed by libvirt? I had a look at [1] and couldn't find any mention of 'vsock' or anything that resembles what you've described. [1] https://libvirt.org/formatdomain.html
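For readers unfamiliar with virtio-vsock: from inside the guest it looks like an ordinary socket family (AF_VSOCK), so a metadata service exposed over it could be queried with plain HTTP and no NIC involved. A minimal sketch of what a guest-side fetch might look like; the host CID constant is the well-known vsock host address, but the port and the whole interface are assumptions for illustration, not an existing nova feature:

```python
import socket

HOST_CID = 2        # VMADDR_CID_HOST: the well-known vsock address of the host
METADATA_PORT = 80  # hypothetical port a vsock metadata service might listen on

def build_request(path="/openstack/latest/meta_data.json"):
    # The request itself is ordinary HTTP; only the transport changes.
    return ("GET {} HTTP/1.0\r\nHost: metadata\r\n\r\n".format(path)).encode("ascii")

def fetch_metadata(path="/openstack/latest/meta_data.json"):
    # AF_VSOCK needs Python >= 3.7 on a Linux guest with the
    # virtio-vsock driver loaded.
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as s:
        s.connect((HOST_CID, METADATA_PORT))
        s.sendall(build_request(path))
        chunks = []
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```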
Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
> But before doing that though, I think it'd be worth understanding whether > metadata-over-vsock support would be acceptable to people who refuse > to deploy metadata-over-TCPIP today. I wrote a thing [1], let's see what happens. [1] http://lists.openstack.org/pipermail/openstack-operators/2017-February/012724.html __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
>> But before doing that though, I think it'd be worth understanding whether >> metadata-over-vsock support would be acceptable to people who refuse >> to deploy metadata-over-TCPIP today. > > Sure, although I'm still concerned that it'll effectively make tagged > hotplug libvirt-only. Upon rethink, that's not strictly true: there's still the existing metadata service that works across all hypervisor drivers. I know we're far from feature parity across all virt drivers, but would metadata-over-vsock be acceptable? That's not even lack of feature parity, that's a specific feature being exposed in a different (and arguably worse) way depending on the virt driver.
Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
> But before doing that though, I think it'd be worth understanding whether > metadata-over-vsock support would be acceptable to people who refuse > to deploy metadata-over-TCPIP today. Sure, although I'm still concerned that it'll effectively make tagged hotplug libvirt-only. __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
I don't think we're trying to re-invent configuration management in Nova. We have this problem where we want to communicate to the guest, from the host, a bunch of dynamic metadata that can change throughout the guest's lifetime. We currently have two possible avenues for this already in place, and both have problems: 1. The metadata service isn't universally deployed by operators for security and other reasons. 2. The config drive was never designed for dynamic metadata. So far in this thread we've mostly been discussing ways to shoehorn a solution into the config drive avenue, but that's going to be ugly no matter what because it was never designed for what we're trying to do in the first place. Some folks are saying that we admit that the config drive is only for static information and metadata that is known at boot time, and work on a third way to communicate dynamic metadata to the guest. I can get behind that 100%. I like the virtio-vsock option, but that's only supported by libvirt IIUC. We've got device tagging support in hyper-v as well, and xenapi hopefully on the way soon [1], so we need something a bit more universal. How about fixing up the metadata service to be more deployable, both in terms of security, and IPv6 support? [1] https://review.openstack.org/#/c/333781/ On Mon, Feb 20, 2017 at 10:35 AM, Clint Byrum wrote: > Excerpts from Jay Pipes's message of 2017-02-20 10:00:06 -0500: >> On 02/17/2017 02:28 PM, Artom Lifshitz wrote: >> > Early on in the inception of device role tagging, it was decided that >> > it's acceptable that the device metadata on the config drive lags >> > behind the metadata API, as long as it eventually catches up, for >> > example when the instance is rebooted and we get a chance to >> > regenerate the config drive. >> > >> > So far this hasn't really been a problem because devices could only be >> > tagged at instance boot time, and the tags never changed. So the >> > config drive was pretty always much up to date. 
>> > >> > In Pike the tagged device attachment series of patches [1] will >> > hopefully merge, and we'll be in a situation where device tags can >> > change during instance uptime, which makes it that much more important >> > to regenerate the config drive whenever we get a chance. >> > >> > However, when the config drive is first generated, some of the >> > information stored in there is only available at instance boot time >> > and is not persisted anywhere, as far as I can tell. Specifically, the >> > injected_files and admin_pass parameters [2] are passed from the API >> > and are not stored anywhere. >> > >> > This creates a problem when we want to regenerated the config drive, >> > because the information that we're supposed to put in it is no longer >> > available to us. >> > >> > We could start persisting this information in instance_extra, for >> > example, and pulling it up when the config drive is regenerated. We >> > could even conceivably hack something to read the metadata files from >> > the "old" config drive before refreshing them with new information. >> > However, is that really worth it? I feel like saying "the config drive >> > is static, deal with it - if you want to up to date metadata, use the >> > API" is an equally, if not more, valid option. >> >> Yeah, config drive should, IMHO, be static, readonly. If you want to >> change device tags or other configuration data after boot, use a >> configuration management system or something like etcd watches. I don't >> think Nova should be responsible for this. > > I tend to agree with you, and I personally wouldn't write apps that need > this. However, in the interest of understanding the desire to change this, > I think the scenario is this: > > 1) Servers are booted with {n_tagged_devices} and come up, actions happen > using automated thing that reads device tags and reacts accordingly. > > 2) A new device is added to the general configuration. 
> > 3) New servers configure themselves with the new devices automatically. But > existing servers do not have those device tags in their config drive. In > order to configure these, one would now have to write a fair amount of > orchestration to duplicate what already exists for new servers. > > While I'm a big fan of the cattle approach (just delete those old > servers!) I don't think OpenStack is constrained enough to say that > this is always going to be efficient. And writing two paths for server > configuration feels like repeating yourself.
Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
Config drive over read-only NFS anyone? A shared filesystem so that both Nova and the guest can do IO on it at the same time is indeed the proper way to solve this. But I'm afraid of the ramifications in terms of live migrations and all other operations we can do on VMs... Michael On Sun, Feb 19, 2017 at 6:12 AM, Steve Gordon wrote: > - Original Message - > > From: "Artom Lifshitz" > > To: "OpenStack Development Mailing List (not for usage questions)" < > openstack-dev@lists.openstack.org> > > Sent: Saturday, February 18, 2017 8:11:10 AM > > Subject: Re: [openstack-dev] [nova] Device tagging: rebuild config drive > upon instance reboot to refresh metadata on > > it > > > > In reply to Michael: > > > > > We have had this discussion several times in the past for other > reasons. > > > The > > > reality is that some people will never deploy the metadata API, so I > feel > > > like we need a better solution than what we have now. > > > > Aha, that's definitely a good reason to continue making the config > > drive a first-class citizen. > > The other reason is that the metadata API as it stands isn't an option for > folks trying to do IPV6-only IIRC. > > -Steve > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > -- Rackspace Australia __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
A few good points were made: * the config drive could be VFAT, in which case we can't trust what's on it because the guest has write access * if the config drive is ISO9660, we can't selectively write to it, we need to regenerate the whole thing - but in this case it's actually safe to read from (right?) * the point about the precedent being set that the config drive doesn't change... I'm not sure I 100% agree. There's definitely a precedent that information on the config drive will remain present for the entire instance lifetime (so the admin_pass won't disappear after a reboot, even if using that "feature" in a workflow seems ludicrous), but we've made no promises that the information itself will remain constant. For example, nothing says the device metadata must remain unchanged after a reboot. Based on that here's what I propose: If the config drive is vfat, we can just update the information on it that we need to update. In the device metadata case, we write a new JSON file, overwriting the old one. If the config drive is ISO9660, we can safely read from it to fill in what information isn't persisted anywhere else, then update it with the new stuff we want to change. Then write out the new image. On Sat, Feb 18, 2017 at 12:36 PM, Dean Troyer wrote: > On Sat, Feb 18, 2017 at 10:23 AM, Clint Byrum wrote: >> But I believe Michael is not saying "it's unsafe to read the json >> files" but rather "it's unsafe to read the whole config drive". It's >> an ISO filesystem, so you can't write to it. You have to read the whole >> contents back into a directory and regenerate it. I'm guessing Michael >> is concerned that there is some danger in doing this, though I can't >> imagine what it is. > > Nova can be configured for config drive to be a VFAT filesystem, which > can not be trusted. Unfortunately this is (was??) required for > libvirt live migration to work so is likely to not be an edge case in > deployments. 
> > The safest read-back approach would be to generate both ISO9660 and > VFAT (if configured) and only read back from the ISO version. But > yuck, two config drive images...still better than passwords in the > database. > > dt > > -- > > Dean Troyer > dtro...@gmail.com > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- -- Artom Lifshitz __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
In reply to Michael: > We have had this discussion several times in the past for other reasons. The > reality is that some people will never deploy the metadata API, so I feel > like we need a better solution than what we have now. Aha, that's definitely a good reason to continue making the config drive a first-class citizen. > However, I would consider it probably unsafe for the hypervisor to read the > current config drive to get values Yeah, I was using the word "hack" very generously ;) > and persisting things like the instance > root password in the Nova DB sounds like a bad idea too. I hadn't even thought of the security implication. That's a very good point, there's no way to persist admin_pass securely. We'll have to read it at some point, so no amount of encryption will change anything. We can argue that since we already store admin_pass on the config drive, storing it in the database as well is OK (it's probably immediately changed anyways), but there's a difference between having it in a file on a single compute node, and in the database accessible by the entire deployment. In reply to Clint: > Agreed. What if we simply have a second config drive that is for "things > that change" and only rebuild that one on reboot? We've already set the precedent that there's a single config drive with the device tagging metadata on it, I don't think we can go back on that promise. So while we shouldn't read from the config drive to get current values in order to afterwards monolithically regenerate a new one, we could try just writing to the files we want changed. I'm thinking of a system where code that needs to change information on the config drive would have a way of telling it "here are the new values for device_metadata", and whenever we next get a chance, for example when the instance is rebooted, those values are saved on the config drive. 
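The "tell it the new values now, write them out at the next opportunity" idea in the last paragraph could be sketched like this (purely illustrative, not nova code):

```python
class PendingConfigDriveUpdates:
    """Sketch of deferred config drive updates: callers stage new values,
    and they are flushed into the drive contents the next time we get a
    chance to rewrite it (e.g. an instance reboot)."""

    def __init__(self):
        self._pending = {}

    def stage(self, key, new_value):
        # e.g. stage("device_metadata", [...]) after a tagged hotplug
        self._pending[key] = new_value

    def apply(self, drive_contents):
        # Called at reboot: merge staged values into the contents that
        # are about to be written out, then clear the queue.
        drive_contents.update(self._pending)
        self._pending.clear()
        return drive_contents
```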
[openstack-dev] [nova] Device tagging: rebuild config drive upon instance reboot to refresh metadata on it
Early on in the inception of device role tagging, it was decided that it's acceptable that the device metadata on the config drive lags behind the metadata API, as long as it eventually catches up, for example when the instance is rebooted and we get a chance to regenerate the config drive. So far this hasn't really been a problem because devices could only be tagged at instance boot time, and the tags never changed. So the config drive was pretty much always up to date. In Pike the tagged device attachment series of patches [1] will hopefully merge, and we'll be in a situation where device tags can change during instance uptime, which makes it that much more important to regenerate the config drive whenever we get a chance. However, when the config drive is first generated, some of the information stored in there is only available at instance boot time and is not persisted anywhere, as far as I can tell. Specifically, the injected_files and admin_pass parameters [2] are passed from the API and are not stored anywhere. This creates a problem when we want to regenerate the config drive, because the information that we're supposed to put in it is no longer available to us. We could start persisting this information in instance_extra, for example, and pulling it up when the config drive is regenerated. We could even conceivably hack something to read the metadata files from the "old" config drive before refreshing them with new information. However, is that really worth it? I feel like saying "the config drive is static, deal with it - if you want up-to-date metadata, use the API" is an equally, if not more, valid option. Thoughts? 
I know y'all are flying out to the PTG, so I'm unlikely to get responses, but I've at least put my thoughts into writing, and will be able to refer to them later on :) [1] https://review.openstack.org/#/q/status:open+topic:bp/virt-device-tagged-attach-detach [2] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L2667-L2672 -- Artom Lifshitz __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Device tag in the API breaks in the old microversion
The more urgent stuff has indeed merged - many thanks to Matt and other cores for getting this in quickly before rc1. The fixes to tests do indeed need more attention, which I will provide :) On Mon, Jan 30, 2017 at 8:49 PM, Matt Riedemann wrote: > On 1/26/2017 8:32 PM, Artom Lifshitz wrote: >> >> Since the consensus is to fix this with a new microversion, I've >> submitted some patches: >> >> * https://review.openstack.org/#/c/426030/ >> A spec for the new microversion in case folks want one. > > > Merged. > >> >> * https://review.openstack.org/#/c/424759/ >> The new microversion itself. I've already had feedback from Alex and >> Ghanshyam (thanks guys!), and I've tried to address it. > > > +2 from me, +1 from gmann. The Tempest patch for the 2.42 microversion is > here: > > https://review.openstack.org/#/c/426991/1 > >> >> * https://review.openstack.org/#/c/425876/ >> A patch to - as Alex and Sean suggested - stop passing plain string >> version to the schema extension point. >> > > Needs some work. > > > -- > > Thanks, > > Matt Riedemann > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- -- Artom Lifshitz __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Device tag in the API breaks in the old microversion
Since the consensus is to fix this with a new microversion, I've submitted some patches: * https://review.openstack.org/#/c/426030/ A spec for the new microversion in case folks want one. * https://review.openstack.org/#/c/424759/ The new microversion itself. I've already had feedback from Alex and Ghanshyam (thanks guys!), and I've tried to address it. * https://review.openstack.org/#/c/425876/ A patch to - as Alex and Sean suggested - stop passing plain string version to the schema extension point. On Tue, Jan 24, 2017 at 10:38 PM, Matt Riedemann wrote: > On 1/24/2017 8:16 PM, Alex Xu wrote: >> >> >> >> One other thing: we're going to need to also fix this in >> python-novaclient, which we might want to do first, or work >> concurrently, since that's going to give us the client side >> perspective on how gross it will be to deal with this issue. >> >> > > This is Andrey's patch to at least document the limitation: > > https://review.openstack.org/#/c/424745/ > > We'll have to fix the client to use the new microversion in Pike (or at > least release the fix in Pike) since the client release freeze is Thursday. > > > -- > > Thanks, > > Matt Riedemann > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- -- Artom Lifshitz __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Device tag in the API breaks in the old microversion
> So the current API behavior is as below: > > 2.32: BDM tag and network device tag added. > 2.33 - 2.36: 'tag' in the BDM disappeared. The network device tag still > works. > 2.37: The network device tag disappeared also. Thanks for the summary. For the visual minded like me, I made some ASCII art of the above: http://paste.openstack.org/raw/596225/ > There are few questions we should think about: > > 1. Should we fix that by Microversion? > Thanks to Chris Dent point that out in the review. I also think we need > to bump Microversion, which follow the rule of Microversion. I don't think we have a choice - we'd be adding new API parameters that didn't exist in, for example, 2.39. > 2. If we need Microversion, is that something we can do before release? > We are very close to the feature freeze. And in normal, we need spec for > microversion. Maybe we only can do that in Pike. For now we can update the > API-ref, and microversion history to notice that, maybe a reno also. I think it's too late before FF to do any functional fixes. I vote we document our screw up in the api-ref at least, and during Pike we can merge a new microversion that fixes this mess. > 2. How can we prevent that happened again? >Both of those patches were reviewed multiple cycles. But we still miss > that. It is worth to think about how to prevent that happened again. > >Talk with Sean. He suggests stop passing plain string version to the > schema extension point. We should always pass APIVersionRequest object > instead of plain string. Due to "version == APIVersionRequest('2.32')" is > always wrong, we should remove the '__eq__'. The developer should always use > the 'APIVersionRequest.matches' [3] method. This looks like a smart way to make sure all API version comparisons are of the less than/greater than kind. >That can prevent the first mistake we made. But nothing help for second > mistake. Currently we only run the test on the specific Microversion for the > specific interesting point. 
In the before, the new tests always inherits > from the previous microversion tests, just like [4]. That can test the old > API behavior won't be changed in the new microversion. But now, we said that > is waste, we didn't do that again just like [5]. Should we change that back? An idea would be to run all functional tests against 2.latest. This doesn't cover all microversions, but since as time progresses and 2.latest increases, all previous microversions will have been covered in the past, and it gives us some confidence that we didn't break anything. This doesn't work for patches that removed an API parameter for example, so those kinds of changes will have to be an exception. __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
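The "remove __eq__, always use matches()" suggestion relayed above can be sketched as a toy class. This is illustrative only; nova's real APIVersionRequest differs in detail:

```python
class APIVersionRequest:
    """Toy sketch: no custom __eq__, so handlers can't accidentally pin
    behaviour to one exact microversion with `version == ...`; range
    checks go through matches() instead."""

    def __init__(self, version):
        self.major, self.minor = (int(p) for p in version.split("."))

    def _key(self):
        return (self.major, self.minor)

    def __le__(self, other):
        return self._key() <= other._key()

    def matches(self, min_version, max_version):
        # True when min_version <= self <= max_version, so a feature
        # added in 2.32 and kept thereafter is checked as a range, not
        # as equality with a single version.
        return min_version <= self and self <= max_version
```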
Re: [openstack-dev] [nova][libvirt] Lets make libvirt's domain XML canonical
l of 'management': >> it's the minutia of the hypervisor. In order to persist all of these things >> in Nova we'd have to implement them explicitly, and when libvirt/kvm grows >> more stuff we'll have to do that too. We'll need to mirror the >> functionality of libvirt in Nova, feature for feature. This is a red flag >> for me, and I think it means we should switch to libvirt being canonical. >> >> I think we should be able to create a domain, but once created we should >> never redefine a domain. We can do adding and removing devices dynamically >> using libvirt's apis, secure in the knowledge that libvirt will persist >> this for us. When we upgrade the host, libvirt can ensure we don't break >> guests which are on it. Evacuate should be pretty much the only reason to >> start again. > > And in fact we do persist the guest XML with libvirt already. We sadly > never use that info though - we just blindly overwrite it every time > with newly generated XML. > > Fixing this should not be technically difficult for the most part. > >> I raised this in the live migration sub-team meeting, and the immediate >> response was understandably conservative. I think this solves more problems >> than it creates, though, and it would result in Nova's libvirt driver >> getting a bit smaller and a bit simpler. That's a big win in my book. > > I don't think it'll get significantly smaller/simpler, but it will > definitely be more intelligent and user friendly to do this IMHO. > As mentioned above, I think the Windows license reactivation issue > alone is enough of a reason to do this. 
> > Regards, > Daniel > -- > |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| > |: http://libvirt.org -o- http://virt-manager.org :| > |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| > |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- -- Artom Lifshitz __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
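In practice nova would fetch the persisted XML via libvirt's XMLDesc() API and redefine the domain; the sketch below only illustrates the "edit the persisted XML instead of regenerating it from scratch" idea on a toy domain document (element layout simplified, names illustrative):

```python
import xml.etree.ElementTree as ET

def add_disk_to_domain_xml(domain_xml, target_dev):
    # Modify the libvirt-persisted domain XML in place rather than
    # rebuilding the whole document from nova's own state: everything
    # libvirt already recorded (machine type, device addresses, ...)
    # survives untouched.
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    disk = ET.SubElement(devices, "disk", {"type": "file", "device": "disk"})
    ET.SubElement(disk, "target", {"dev": target_dev, "bus": "virtio"})
    return ET.tostring(root, encoding="unicode")
```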
Re: [openstack-dev] [nova] Non-priority feature freeze and FFEs
> The Hyper-V implementation of the bp virt-device-role-tagging is
> mergeable [1]. The patch is quite simple, it got some reviews, and the
> tempest test test_device_tagging [2] passed [3].
>
> [1] https://review.openstack.org/#/c/331889/
> [2] https://review.openstack.org/#/c/305120/
> [3] http://64.119.130.115/debug/nova/331889/8/04-07-2016_19-43/results.html.gz

For what it's worth, the implementation for libvirt and all the plumbing
in the API, metadata API, compute manager, etc. has merged, so this can
be thought of as a continuation of that same patch series.

There's the XenAPI implementation [4] as well, but that's not mergeable
in its current state.

[4] https://review.openstack.org/#/c/333781/
[openstack-dev] [Nova] Virt device role tagging for other virt drivers
Hey folks,

For a while now we've been working on virt device role tagging. The full
spec is here [1], but the quick gist of it is that device tagging is a
way for the user to assign arbitrary string tags to either vNICs or
block devices. Those tags then get exposed by the metadata API to the
guest, along with other device metadata such as bus and address, for
example PCI 0000:00:02.0.

This work is being done for the libvirt driver, and we would obviously
love it if other drivers implemented the functionality. This email is
meant to get this cooperation started.

A good starting point for developers of other drivers is our own libvirt
implementation [2]. The basic idea is that we use new objects from [3]
to build the metadata hierarchy. The hierarchy is then saved in the
database in the instance_extra table, of which you can see the details
here [4]. This is pretty much the only functionality that other virt
drivers would need to implement. Everything else (API, metadata API) is
being handled by us, though of course we welcome your feedback.

I hope I've been concise yet complete. If you have any questions don't
hesitate to ask either vladikr or artom on IRC.

Cheers!

[1] http://specs.openstack.org/openstack/nova-specs/specs/newton/approved/virt-device-role-tagging.html
[2] https://review.openstack.org/#/c/264016/42/nova/virt/libvirt/driver.py
[3] https://github.com/openstack/nova/blob/master/nova/objects/virt_device_metadata.py
[4] https://review.openstack.org/#/c/327920/

--
Artom Lifshitz
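To make the guest-side picture concrete, here is a sketch of what software inside an instance might do with the tagged device metadata. The JSON below is a hand-written example of the "devices" section in the spirit of the spec, not captured metadata API output, and devices_by_tag is an illustrative helper, not nova code:

```python
import json

# Hand-written example of the device metadata a guest could fetch from
# the metadata API; layout follows the spec's intent, not real output.
sample_meta_data = json.loads("""
{
  "devices": [
    {"type": "nic", "bus": "pci", "address": "0000:00:02.0",
     "tags": ["management"]},
    {"type": "disk", "bus": "virtio", "address": "0000:00:05.0",
     "tags": ["database"]}
  ]
}
""")

def devices_by_tag(meta_data):
    """Map each tag to the list of device addresses carrying it."""
    tagged = {}
    for dev in meta_data.get("devices", []):
        for tag in dev.get("tags", []):
            tagged.setdefault(tag, []).append(dev["address"])
    return tagged

print(devices_by_tag(sample_meta_data))
# {'management': ['0000:00:02.0'], 'database': ['0000:00:05.0']}
```

With a mapping like that, in-guest tooling can find, say, the "management" NIC by tag instead of guessing from MAC addresses or device ordering.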
[openstack-dev] [Nova] Virtual device role tagging
Hello,

I'd like to get the conversation started around a spec that my colleague
Dan Berrange has proposed to the backlog. The spec [1] solves the
problem of passing information about virtual devices into an instance.
For example, in an instance with multiple network interfaces, each
connected to profoundly different networks, software running inside the
instance needs to know each NIC's role. Similarly, in an instance with
multiple disks, each intended for a different usage, software inside the
instance needs to know each disk's role.

I feel like a lot of discussion will happen around this spec before it
can be merged - hopefully in the M cycle - so I'm requesting comments
and suggestions very early ;)

Thanks all!

[1] https://review.openstack.org/#/c/195662/1
[openstack-dev] [Nova] Add config option for real deletes instead of soft-deletes
Hello,

I'd like to gauge acceptance of introducing a feature that would give
operators a config option to perform real database deletes instead of
soft deletes.

There's definitely a need for *something* that cleans up the database.
There have been a few attempts at a DB purge engine [1][2][3][4][5], and
archiving to shadow tables has been merged [6] (though that currently
has some issues [7]). DB archiving notwithstanding, the general response
to operators when they mention the database becoming too big seems to be
"DIY cleanup."

I would like to propose a different approach: add a config option that
turns soft-deletes into real deletes, and start telling operators "if
you turn this on, it's DIY backups." Would something like that be
acceptable and feasible? I'm ready to put in the work to implement this,
however searching the mailing list indicates that it would be somewhere
between non-trivial and impossible [8]. Before I start, I would like
some confidence that it's closer to the former than the latter :)

Cheers!

[1] https://blueprints.launchpad.net/nova/+spec/db-purge-engine
[2] https://blueprints.launchpad.net/nova/+spec/db-purge2
[3] https://blueprints.launchpad.net/nova/+spec/remove-db-archiving
[4] https://blueprints.launchpad.net/nova/+spec/database-purge
[5] https://blueprints.launchpad.net/nova/+spec/db-archiving
[6] https://review.openstack.org/#/c/18493/
[7] https://review.openstack.org/#/c/109201/
[8] http://lists.openstack.org/pipermail/openstack-operators/2014-November/005591.html
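The proposal above can be sketched in miniature. This is a toy, assumption-laden illustration, not nova code: the config flag name is hypothetical, and the single sqlite table only mimics nova's soft-delete convention (a 'deleted' column set to the row id on soft-delete, 0 otherwise):

```python
import sqlite3

# Hypothetical config option standing in for what the spec would add.
REAL_DELETES_ENABLED = True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, "
             "uuid TEXT, deleted INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO instances (uuid) VALUES (?)",
                 [("inst-a",), ("inst-b",)])

def delete_instance(conn, instance_id):
    if REAL_DELETES_ENABLED:
        # Real delete: the row is gone, and so is the audit trail --
        # hence "if you turn this on, it's DIY backups."
        conn.execute("DELETE FROM instances WHERE id = ?", (instance_id,))
    else:
        # Soft delete: mark the row deleted but keep it around,
        # which is why the database only ever grows.
        conn.execute("UPDATE instances SET deleted = id WHERE id = ?",
                     (instance_id,))

delete_instance(conn, 1)
remaining = conn.execute("SELECT COUNT(*) FROM instances").fetchone()[0]
print(remaining)  # 1 with real deletes; 2 (one soft-deleted) otherwise
```

The operational trade-off is exactly the one in the email: real deletes keep the tables small, at the cost of losing the recovery and audit data the soft-delete rows provide.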