Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-21 Thread Chris Friesen
On 06/21/2018 07:04 AM, Artom Lifshitz wrote:
>> As I understand it, Artom is proposing to have a larger race window, essentially from when the scheduler selects a node until the resource audit runs on that node.
> Exactly. When writing the spec I thought we could just call the

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-21 Thread Chris Friesen
On 06/21/2018 07:50 AM, Mooney, Sean K wrote:
-----Original Message-----
From: Jay Pipes [mailto:jaypi...@gmail.com]
Side question... does either approach touch PCI device management during live migration? I ask because the only workloads I've ever seen that pin guest vCPU threads to

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-21 Thread Sahid Orentino Ferdjaoui
On Thu, Jun 21, 2018 at 09:36:58AM -0400, Jay Pipes wrote:
> On 06/18/2018 10:16 AM, Artom Lifshitz wrote:
> > Hey all,
> >
> > For Rocky I'm trying to get live migration to work properly for
> > instances that have a NUMA topology [1].
> >
> > A question that came up on one of patches [2] is

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-21 Thread Artom Lifshitz
> Side question... does either approach touch PCI device management during
> live migration?

Nope. I'd need to do some research to see what, if anything, is needed at the lower levels (kernel, libvirt) to enable this.

> I ask because the only workloads I've ever seen that pin guest vCPU threads

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-21 Thread Mooney, Sean K
> -----Original Message-----
> From: Jay Pipes [mailto:jaypi...@gmail.com]
> Sent: Thursday, June 21, 2018 2:37 PM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova] NUMA-aware live migration: easy but
> incomplete vs complete but hard
>
>

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-21 Thread Jay Pipes
On 06/18/2018 10:16 AM, Artom Lifshitz wrote: Hey all, For Rocky I'm trying to get live migration to work properly for instances that have a NUMA topology [1]. A question that came up on one of patches [2] is how to handle resources claims on the destination, or indeed whether to handle that

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-21 Thread Artom Lifshitz
> As I understand it, Artom is proposing to have a larger race window,
> essentially from when the scheduler selects a node until the resource
> audit runs on that node.

Exactly. When writing the spec I thought we could just call the resource tracker to claim the resources when the

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-21 Thread Sahid Orentino Ferdjaoui
On Mon, Jun 18, 2018 at 10:16:05AM -0400, Artom Lifshitz wrote:
> Hey all,
>
> For Rocky I'm trying to get live migration to work properly for
> instances that have a NUMA topology [1].
>
> A question that came up on one of patches [2] is how to handle
> resources claims on the destination, or

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-20 Thread Chris Friesen
On 06/20/2018 10:00 AM, Sylvain Bauza wrote: When we reviewed the spec, we agreed as a community to say that we should still get race conditions once the series is implemented, but at least it helps operators. Quoting : "It would also be possible for another instance to steal NUMA resources

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-20 Thread Sylvain Bauza
On Tue, Jun 19, 2018 at 9:59 PM, Artom Lifshitz wrote:
> > Adding claims support later on wouldn't change any on-the-wire messaging,
> > it would just make things work more robustly.
>
> I'm not even sure about that. Assuming [1] has at least the right
> idea, it looks like it's an

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-19 Thread Chris Friesen
On 06/19/2018 01:59 PM, Artom Lifshitz wrote: Adding claims support later on wouldn't change any on-the-wire messaging, it would just make things work more robustly. I'm not even sure about that. Assuming [1] has at least the right idea, it looks like it's an either-or kind of thing: either we

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-19 Thread Artom Lifshitz
> Adding claims support later on wouldn't change any on-the-wire messaging,
> it would just make things work more robustly.

I'm not even sure about that. Assuming [1] has at least the right idea, it looks like it's an either-or kind of thing: either we use resource tracker claims and get the

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-18 Thread Artom Lifshitz
> For what it's worth, I think the previous patch languished for a number of
> reasons other than the complexity of the code...the original author left,
> the coding style was a bit odd, there was an attempt to make it work even if
> the source was an earlier version, etc. I think a fresh

Re: [openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-18 Thread Chris Friesen
On 06/18/2018 08:16 AM, Artom Lifshitz wrote: Hey all, For Rocky I'm trying to get live migration to work properly for instances that have a NUMA topology [1]. A question that came up on one of patches [2] is how to handle resources claims on the destination, or indeed whether to handle that

[openstack-dev] [nova] NUMA-aware live migration: easy but incomplete vs complete but hard

2018-06-18 Thread Artom Lifshitz
Hey all, For Rocky I'm trying to get live migration to work properly for instances that have a NUMA topology [1]. A question that came up on one of patches [2] is how to handle resources claims on the destination, or indeed whether to handle that at all. The previous attempt's approach [3]