On Mon, Jun 26, 2017 at 10:19:12AM +0200, Henning Schild wrote:
> On Sun, 25 Jun 2017 10:09:10 +0200, Sahid Orentino Ferdjaoui
> <[email protected]> wrote:
>
> > On Fri, Jun 23, 2017 at 10:34:26AM -0600, Chris Friesen wrote:
> > > On 06/23/2017 09:35 AM, Henning Schild wrote:
> > > > On Fri, 23 Jun 2017 11:11:10 +0200, Sahid Orentino Ferdjaoui
> > > > <[email protected]> wrote:
> > > >
> > > > > In Linux RT context, and as you mentioned, the non-RT vCPU can
> > > > > acquire some guest kernel lock, then be pre-empted by the
> > > > > emulator thread while holding this lock. This situation blocks
> > > > > RT vCPUs from doing their work. That is why we have
> > > > > implemented [2]. For DPDK I don't think we have such problems
> > > > > because it's running in userland.
> > > > >
> > > > > So for the DPDK context I think we could have a mask like we
> > > > > have for RT, basically considering vCPU0 to handle best-effort
> > > > > work (emulator threads, SSH...). I think that is the current
> > > > > pattern used by DPDK users.
> > > >
> > > > DPDK is just a library, and one can imagine an application that
> > > > has cross-core communication/synchronisation needs where the
> > > > emulator slowing down vCPU0 will also slow down vCPU1. Your DPDK
> > > > application would have to know which of its cores did not get a
> > > > full pCPU.
> > > >
> > > > I am not sure what the DPDK example is doing in this discussion;
> > > > would that not just be cpu_policy=dedicated? I guess the normal
> > > > behaviour of dedicated is that emulators and io happily share
> > > > pCPUs with vCPUs, and you are looking for a way to restrict
> > > > emulators/io to a subset of pCPUs because you can live with some
> > > > of them being not 100%.
> > >
> > > Yes. A typical DPDK-using VM might look something like this:
> > >
> > > vCPU0: non-realtime, housekeeping and I/O, handles all virtual
> > >        interrupts and "normal" linux stuff, emulator runs on the
> > >        same pCPU
> > > vCPU1: realtime, runs in tight loop in userspace processing packets
> > > vCPU2: realtime, runs in tight loop in userspace processing packets
> > > vCPU3: realtime, runs in tight loop in userspace processing packets
> > >
> > > In this context, vCPUs 1-3 don't really ever enter the kernel, and
> > > we've offloaded as much kernel work as possible from them onto
> > > vCPU0. This works pretty well with the current system.
> > >
> > > > > For RT we have to isolate the emulator threads to an
> > > > > additional pCPU per guest or, as you are suggesting, to a set
> > > > > of pCPUs for all the guests running.
> > > > >
> > > > > I think we should introduce a new option:
> > > > >
> > > > >   - hw:cpu_emulator_threads_mask=^1
> > > > >
> > > > > If set in 'nova.conf', that mask will be applied to the set of
> > > > > all host CPUs (vcpu_pin_set) to basically pack the emulator
> > > > > threads of all VMs running there (useful for the RT context).
> > > >
> > > > That would allow modelling exactly what we need.
> > > > In nova.conf we are talking about absolute, known values; there
> > > > is no need for a mask, and a set is much easier to read. Also,
> > > > using the same name does not sound like a good idea.
> > > > And the name vcpu_pin_set clearly suggests what kind of load
> > > > runs there; if using a mask it should be called pin_set.
> > >
> > > I agree with Henning.
> > >
> > > In nova.conf we should just use a set, something like
> > > "rt_emulator_vcpu_pin_set", which would be used for running the
> > > emulator/io threads of *only* realtime instances.
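(To make the two shapes concrete: on a host with vcpu_pin_set=4-7, the
proposals so far would look something like this - both option names are
only suggestions from this thread, nothing is merged:

  vcpu_pin_set = 4-7                    # pCPUs Nova may use
  # mask variant: applied to vcpu_pin_set
  hw:cpu_emulator_threads_mask = ^5-7   # emulator threads packed on pCPU4
  # set variant, as suggested by Chris
  rt_emulator_vcpu_pin_set = 4          # same placement, spelled as a set

Either way the emulator/io threads of all RT guests end up on pCPU4.)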
> > I don't agree with you: we have a set of pCPUs and we want to
> > subtract some of them for the emulator threads. We need a mask. The
> > only set we need is the one selecting which pCPUs Nova can use
> > (vcpu_pin_set).
>
> At that point it does not really matter whether it is a set or a mask.
> They can both express the same thing, but a set is easier to
> read/configure. With the same argument you could say that vcpu_pin_set
> should be a mask over the host's pCPUs.
>
> As I said before: vcpu_pin_set should be renamed because all sorts of
> threads are put there (pcpu_pin_set?). But that would be a bigger
> change and should be discussed as a separate issue.
>
> So far we have talked about a compute node for realtime only doing
> realtime. In that case vcpu_pin_set + emulator_io_mask would work. If
> you want to run regular VMs on the same host, you can run a second
> nova, like we do.
>
> We could also use vcpu_pin_set + rt_vcpu_pin_set(/mask). I think that
> would allow modelling all cases in just one nova. Having all in one
> nova, you could potentially repurpose RT CPUs to best-effort and back.
> Some day in the future ...
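If I read that right, a mixed host would look something like this
(rt_vcpu_pin_set is only a name floated in this thread, nothing that
exists today):

  vcpu_pin_set    = 2-15   # all pCPUs Nova may touch
  rt_vcpu_pin_set = 8-15   # subset reserved for RT vCPUs
  # pCPUs 2-7 would be left for non-RT vCPUs and emulator/io threads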
That is not something we should allow, or at least advertise. A compute
node can't run both RT and non-RT guests, because the RT nodes need to
run an RT kernel. We can't guarantee RT if both kinds of guests are on
the same nodes. The realtime nodes should be isolated by aggregates, as
you seem to do.

> > > We may also want to have "rt_emulator_overcommit_ratio" to control
> > > how many threads/instances we allow per pCPU.
> >
> > Not really sure I have understood this point. If it is to indicate
> > that for an isolated pCPU we want X guest emulator threads, the same
> > behavior is achieved by the mask. A host for realtime is dedicated to
> > realtime, with no overcommitment, and the operators know the number
> > of host CPUs; they can easily deduce a ratio and so the corresponding
> > mask.
>
> Agreed.
>
> > > > > If set in flavor extra-specs, it will be applied to the vCPUs
> > > > > dedicated to the guest (useful for the DPDK context).
> > > >
> > > > And if both are present, the flavor wins and nova.conf is
> > > > ignored?
> > >
> > > In the flavor I'd like to see it be a full bitmask, not an
> > > exclusion mask with an implicit full set. Thus the end-user could
> > > specify "hw:cpu_emulator_threads_mask=0" and get the emulator
> > > threads to run alongside vCPU0.
> >
> > Same here, I don't agree: the only set is the vCPUs of the guest,
> > and we want a mask to subtract some of them.
>
> The current mask is fine, but using the same name in nova.conf and in
> the flavor does not seem like a good idea.

I do not see any problem with that; only operators are going to set this
option, whether in nova.conf or in flavor extra-specs.

I think we agree on the general approach. I'm going to update the
current spec for Q and see whether we can make it (see the flavor sketch
at the end of this mail).

s.

> Henning
>
> > > Henning, there is no conflict: the nova.conf setting and the
> > > flavor setting are used for two different things.
> > >
> > > Chris
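PS: to make the flavor side concrete for the DPDK layout above, assuming
my exclusion-mask semantics (hw:cpu_emulator_threads_mask is still only
a proposed name; hw:cpu_policy and hw:cpu_realtime* exist today):

  openstack flavor set dpdk-flavor \
    --property hw:cpu_policy=dedicated \
    --property hw:cpu_realtime=yes \
    --property hw:cpu_realtime_mask=^0 \
    --property hw:cpu_emulator_threads_mask=^1-3

That subtracts vCPUs 1-3, so the emulator threads land with vCPU0; with
Chris's full-bitmask reading the user would write
hw:cpu_emulator_threads_mask=0 for the same placement.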
