Hi Igor,

Thanks for the reply, and sorry for the late response. I wanted to make sure
I had addressed your comments internally and had at least one solution fully
tested and working against the adjusted main series (upcoming RFC V6) before
getting back to you. I remain open to further suggestions.
I've just posted the V2 series of this ACPI part, addressing your major
concerns:

#1. Dropped the `acpi_persistence` logic from the ACPI code. All vCPUs are
    now always part of the `possible_cpus` list. This has reduced the code.
#2. Removed the `AcpiCpuStatus::is_present` state, the corresponding ACPI
    CPU_PRESENT flag from the ACPI interface, and its logic in the CPUs AML.
#3. Introduced conditional inclusion of the CPU hotplug VMSD in the GED's
    VMSD using the .needed() hook (a minimal sketch of this pattern is
    appended at the end of this mail).
#4. Fixed the make-qtest/bios-acpi-test failure on the x86 {q35,pc}
    platforms caused by the change in the CPUs AML code.
#5. Adjusted the main series (upcoming RFC V6) to the above logic and found
    it working. I've asked the other stakeholders to test the ACPI part and
    the complete RFC V6 as well.

At present I've not removed the `is_enabled` part from the migration code,
which you requested be moved into `CPUState`. Is there a way to keep it in
the migration state without breaking the x86 migration code? A humble
request: if not, do you think we can go ahead with acceptance of the rest of
the patches for this cycle and drop the last patch?

Arch-agnostic ACPI V2 series:
https://lore.kernel.org/qemu-devel/20241031200502.3869-1-salil.me...@huawei.com/T/#mf10104510269d5c290622a0471f0997ad058e397

Upcoming RFC V6, Support of Virtual CPU Hotplug for ARMv8 Arch:
https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6-rc4

Please find my replies inline for the rest of the comments.

Many thanks!

Best regards,
Salil

> From: Igor Mammedov <imamm...@redhat.com>
> Sent: Friday, October 25, 2024 2:52 PM
> To: Salil Mehta <salil.me...@opnsrc.net>
> Subject: Re: [PATCH V1 1/4] hw/acpi: Initialize ACPI Hotplug CPU Status
> with Support for vCPU `Persistence`
>
> On Mon, 21 Oct 2024 22:50:40 +0100
> Salil Mehta <salil.me...@opnsrc.net> wrote:
>
> > Hi Igor,
> >
> > Thanks for taking the time to review, and sorry for not being prompt.
> > I was in transit due to a difficult personal situation.
> >
> > On Fri, Oct 18, 2024 at 3:11 PM Igor Mammedov <imamm...@redhat.com>
> > wrote:
> >
> > > On Mon, 14 Oct 2024 20:22:02 +0100
> > > Salil Mehta <salil.me...@huawei.com> wrote:
> > >
> > > > Certain CPU architecture specifications [1][2][3] prohibit changes
> > > > to CPU
> > >                                           ^^^^^^^^^ these do not point
> > > to specs, but never mind
> > > > presence after the kernel has booted.
> > > > This limitation exists because many system initializations rely on
> > > > the exact CPU count at boot time and do not expect it to change
> > > > later. For example, components like interrupt controllers, which are
> > > > closely tied to CPUs, or various per-CPU features, may not support
> > > > configuration changes once the kernel has been initialized. This
> > > > presents a challenge for virtualization features such as vCPU
> > > > hotplug.
> > >
> > > well, x86 (incl. cpu, interrupt ctrls) also doesn't have architectural
> > > hotplug. It's just that OEMs managed to implement it regardless and
> > > then bothered to make OS changes to work with that.
> > > It's just that the ARM community doesn't want to go there at this
> > > point in time, but using the above reasoning as justification for this
> > > series doesn't look good to me.
> >
> > There is a difference though. vCPU presence cannot be changed after the
> > system has initialized in the ARM world due to the architectural
> > constraint. I think in the x86 world it is allowed?
>
> As far as I know x86 doesn't have arch-defined hotplug, but vendors
> managed to implement it somehow and then made sure that an OS running on
> top could stomach it.

Sure, I understand. The ARM specification imposes restrictions on any such
change. This is tightly controlled by the architecture team within ARM,
which includes the hardware people. We've worked very closely with ARM over
the past four years to check the viability of any such change, but it was
repeatedly rejected by the architecture team. In fact, Jean-Philippe from
Linaro/ARM (CC'ed on this series) tried a different approach for this
series, but it required a change in the specification; AFAIK that was
attempted and rejected. Please follow the link below:

https://op-lists.linaro.org/archives/list/linaro-open-discussi...@op-lists.linaro.org/message/X74JS6P2N4AUWHHATJJVVFDI2EMDZJ74/

We've reached the current stage after many discussions and attempts while
co-working with ARM, Linaro, and other companies, using
Linaro-open-discussions as a common platform for over the past three years.

> > > So what ARM would like to support is not CPU hotplug but rather a
> > > fixed system with standby CPUs (that can be powered on/off on demand).
> > > With the ACPI spec amended to support that (MADT present/enabled
> > > changes), it's a good enough reason to add 'enabled' handling to the
> > > acpi cpu-hotplug code instead of inventing an alternative one that
> > > would do almost the same.
> >
> > Not sure what you mean here, but this is what is being done.
>
> it was an ack for the 'enabled' flag in the cphp ACPI ABI.
> The rest was a digression wrt the 'present' hack in the ACPI code.

OK. I've incorporated your suggestions in the V2 series.

> > > But let's get rid of the (can't/may not) language above and use the
> > > standby CPUs reasoning to avoid any confusion.
> >
> > I have to disagree here. Stand-by means you will have to keep all the
> > vCPU states realized and the objects will always exist. This is a
> > problem for larger systems with vCPU hotplug support, as it drastically
> > affects boot times.
>
> see comment below about boot times.
>
> > Please check the KVM Forum 2023 slides for objective values.
>
> For sure we can conjure unicorns, especially virtual ones;
> it doesn't mean that we should do it though.
> > It's been a long time since this series was conceived, and at that time
> > we wanted to use it for Kata containers; later, with the gradual
> > adoption of various microVMs and cloud hypervisors, the requirements
> > have changed a bit. Boot time still remains one of the major
> > requirements. Not bounding boot time while using this feature will
> > seriously affect performance as the number of vCPUs increases.
>
> again wrt boot time, see comment below.
>
> > > PS:
> > > I'm talking about hw hotplug (at QEMU level) and not kernel hotplug
> > > (where it's at logical cpu level).
> >
> > sure
>
> > > > To address this issue, introduce an `is_enabled` state in the
> > > > `AcpiCpuStatus`, which reflects whether a vCPU has been hot-plugged
> > > > or hot-unplugged in QEMU, marking it as (un)available in the Guest
> > > > Kernel.
> > >
> > > good so far
> > >
> > > > The `is_present` state should be set based on the `acpi_persistent`
> > > > flag. In cases where unplugged vCPUs need to be deliberately
> > > > simulated in the ACPI to maintain a persistent view of vCPUs, this
> > > > flag ensures the guest kernel continues to see those vCPUs.
> > >
> > > that's where I have to disagree: a vCPU is present when a
> > > corresponding QOM object exists. Faking its presence will only confuse
> > > in the end.
> >
> > Not faking it would mean realizing the unnecessary vCPU threads at VM
> > init time, which in turn increases the boot time. It's a trade-off
> > between a cleaner design and performance.
>
> I very much prefer clean design, which should result in less code, fewer
> bugs, easier maintenance, and fewer regressions when someone fixes
> intentional hacks with the good reasoning that hardware doesn't work that
> way.

I've removed the `acpi_persistent` logic in the just-posted V2 series. Yes,
the code has reduced. Please have a look.

> wrt boot time, see below.
>
> > > I get that you want to reuse the device_add/del interface, but that
> > > leads to pulling a leg here and there to make things fit. In the short
> > > term (while someone remembers all the tricks) that might work for
> > > some, but in the long term it's not sustainable.
> >
> > I do not understand why?
>
> It's a complicated set of hacks all over the place to make something that
> does not exist in the real world work somehow.

All vCPUs are now part of the `possible_cpus_list`, and the ACPI CPU state
is now consistent with the QOM vCPU object state at all times.

> While at the same time there is an alternative approach that mimics ARM
> hardware behavior and is supported by the vendor (standby cpus).
> That would also be a much simpler way, while providing functionality
> similar to hotplug.

You still need some changes in the ACPI specification to allow delayed
onlining of the vCPUs. We also need a way to avoid accesses to the GIC CPU
interfaces while the so-called `stand-by` vCPUs are not operational but are
part of the larger `possible_cpus_list`. The existing changes in the GICv3
code and in the ACPI part (GICC::flags::online-capable) do exactly that (see
the flags sketch appended at the end of this mail). The hot(un)plug hooks
are boilerplate code, part of the hotplug framework that x86 also
implements.

> > > Maybe instead of pushing device_add/del, we should rather implement
> > > the standby CPU model here, as ARM folks expect it to be.
> > > i.e. instead of device_add/del, add an 'enabled' property to the ARM
> > > vCPU, and let management turn vCPUs on/off on demand.
> > > (and very likely this model could be reused by other SoC-based
> > > platforms)
> > > For the end user it would be the same as device_add/del (i.e. the vCPU
> > > becomes usable/unusable).
> >
> > One of the main goals is to facilitate seamless migration from the x86
> > world to the ARM world. Hence, preservation of the x86 interface is
> > important. It is a requirement. Just imagine if Apple had to change the
> > end-user interface experience after migrating iOS from the x86 to the
> > ARM architecture?
>
> About that I wouldn't worry much, as it's secondary.
> Management can adapt to call 'qom set' instead of calling device_add/del.
> It definitely shouldn't be the thing that turns design decisions into the
> bad model.

I'm using exactly the same model that x86 uses. We want to emulate the x86
interface here.

> > > I'd bet it would simplify your ARM CPU hotplug series a lot, since you
> > > won't have to fake vCPU object lifetime and do non-trivial tricks to
> > > make it all work. Which in turn will make the ARM hotplug series much
> > > more approachable for reviewers /and whomever comes across that code
> > > later/.
> >
> > Tricks are just for ACPI and GIC and nothing else. The rest is
> > boilerplate code of the vCPU hotplug framework, which x86 is also using.
>
> looking at your v5 ARM cphp series, it does a bunch of refactoring to
> handle CPU absence where there should be one by design.

I'll be separating out some common refactoring that can be posted
independently of the vCPU hotplug code. Maybe this will help alleviate your
concerns.

> While with standby cpus, that won't be needed.
> (one would need code to enable/disable CPUs, plus making ACPI deliver
> those events, but that's pretty much all. i.e. do what real hw can do)

By keeping all the objects as part of the `possible_cpus_list`, we are
indeed creating stand-by CPUs. The rest is the interface part; I have a
prototype ready for that as well, but that would become part of the main
series sometime later.

> > > Regardless of the above, we would still need changes to the ACPI cphp
> > > code; see my comments inline.

Sure, thanks for your valuable comments. I've tried to address most of them.
Please have a look at the V2 series and let me know if you still have
concerns. I'll try my best to address them.

[...]

> > > > diff --git a/cpu-target.c b/cpu-target.c
> > > > index 499facf774..c8a29ab495 100644
> > > > --- a/cpu-target.c
> > > > +++ b/cpu-target.c
> > > > @@ -200,6 +200,7 @@ static Property cpu_common_props[] = {
> > > >       */
> > > >      DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
> > > >                       MemoryRegion *),
> > > > +    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent,
> > > > +                     false),
> > >
> > > I agree with Gavin: it's not CPU property/business, but a platform
> > > one.
> > >
> > > Pass it as an argument to cpu_hotplug_hw_init(), and maybe rename it
> > > to always_present. Then make sure that it's configurable in GED (which
> > > calls the function), and just turn it on for the arm/virt machine.
> > > Other platforms might want to use the x86 approach with GED and have
> > > the vCPU actually disappearing. /loongson and maybe risc-v/
> >
> > can we move it to the Machine level or fetch it through firmware?
>
> the following would do:
> 1. add to GED a new property to opt in to this
> 2. set the new property from arm's board_init where GED is created
> 3.
>    when GED calls cpu_hotplug_hw_init(), pass the property value as an
>    argument

Because I've dropped `acpi_persistent`, a special flag to distinguish the
disabled vCPUs is not required. Please check the V2 series.

[...]

> > > >  #endif
> > > >      DEFINE_PROP_END_OF_LIST(),
> > > >  };
> > > > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > > > index 5cb60ca8bc..083c4010c2 100644
> > > > --- a/hw/acpi/cpu.c
> > > > +++ b/hw/acpi/cpu.c
> > > > @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
> > > >      state->dev_count = id_list->len;
> > > >      state->devs = g_new0(typeof(*state->devs), state->dev_count);
> > > >      for (i = 0; i < id_list->len; i++) {
> > > > -        state->devs[i].cpu = CPU(id_list->cpus[i].cpu);
> > > > +        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
> > > > +        /*
> > > > +         * In most architectures, CPUs that are marked as ACPI
> > > > +         * 'present' are also ACPI 'enabled' by default. These
> > > > +         * states remain consistent at both the QOM and ACPI levels.
> > > > +         */
> > > > +        if (cpu) {
> > > > +            state->devs[i].is_enabled = true;
> > > > +            state->devs[i].is_present = true;
> > > > +            state->devs[i].cpu = cpu;
> > > > +        } else {
> > > > +            state->devs[i].is_enabled = false;
> > > > +            /*
> > > > +             * In some architectures, even 'unplugged' or 'disabled'
> > > > +             * QOM CPUs may be exposed as ACPI 'present'. This
> > > > +             * approach provides a persistent view of the vCPUs to
> > > > +             * the guest kernel. This could be due to an
> > > > +             * architectural constraint that requires every per-CPU
> > > > +             * component to be present at boot time, meaning the
> > > > +             * exact count of vCPUs must be known and cannot be
> > > > +             * altered after the kernel has booted. As a result, the
> > > > +             * vCPU states at the QOM and ACPI levels might become
> > > > +             * inconsistent. However, in such cases, the presence of
> > > > +             * vCPUs has been deliberately simulated at the ACPI
> > > > +             * level.
> > > > +             */
> > >
> > > if cpus are not 'simulated', you will not need comments explaining
> > > that all over the place, and the whole hunk would likely go away.
> >
> > I can reduce the content if you wish.
>
> you have those comments for a reason, since the code does an unexpected
> 'simulate' thing. Removing them would mask intentional behavior for a
> reader later down the road.

Dropped the `acpi_persistent` logic and the related comments.

[...]

> > > > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> > > > index 32654dc274..bd3f9973c9 100644
> > > > --- a/include/hw/acpi/cpu.h
> > > > +++ b/include/hw/acpi/cpu.h
> > > > @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus {
> > > >      uint64_t arch_id;
> > > >      bool is_inserting;
> > > >      bool is_removing;
> > > > +    bool is_present;
> > >
> > > with always_present, it might be better to move this field to
> > > CPUHotplugState as it's not per-vCPU anymore, and in the standby case
> > > state->devs[i].cpu should work as an implicit present flag. (see below
> > > wrt the doc-first comment)
> >
> > I'm still not convinced of the stand-by approach because of the
> > performance impact it will have as the number of possible vCPUs
> > increases; its worst case is unbounded. That's a major problem.
>
> # of vCPUs is always bound by host capabilities and by modeled hardware.
> From the guest's point of view both approaches should save time, as the
> guest won't try to online non-present or disabled CPUs (with large
> machines guest boot time usually dwarfs whatever QEMU does before the
> guest, and UEFI adds to the insult even more).
> And fast-booting a large VM (with non-present CPUs) initially is just
> postponing the problem to the later time when hotplugging CPUs causes
> worse timing (since the guest effectively does stop_machine) for the
> 'online'-ing to happen.
>
> Regardless, standby CPUs is what ARM folks have chosen to support.
> In this case, I'd pick the arch-preferred way anytime vs some boot time
> improvements.

It's unfortunate for all of us, but the ARM folks are legally bound not to
comment on the QEMU code. But Jean-Philippe Brucker from Linaro did go
through the QEMU patches earlier, in 2021:

https://op-lists.linaro.org/archives/list/linaro-open-discussi...@op-lists.linaro.org/message/X74JS6P2N4AUWHHATJJVVFDI2EMDZJ74/

We had earlier tried to solve the problem using just the PSCI approach.
`Stand-by` CPUs is just a cosmetic term: by having all the vCPUs be part of
the `possible_cpus_list` and parking the disabled vCPUs, we have indeed kept
the vCPUs on `stand-by`. This is now the existing approach in the upcoming
RFC V6, and the ACPI CPU state is consistent with the QOM vCPU state.

> If arm_cpu_realizefn() is still expensive for your case and better init
> times are needed, fix it instead of working around it by faking hotplug on
> the QEMU side and moving the issue to a later time.
> That will benefit not only the hotplug case, but also the huge-VMs case.
> (a good example is x86, where it scales well without any hotplug)

Sure, agreed. That's definitely one of the ways to improve, but can it also
be an add-on?

> PS:
> Out of curiosity, I've just fed qemu to perf on an Ampere host.
> There are a number of low-hanging fruits that would reduce consumed CPU
> cycles, for those who wish to improve performance.
>
> Some would lead to overall improvements, others could be done when the
> vCPU is disabled.
>
> ex: with -enable-kvm -smp 100
> - 26.18% arm_cpu_realizefn
>    + 8.86% cpu_reset    <- likely not necessary for disabled vCPUs, as one
>                            has to call it again when enabling the vCPU
>    + 4.53% register_cp_regs_for_features
>    + 4.32% arm_cpu_register_gdb_regs_for_features    <- do we need it when
>                                                         gdbstub is not
>                                                         enabled?
>    + 4.09% init_cpreg_list    <- naive refactoring makes it a fraction of
>                                  a percent
>    + 3.40% qemu_init_vcpu
>        0.59% trace_init_vcpu
>
> with the above hacks it goes down to 11% (12% for x86_cpu_realizefn).
> However, wall-clock wise, compared to x86 the arm/virt machine doesn't
> scale well. So there is something else getting in the way (i.e. something
> stalls vCPU creation on ARM?).

Thanks for these leads, but I will need to spend time in this direction to
check and verify them. I'll get back to you regarding your question.

> And well, I might be comparing apples to oranges here (an old E5-2630v3 vs
> some 32-core 3.3GHz Ampere cpu).

True.

> PS2:
> create-gic() is another candidate for improvement; it spends a lot of time
> setting GPIO properties.

Yes, indeed. These are very helpful. I will make sure to look into this
direction as well, regardless of the vCPU hotplug patches. Maybe we can
combine all the good things together.

> > > > +    bool is_enabled;
> > >
> > > I'd move the introduction of this field into a separate patch.
> > >
> > > BTW: new ABI/fields accessible by the guest should be described in
> > > docs/specs/acpi_cpu_hotplug.rst.
> > > It would be better to have the spec as patch 1, so that we all agree
> > > on it, and then follow with the implementation.
> >
> > We can do that.
>
> > > And also include there an expected workflow for the standby case.
> >
> > We need more discussion on this, as our requirements for which we
>        ^^^^^^^^^^^^^^
> > floated this feature are not being met using the stand-by cases.
>
> A list of them would be a good starting point for discussion.

Sure, agreed. I'll jot them down as part of the main series.

> Perhaps the requirements should be made more realistic and be
> re-evaluated.

The requirements are clearly indicated in the cover letter of the main
series.

Best regards,
Salil
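
PS: Re #3 at the top, here is a minimal sketch of the .needed() pattern,
assuming the GED state embeds the CPU hotplug state. The names
(vmstate_cpuhp_sub, the cpuhp_state member) and the guard condition are
illustrative, not necessarily the exact V2 code:

    #include "migration/vmstate.h"
    #include "hw/acpi/cpu.h"
    #include "hw/acpi/generic_event_device.h"

    /*
     * Only put the CPU hotplug sub-state on the wire when it is actually
     * in use, so boards without CPU hotplug keep their migration stream
     * unchanged.
     */
    static bool cpuhp_needed(void *opaque)
    {
        AcpiGedState *s = opaque;

        return s->cpuhp_state.dev_count > 0;  /* assumed guard condition */
    }

    static const VMStateDescription vmstate_cpuhp_sub = {
        .name = "acpi-ged/cpuhp",
        .version_id = 1,
        .minimum_version_id = 1,
        .needed = cpuhp_needed,
        .fields = (const VMStateField[]) {
            VMSTATE_STRUCT(cpuhp_state, AcpiGedState, 1,
                           vmstate_cpu_hotplug, CPUHotplugState),
            VMSTATE_END_OF_LIST()
        }
    };

    /* hooked into the GED's top-level VMSD as a subsection: */
    static const VMStateDescription vmstate_acpi_ged = {
        .name = "acpi-ged",
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (const VMStateField[]) {
            VMSTATE_END_OF_LIST()  /* regular GED fields elided in sketch */
        },
        .subsections = (const VMStateDescription * const []) {
            &vmstate_cpuhp_sub,
            NULL
        }
    };

Since a subsection whose .needed() returns false is simply omitted from the
stream, platforms that never wire up CPU hotplug migrate exactly as before.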
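
PS2: Re the GICC::flags::online-capable point above, newer ACPI revisions
define both an Enabled bit (bit 0) and an Online Capable bit (bit 3) in the
MADT GICC Flags field, and Online Capable must be zero when Enabled is set.
A minimal illustration of the flag selection; the constant names and helper
are illustrative, not the exact QEMU code:

    #include <stdbool.h>
    #include <stdint.h>

    /* MADT GICC Flags bits (per the ACPI spec's GICC structure) */
    #define ACPI_MADT_GICC_ENABLED        (1u << 0)
    #define ACPI_MADT_GICC_ONLINE_CAPABLE (1u << 3)

    /*
     * A vCPU that is present but currently disabled is advertised as not
     * Enabled yet Online Capable, so the guest keeps its per-CPU
     * infrastructure in place and can online the CPU later.
     */
    static uint32_t madt_gicc_flags(bool vcpu_enabled)
    {
        return vcpu_enabled ? ACPI_MADT_GICC_ENABLED
                            : ACPI_MADT_GICC_ONLINE_CAPABLE;
    }

This is what keeps disabled vCPUs visible (and their GICC entries present)
without letting the guest treat them as running.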