Hi Igor,

I've fixed most of the x86 problems you commented on in the V1 patch-set in the
recently floated V3 ACPI patch-set. This includes removing the
`is_{enabled,present}` ACPI CPU states about which you expressed apprehensions.
Sorry, there was a miss from my side in V2 related to the CPUs AML code for the
x86 platform. I've fixed that as well.
Please let me know if this patch-set addresses all your concerns. I am open to
any suggestions you deem necessary for its acceptance in this cycle. Acceptance
of the arch-agnostic ACPI changes in this cycle will really help all of us.

https://lore.kernel.org/qemu-devel/20241103102419.202225-1-salil.me...@huawei.com/

Many thanks!

Best regards
Salil.

> From: Salil Mehta
> Sent: Friday, November 1, 2024 10:54 AM
> To: 'Igor Mammedov' <imamm...@redhat.com>; Salil Mehta <salil.me...@opnsrc.net>
>
> Hi Igor,
>
> Thanks for replying back and sorry for the late reply. I wanted to make sure
> that I'd addressed your comments internally and had at least one solution
> fully tested and working with the adjusted main series (upcoming RFC V6)
> after addressing your major comments. I'm further open to your suggestions.
>
> I've just floated the V2 series of this ACPI part addressing your major
> concerns:
>
> #1. Dropped the `acpi_persistent` logic from the ACPI code. Now, all the
>     vCPUs are always part of the `possible_cpus` list. This has reduced the
>     code.
> #2. Removed the `ACPICpuStatus::is_present` state, the corresponding ACPI
>     CPU_PRESENT flag from the ACPI interface, and its logic in the CPUs AML.
> #3. Introduced conditional inclusion of the CPU hotplug VMSD in the GED's
>     VMSD using the .needed() hook (see the sketch below).
> #4. Fixed the make-qtest/bios-acpi-test failure on the x86/{q35,pc}
>     platforms caused by the change in the CPUs AML code.
> #5. Adjusted the main series (upcoming RFC V6) with the above logic and have
>     found it working. I've requested other stakeholders to test the ACPI
>     part and the complete RFC V6 as well.
>
> At present, I've not removed the `is_enabled` part from the migration code,
> which you had requested be made part of the `CPUState`. I was wondering if
> there was a way to keep it in the migration state without breaking the x86
> migration code?
>
> A humble request:
> If not, do you think we can go ahead with acceptance of the rest of the
> patches for this cycle and drop the last patch?
>
> Arch agnostic ACPI V2 series:
> https://lore.kernel.org/qemu-devel/20241031200502.3869-1-salil.me...@huawei.com/T/#mf10104510269d5c290622a0471f0997ad058e397
>
> Upcoming RFC V6, Support of Virtual CPU Hotplug for ARMv8 Arch:
> https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6-rc4
>
> Please find my replies inline for the rest of the comments.
>
> Many thanks!
>
> Best regards
> Salil.
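A quick note on #3 above, since it also touches the `is_enabled` migration
question: the CPU hotplug state is registered as a subsection of the GED VMSD,
and a subsection is only put on the wire when its .needed() hook returns true,
so machines that never enable CPU hotplug keep an unchanged migration stream.
Below is a minimal sketch of the shape of it; names such as
ACPI_GED_CPU_HOTPLUG_EVT and the cpuhp_state field are placeholders I'm using
for illustration, not necessarily the exact identifiers in the V2/V3 patches:

    /* Emit the CPU hotplug subsection only when this GED has it wired up. */
    static bool vmstate_cpuhp_needed(void *opaque)
    {
        AcpiGedState *s = opaque;

        /* ACPI_GED_CPU_HOTPLUG_EVT is a placeholder event bit */
        return !!(s->ged_event_bitmap & ACPI_GED_CPU_HOTPLUG_EVT);
    }

    static const VMStateDescription vmstate_cpuhp_state = {
        .name = "acpi-ged/cpuhp",
        .version_id = 1,
        .minimum_version_id = 1,
        .needed = vmstate_cpuhp_needed,
        .fields = (const VMStateField[]) {
            VMSTATE_STRUCT(cpuhp_state, AcpiGedState, 1,
                           vmstate_cpu_hotplug, CPUHotplugState),
            VMSTATE_END_OF_LIST()
        }
    };

    /* Hooked in as a subsection of the existing GED VMSD, so machines
     * without CPU hotplug produce an unchanged migration stream. */
    static const VMStateDescription vmstate_acpi_ged = {
        .name = "acpi-ged",
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (const VMStateField[]) {
            VMSTATE_STRUCT(ged_state, AcpiGedState, 1,
                           vmstate_ged_state, GedState),
            VMSTATE_END_OF_LIST()
        },
        .subsections = (const VMStateDescription * const []) {
            &vmstate_memhp_state,
            &vmstate_cpuhp_state,   /* skipped when .needed() is false */
            NULL
        }
    };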
> > From: Igor Mammedov <imamm...@redhat.com>
> > Sent: Friday, October 25, 2024 2:52 PM
> > To: Salil Mehta <salil.me...@opnsrc.net>
> > Cc: Salil Mehta <salil.me...@huawei.com>; qemu-devel@nongnu.org;
> > qemu-...@nongnu.org; m...@redhat.com; m...@kernel.org;
> > jean-phili...@linaro.org; Jonathan Cameron <jonathan.came...@huawei.com>;
> > lpieral...@kernel.org; peter.mayd...@linaro.org;
> > richard.hender...@linaro.org; andrew.jo...@linux.dev; da...@redhat.com;
> > phi...@linaro.org; eric.au...@redhat.com; w...@kernel.org; a...@kernel.org;
> > oliver.up...@linux.dev; pbonz...@redhat.com; gs...@redhat.com;
> > raf...@kernel.org; borntrae...@linux.ibm.com; alex.ben...@linaro.org;
> > npig...@gmail.com; hars...@linux.ibm.com; li...@armlinux.org.uk;
> > dar...@os.amperecomputing.com; il...@os.amperecomputing.com;
> > vis...@os.amperecomputing.com; karl.heub...@oracle.com;
> > miguel.l...@oracle.com; zhukeqian <zhukeqi...@huawei.com>;
> > wangxiongfeng (C) <wangxiongfe...@huawei.com>; wangyanan (Y)
> > <wangyana...@huawei.com>; jiakern...@gmail.com; maob...@loongson.cn;
> > lixiang...@loongson.cn; shahu...@redhat.com; zhao1....@intel.com;
> > Linuxarm <linux...@huawei.com>; gustavo.rom...@linaro.org
> > Subject: Re: [PATCH V1 1/4] hw/acpi: Initialize ACPI Hotplug CPU Status
> > with Support for vCPU `Persistence`
> >
> > On Mon, 21 Oct 2024 22:50:40 +0100
> > Salil Mehta <salil.me...@opnsrc.net> wrote:
> >
> > > Hi Igor,
> > >
> > > Thanks for taking the time to review, and sorry for not being prompt. I
> > > was in transit due to a difficult personal situation.
> > >
> > > On Fri, Oct 18, 2024 at 3:11 PM Igor Mammedov <imamm...@redhat.com> wrote:
> > >
> > > > On Mon, 14 Oct 2024 20:22:02 +0100
> > > > Salil Mehta <salil.me...@huawei.com> wrote:
> > > >
> > > > > Certain CPU architecture specifications [1][2][3] prohibit changes
> > > >                                           ^^^^^^^^^ these do not
> > > >                                           point to specs but never mind
> > > > > to CPU presence after the kernel has booted. This limitation exists
> > > > > because many system initializations rely on the exact CPU count at
> > > > > boot time and do not expect it to change later. For example,
> > > > > components like interrupt controllers, which are closely tied to
> > > > > CPUs, or various per-CPU features, may not support configuration
> > > > > changes once the kernel has been initialized. This presents a
> > > > > challenge for virtualization features such as vCPU hotplug.
> > > >
> > > > well, x86 (incl. cpu, interrupt ctrls) also doesn't have architectural
> > > > hotplug. It's just that OEMs managed to implement it regardless and
> > > > then bothered to make OS changes to work with that.
> > > > It's just that the ARM community doesn't want to go there at this
> > > > point in time, but using the above reasoning as justification for this
> > > > series doesn't look good to me.
> > >
> > > There is a difference though. vCPU presence cannot be changed after the
> > > system has initialized in the ARM world due to the architectural
> > > constraint. I think in the x86 world it is allowed?
> >
> > As far as I know x86 doesn't have arch-defined hotplug, but vendors
> > managed to implement it somehow and then made sure that an OS running on
> > top could stomach it.
>
> Sure, I understand. The ARM specification imposes restrictions on any such
> change. It is well controlled by the architecture team within ARM, which
> includes the hardware guys.
> We've worked very closely with ARM over the past 4 years to check the
> viability of any such change, but it was repeatedly rejected by the
> architecture team.
>
> In fact, Jean-Philippe from Linaro/ARM (CC'ed in this series) tried a
> different approach using this series, but it required a change in the
> specification. AFAIK, it was attempted but was rejected. Please follow the
> link below:
>
> https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/X74JS6P2N4AUWHHATJJVVFDI2EMDZJ74/
>
> We've come to the current stage after lots of discussions and attempts
> while co-working with ARM/Linaro and other companies, using Linaro Open
> Discussions as a common platform for over the past 3 years.
>
> > > > So what ARM would like to support is not CPU hotplug but rather a
> > > > fixed system with standby CPUs (that can be powered on/off on demand).
> > > > With the ACPI spec amended to support that (MADT present/enabled
> > > > changes), it's a good enough reason to add 'enabled' handling to the
> > > > acpi cpu-hotplug code instead of inventing an alternative one that
> > > > would do almost the same.
> > >
> > > Not sure what you mean here, but this is what is being done.
> >
> > it was an ack for the 'enabled' flag in the cphp acpi abi.
> > The rest was a digression wrt the 'present' hack in the ACPI code.
>
> Ok. I've incorporated your suggestions in the V2 series.
>
> > > > But lets get rid of the (can't/may not) language above and use the
> > > > standby CPUs reasoning to avoid any confusion.
> > >
> > > I've to disagree here. Stand-by means you will have to keep all the
> > > vCPU states realized and the objects will always exist. This is a
> > > problem for larger systems with vCPU hotplug support as it drastically
> > > affects the boot times.
> >
> > see comment below about boot times.
>
> > > Please check the KVM Forum 2023 slides for objective values.
> >
> > For sure we can conjure unicorns, especially virtual ones; it doesn't
> > mean that we should do it though.
>
> > > It's been a long time since this series was actually conceived, and at
> > > that time we wanted to use it for Kata containers, but later, with the
> > > gradual adoption of various microVMs, the cloud hypervisor requirements
> > > have changed a bit. Boot time still remains one of the major
> > > requirements. Not bounding boot time while using this feature will
> > > seriously affect performance as the number of vCPUs increases.
> >
> > again wrt boot time, see comment below.
> >
> > > > PS:
> > > > I'm talking about hw hotplug (at QEMU level) and not kernel hotplug
> > > > (where it's at logical cpu level).
> > >
> > > sure
> >
> > > > > To address this issue, introduce an `is_enabled` state in the
> > > > > `AcpiCpuStatus`, which reflects whether a vCPU has been hot-plugged
> > > > > or hot-unplugged in QEMU, marking it as (un)available in the Guest
> > > > > Kernel.
> > > >
> > > > good so far
> > > >
> > > > > The `is_present` state should be set based on the `acpi_persistent`
> > > > > flag. In cases where unplugged vCPUs need to be deliberately
> > > > > simulated in the ACPI to maintain a persistent view of vCPUs, this
> > > > > flag ensures the guest kernel continues to see those vCPUs.
> > > >
> > > > that's where I have to disagree, a vCPU is present when a
> > > > corresponding QOM object exists. Faking its presence will only
> > > > confuse at the end.
> > > Not taking care of it will mean realization of unnecessary vCPU
> > > threads during VM init time, which in turn will increase the boot time.
> > > It is a trade-off between a cleaner design and performance.
> >
> > I very much prefer a clean design, which should result in less code,
> > fewer bugs, easier maintenance, and fewer regressions when someone fixes
> > intentional hacks, with the good reasoning that hardware doesn't work
> > that way.
>
> I've removed the `acpi_persistent` logic in the just-floated V2 series.
> Yes, the code has reduced. Please have a look.
>
> > wrt boot time, see below.
> >
> > > > I get that you want to reuse the device_add/del interface, but that
> > > > leads to pulling the leg here and there to make things fit. That
> > > > might work in the short term (while someone remembers all the
> > > > tricks), but in the long term it's not sustainable.
> > >
> > > I do not understand why?
> >
> > It's a complicated set of hacks all over the place to make something
> > that does not exist in the real world work somehow.
>
> All vCPUs are now part of the `possible_cpus_list`, and the ACPI CPU state
> is now consistent with the QOM vCPU object state all the time.
>
> > While at the same time there is an alternative approach that mimics ARM
> > hardware behavior and is supported by the vendor (standby cpus).
> > That also will be a much simpler way, while providing functionality
> > similar to hotplug.
>
> You still need some changes in the ACPI specifications to allow delayed
> online'ing of the vCPUs. Also, we need a way to avoid accesses to the GIC
> CPU interfaces while the so-called `stand-by` vCPUs are not operational
> but are part of the larger `possible_cpus_list`. The existing changes in
> the GICv3 code and the ACPI (GICC::flags::online-capable) part do exactly
> the same thing (see the sketch below).
>
> Hot(un)plug hooks are boilerplate code, part of the hotplug framework,
> which x86 also implements.
>
> > > > Maybe instead of pushing device_add/del, we should rather implement
> > > > the standby CPU model here, as ARM folks expect it to be.
> > > > i.e. instead of device_add/del, add an 'enabled' property to the ARM
> > > > vCPU, and let management turn vCPUs on/off on demand.
> > > > (and very likely this model could be reused by other SoC-based
> > > > platforms)
> > > > For the end-user it would be the same as device_add/del (i.e. the
> > > > vCPU becomes usable/unusable)
> > >
> > > One of the main goals is to facilitate seamless migration from the x86
> > > world to the ARM world. Hence, preservation of the x86 interface is
> > > important. It is a requirement. Just imagine if Apple had to change
> > > the end-user interface experience after migrating iOS from x86 to the
> > > ARM architecture.
> >
> > About that I wouldn't worry much as it's secondary.
> > Management can adapt to call 'qom set' instead of calling device_add/del.
> > It definitely shouldn't be the thing that turns design decisions into a
> > bad model.
>
> I'm using exactly the same model which x86 is also using. We want to
> emulate the x86 interface here.
>
> > > > I'd bet it would simplify your ARM CPU hotplug series a lot, since
> > > > you won't have to fake vCPU object lifetime and do non-trivial tricks
> > > > to make it all work. Which in turn will make the ARM hotplug series
> > > > much more approachable for reviewers /and whomever comes later across
> > > > that code/.
> > >
> > > Tricks are just for ACPI and GIC and nothing else.
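To make the GICC flags point above concrete: assuming the ACPI 6.5 MADT GICC
flag layout (Enabled = bit 0, Online Capable = bit 3), the entry for a
present-but-disabled vCPU advertises Online Capable instead of Enabled. A
rough sketch only; qemu_enabled_cpu() is an illustrative helper name, not
necessarily what the series calls it:

    /* MADT GICC flags, assuming the ACPI 6.5 layout */
    #define ACPI_MADT_GICC_ENABLED        (1u << 0)  /* usable at boot */
    #define ACPI_MADT_GICC_ONLINE_CAPABLE (1u << 3)  /* may be enabled later */

    static uint32_t gicc_flags_for_vcpu(const CPUState *cpu)
    {
        /*
         * Cold-plugged vCPUs boot enabled; disabled ones stay visible to
         * the guest (so per-CPU structures are sized at boot) but are
         * marked only online-capable, keeping their GIC CPU interface
         * untouched until they are actually enabled.
         */
        return qemu_enabled_cpu(cpu) ? ACPI_MADT_GICC_ENABLED
                                     : ACPI_MADT_GICC_ONLINE_CAPABLE;
    }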
> > > The rest is boilerplate code of the vCPU hotplug framework which x86
> > > is also using.
> >
> > looking at your v5 ARM cphp series, it does a bunch of refactoring to
> > handle CPU absence where there should be one by design.
>
> I'll be separating out some common refactoring which can be floated
> independently of the vCPU hotplug code. Maybe this might help in
> alleviating your concerns.
>
> > While with standby cpus, that won't be needed.
> > (one would need code to enable/disable CPUs, plus making ACPI deliver
> > those events, but that's pretty much all. i.e. do what real hw can do)
>
> By keeping all the objects as part of the `possible_cpus_list` we are
> indeed creating stand-by CPUs only. The rest is the interface part; I've a
> prototype ready for that as well, but that would be part of the main
> series sometime later.
>
> > Regardless of said, we would still need changes to the ACPI cphp code,
> > see my comments inline.
>
> Sure, thanks for your valuable comments. I've tried to address most of
> them. Please have a look at the V2 series and let me know if you still
> have concerns. I'll try my best to address them.
>
> [...]
>
> > > > > diff --git a/cpu-target.c b/cpu-target.c
> > > > > index 499facf774..c8a29ab495 100644
> > > > > --- a/cpu-target.c
> > > > > +++ b/cpu-target.c
> > > > > @@ -200,6 +200,7 @@ static Property cpu_common_props[] = {
> > > > >       */
> > > > >      DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
> > > > >                       MemoryRegion *),
> > > > > +    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent, false),
> > > >
> > > > I agree with Gavin, it's not CPU property/business, but a platform one.
> > > >
> > > > Pass it as an argument to cpu_hotplug_hw_init(), and maybe rename it
> > > > to always_present. Then make sure that it's configurable in GED
> > > > (which calls the function), and just turn it on for the arm/virt
> > > > machine. Other platforms might want to use the x86 approach with GED
> > > > and have the vCPU actually disappearing. /loongson and maybe risc-v/
> > >
> > > can we move it to Machine level or fetch it through firmware?
> >
> > following would do:
> > 1. add to GED a new property to opt-in into this
> > 2. set the new property from arm's board_init where GED is created
> > 3. when GED calls cpu_hotplug_hw_init(), pass the property value as an
> >    argument
>
> Because I've dropped the `acpi_persistent` logic, a special flag to
> distinguish the disabled vCPUs is not required. Please check the V2 series.
>
> [...]
>
> > > > >  #endif
> > > > >      DEFINE_PROP_END_OF_LIST(),
> > > > >  };
> > > > > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > > > > index 5cb60ca8bc..083c4010c2 100644
> > > > > --- a/hw/acpi/cpu.c
> > > > > +++ b/hw/acpi/cpu.c
> > > > > @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
> > > > >      state->dev_count = id_list->len;
> > > > >      state->devs = g_new0(typeof(*state->devs), state->dev_count);
> > > > >      for (i = 0; i < id_list->len; i++) {
> > > > > -        state->devs[i].cpu = CPU(id_list->cpus[i].cpu);
> > > > > +        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
> > > > > +        /*
> > > > > +         * In most architectures, CPUs that are marked as ACPI
> > > > > +         * 'present' are also ACPI 'enabled' by default. These
> > > > > +         * states remain consistent at both the QOM and ACPI levels.
> > > > > +         */
> > > > > +        if (cpu) {
> > > > > +            state->devs[i].is_enabled = true;
> > > > > +            state->devs[i].is_present = true;
> > > > > +            state->devs[i].cpu = cpu;
> > > > > +        } else {
> > > > > +            state->devs[i].is_enabled = false;
> > > > > +            /*
> > > > > +             * In some architectures, even 'unplugged' or 'disabled'
> > > > > +             * QOM CPUs may be exposed as ACPI 'present.' This
> > > > > +             * approach provides a persistent view of the vCPUs to
> > > > > +             * the guest kernel. This could be due to an
> > > > > +             * architectural constraint that requires every per-CPU
> > > > > +             * component to be present at boot time, meaning the
> > > > > +             * exact count of vCPUs must be known and cannot be
> > > > > +             * altered after the kernel has booted. As a result, the
> > > > > +             * vCPU states at the QOM and ACPI levels might become
> > > > > +             * inconsistent. However, in such cases, the presence of
> > > > > +             * vCPUs has been deliberately simulated at the ACPI
> > > > > +             * level.
> > > > > +             */
> > > >
> > > > if cpus are not 'simulated', you will not need comments explaining
> > > > that all over the place and the whole hunk would likely go away.
> > >
> > > I can reduce the content if you wish.
> >
> > you have those comments for a reason, since the code does the unexpected
> > 'simulate' thing. Removing them would mask intentional behavior for a
> > reader later down the road.
>
> Dropped the `acpi_persistent` logic and the related comments.
>
> [...]
>
> > > > > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> > > > > index 32654dc274..bd3f9973c9 100644
> > > > > --- a/include/hw/acpi/cpu.h
> > > > > +++ b/include/hw/acpi/cpu.h
> > > > > @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus {
> > > > >      uint64_t arch_id;
> > > > >      bool is_inserting;
> > > > >      bool is_removing;
> > > > > +    bool is_present;
> > > >
> > > > with always_present, it might be better to move the field to
> > > > CPUHotplugState as it's not per-vCPU anymore, and in the standby case
> > > > state->devs[i].cpu should work as an implicit present flag. (see
> > > > below wrt the doc-first comment)
> > >
> > > I'm still not convinced of the stand-by approach because of the
> > > performance impact it will have upon increasing the number of possible
> > > vCPUs, and its worst case is unbounded. That's a major problem.
> >
> > # of vCPUs is always bound by host capabilities and by modeled hardware.
> >
> > From the guest point of view both approaches should save time, as the
> > guest won't try to online non-present or disabled CPUs (with large
> > machines guest boot time usually dwarfs whatever QEMU does before the
> > guest, and UEFI adds to it even more).
> > And fast booting a large VM (with non-present CPUs) initially is just
> > postponing the problem to a later time, when hotplugging CPUs causes
> > worse timing (since the guest effectively does stop_machine) for the
> > 'online'-ing to happen.
> >
> > Regardless, standby CPUs is what ARM folks have chosen to support.
> > In this case, I'd pick the arch-preferred way anytime vs some boot time
> > improvements.
>
> It's unfortunate for all of us, but ARM folks are legally bound not to
> comment on the QEMU code. But Jean-Philippe Brucker from Linaro has indeed
> gone through the QEMU patches, earlier in the year 2021:
>
> https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/X74JS6P2N4AUWHHATJJVVFDI2EMDZJ74/
>
> We had tried to solve the problem using just the PSCI approach earlier.
>
> `Stand-by` CPUs is just a cosmetic term.
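(A note to spell out what `parking` means in the next paragraph: when a vCPU
is disabled, its KVM vCPU fd is not destroyed (KVM has never supported
deleting vCPUs) but is stashed in a per-VM list keyed by vcpu-id, and a later
enable fetches that fd back instead of issuing a fresh KVM_CREATE_VCPU. A
conceptual sketch only; the type and helper names are illustrative, not the
actual QEMU API:)

    /* Conceptual sketch of vCPU fd "parking" (illustrative names/types). */
    typedef struct ParkedVcpu {
        unsigned long vcpu_id;
        int kvm_fd;                     /* fd from KVM_CREATE_VCPU, kept open */
        QLIST_ENTRY(ParkedVcpu) node;
    } ParkedVcpu;

    static QLIST_HEAD(, ParkedVcpu) parked_vcpus =
        QLIST_HEAD_INITIALIZER(parked_vcpus);

    /* Disable: keep the kernel-side vCPU alive, just take it off the run loop. */
    static void park_vcpu(unsigned long vcpu_id, int kvm_fd)
    {
        ParkedVcpu *p = g_new0(ParkedVcpu, 1);

        p->vcpu_id = vcpu_id;
        p->kvm_fd = kvm_fd;
        QLIST_INSERT_HEAD(&parked_vcpus, p, node);
    }

    /* Enable: reuse the parked fd; returns -1 if the vCPU was never created. */
    static int unpark_vcpu(unsigned long vcpu_id)
    {
        ParkedVcpu *p;

        QLIST_FOREACH(p, &parked_vcpus, node) {
            if (p->vcpu_id == vcpu_id) {
                int fd = p->kvm_fd;
                QLIST_REMOVE(p, node);
                g_free(p);
                return fd;
            }
        }
        return -1;
    }

This is what lets the `stand-by` and `hotplug` views converge: the host-side
vCPU object stays alive across guest-visible disable/enable cycles.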
> By having all the vCPUs as part of the `possible_cpus_list` and parking
> the disabled vCPUs (as sketched above), we've indeed kept the vCPUs on
> `stand-by`. This is now the existing approach in the upcoming RFC V6. The
> ACPI CPU state is consistent with the QOM vCPU state.
>
> > If arm_cpu_realizefn() is still expensive for your case and better init
> > times are needed, fix it instead of working around it by faking hotplug
> > on the QEMU side and moving the issue to a later time.
> > That way will benefit not only the hotplug case, but also the huge-VMs
> > case. (a good example is x86, where it scales well without any hotplug)
>
> Sure, agreed. That's definitely one of the ways to improve, but can it
> also be an add-on?
>
> > PS:
> > Out of curiosity, I've just fed qemu to perf on an Ampere host.
> > There are a number of low-hanging fruits that would reduce consumed CPU
> > cycles, for those who wish to improve performance.
> >
> > Some would lead to overall improvements, others could be done when the
> > vCPU is disabled.
> >
> > ex: with -enable-kvm -smp 100
> > - 26.18% arm_cpu_realizefn
> >    + 8.86% cpu_reset  <- likely not necessary for disabled vCPUs, as one
> >                          has to call it again when enabling the vCPU
> >    + 4.53% register_cp_regs_for_features
> >    + 4.32% arm_cpu_register_gdb_regs_for_features  <- do we need it when
> >                                                       gdbstub is not enabled?
> >    + 4.09% init_cpreg_list  <- naive refactoring makes it a fraction of a percent
> >    + 3.40% qemu_init_vcpu
> >      0.59% trace_init_vcpu
> >
> > with the above hacks it goes down to 11% (12% for x86_cpu_realizefn).
> > However, wall-clock wise compared to x86, arm/virt doesn't scale well.
> > So there is something else getting in the way (i.e. something stalls
> > vCPU creation on ARM?).
>
> Thanks for these leads, but I will need to spend time in this direction to
> check and verify them. I'll get back to you regarding your question.
>
> > And well, I might be comparing apples to oranges here (an old E5-2630v3
> > vs some 32-core 3.3GHz Ampere cpu).
>
> True.
>
> > PS2:
> > create-gic() is another candidate for improvement; it spends a lot of
> > time on setting GPIO properties.
>
> Yes, indeed. These are very helpful. I will ensure that I look into this
> direction as well, regardless of the vCPU hotplug patches. Maybe we can
> combine all the good things together.
>
> > > > > +    bool is_enabled;
> > > >
> > > > I'd move the introduction of this field into a separate patch.
> > > >
> > > > BTW: new ABI/fields accessible by the guest should be described in
> > > > docs/specs/acpi_cpu_hotplug.rst.
> > > > It would be better to have the spec as patch 1st, that we all agree
> > > > on, and then follow with the implementation.
> > >
> > > We can do that.
> > >
> > > > And also include there an expected workflow for the standby case.
> > >
> > > We need more discussion on this as our requirements, for which we
> >                                      ^^^^^^^^^^^^^^^^
> > > floated this feature, are not being met using the stand-by cases.
> >
> > A list of them would be a good starting point for discussion.
>
> Sure, agreed. I'll jot them down as part of the main series.
>
> > Perhaps the requirements should be made more realistic and be
> > re-evaluated.
>
> Requirements are clearly indicated in the cover-letter of the main series.
>
> Best regards
> Salil.