Re: [PATCH 0/9] arm64: Stolen time support
Hi Steven,

On 2020/7/27 18:48, Steven Price wrote:
> On 21/07/2020 04:26, zhukeqian wrote:
>> Hi Steven,
>
> Hi Keqian,
>
>> On 2019/8/2 22:50, Steven Price wrote:
>>> This series adds support for paravirtualized time for arm64 guests and
>>> KVM hosts following the specification in Arm's document DEN 0057A:
>>>
>>> https://developer.arm.com/docs/den0057/a
>>>
>>> It implements support for stolen time, allowing the guest to
>>> identify time when it is forcibly not executing.
>>>
>>> It doesn't implement support for Live Physical Time (LPT) as there are
>>> some concerns about the overheads and approach in the above
>> Do you plan to pick up LPT support? There is demand for cross-frequency
>> migration (from older platform to newer platform).
>
> I don't have any plans to pick up the LPT support at the moment - feel free
> to pick it up! ;)
>
>> I am not clear about the overheads and approach problem here, could you
>> please give some detailed information? Maybe we can work together to solve
>> these concerns. :-)
>
> Fundamentally the issue here is that LPT only solves one small part of
> migration between different hosts. To successfully migrate between hosts with
> different CPU implementations it is also necessary to be able to virtualise
> various ID registers (e.g. MIDR_EL1, REVIDR_EL1, AIDR_EL1) which we have no
> support for currently.
>
Yeah, currently we are trying to do both timer frequency virtualization and
CPU feature virtualization.

> The problem with just virtualising the registers is how you handle errata.
> The guest will currently use those (and other) ID registers to decide whether
> to enable specific errata workarounds. But what errata should be enabled for
> a guest which might migrate to another host?
>
Thanks for pointing this out.

I think the most important thing is that we should introduce a concept named
CPU baseline, which represents a standard platform. If we bring up a guest
with a specific CPU baseline, then this guest can only run on a platform that
is compatible with this CPU baseline. So "baseline" and "compatible" are the
key points to guarantee successful cross-platform migration.

> What we ideally need is a mechanism to communicate to the guest what
> workarounds are required to successfully run on any of the hosts that the
> guest may be migrated to. You may also have the situation where the
> workarounds required for two hosts are mutually incompatible - something
> needs to understand this and do the "right thing" (most likely just reject
> this situation, i.e. prevent the migration).
>
> There are various options here: e.g. a para-virtualised interface to describe
> the workarounds (but this is hard to do in an OS-agnostic way), or virtual-ID
> registers describing an idealised environment where no workarounds are
> required (and only hosts that have no errata affecting a guest would be able
> to provide this).
>
My idea is similar to the "idealised environment", but errata workarounds
still exist. We do not provide a para-virtualised interface, and migration is
restricted to platforms that are compatible with the baseline.

A baseline should have two aspects: CPU features and errata. Platforms that
are compatible with a specific baseline should have the corresponding CPU
features and errata.

> Given the above complexity and the fact that Armv8.6-A standardises the
> frequency to 1GHz this didn't seem worth continuing with. So LPT was dropped
> from the spec and patches to avoid holding up the stolen time support.
>
> However, if you have a use case which doesn't require such a generic
> migration (e.g. perhaps old and new platforms are based on the same IP) then
> it might be worth looking at bringing this back. But to make the problem
> solvable it either needs to be restricted to platforms which are
> substantially the same (so the errata list will be identical), or there's
> work to be done in preparation to deal with migrating a guest successfully
> between hosts with potentially different errata requirements.
>
> Can you share more details about the hosts that you are interested in
> migrating between?

Here we have a new platform with a 1 GHz timer, and the old platform is
100 MHz, so we want to solve the cross-platform migration first.

Thanks,
Keqian

>
> Thanks,
>
> Steve
>
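
To make the 100 MHz to 1 GHz case concrete, here is a small arithmetic sketch. It is not taken from the LPT specification or from any patch in this series, and the function names are made up for illustration. It only shows why a guest that keeps assuming the old counter frequency misjudges elapsed time after such a migration, and what rescaling a counter value between the two frequencies looks like.

```python
OLD_HZ = 100_000_000      # 100 MHz timer on the old platform (from the thread)
NEW_HZ = 1_000_000_000    # 1 GHz timer on the new platform (from the thread)


def rescale_ticks(ticks: int, from_hz: int, to_hz: int) -> int:
    """Convert a tick count between counter frequencies using integer math.

    Hypothetical helper: multiply first, then divide, so precision is not
    lost before the division (Python integers do not overflow).
    """
    if from_hz <= 0 or to_hz <= 0:
        raise ValueError("frequencies must be positive")
    return ticks * to_hz // from_hz


def apparent_seconds(tick_delta: int, assumed_hz: int) -> float:
    """Elapsed time as computed by a guest that assumes `assumed_hz`."""
    return tick_delta / assumed_hz


if __name__ == "__main__":
    # One real second on the new host is NEW_HZ ticks.
    one_second_on_new_host = NEW_HZ
    # A guest still assuming the old frequency sees time run 10x too fast.
    print(apparent_seconds(one_second_on_new_host, OLD_HZ))       # 10.0
    # Rescaled to the old frequency, the same interval is OLD_HZ ticks.
    print(rescale_ticks(one_second_on_new_host, NEW_HZ, OLD_HZ))  # 100000000
```

Whether the scaling would be done by the guest, by the hypervisor, or by trapping counter accesses is exactly the overhead question raised above; the sketch takes no position on that.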
Re: [PATCH 0/9] arm64: Stolen time support
On 21/07/2020 04:26, zhukeqian wrote:
> Hi Steven,

Hi Keqian,

> On 2019/8/2 22:50, Steven Price wrote:
>> This series adds support for paravirtualized time for arm64 guests and
>> KVM hosts following the specification in Arm's document DEN 0057A:
>>
>> https://developer.arm.com/docs/den0057/a
>>
>> It implements support for stolen time, allowing the guest to
>> identify time when it is forcibly not executing.
>>
>> It doesn't implement support for Live Physical Time (LPT) as there are
>> some concerns about the overheads and approach in the above
> Do you plan to pick up LPT support? There is demand for cross-frequency
> migration (from older platform to newer platform).

I don't have any plans to pick up the LPT support at the moment - feel free
to pick it up! ;)

> I am not clear about the overheads and approach problem here, could you
> please give some detailed information? Maybe we can work together to solve
> these concerns. :-)

Fundamentally the issue here is that LPT only solves one small part of
migration between different hosts. To successfully migrate between hosts
with different CPU implementations it is also necessary to be able to
virtualise various ID registers (e.g. MIDR_EL1, REVIDR_EL1, AIDR_EL1) which
we have no support for currently.

The problem with just virtualising the registers is how you handle errata.
The guest will currently use those (and other) ID registers to decide
whether to enable specific errata workarounds. But what errata should be
enabled for a guest which might migrate to another host?

What we ideally need is a mechanism to communicate to the guest what
workarounds are required to successfully run on any of the hosts that the
guest may be migrated to. You may also have the situation where the
workarounds required for two hosts are mutually incompatible - something
needs to understand this and do the "right thing" (most likely just reject
this situation, i.e. prevent the migration).

There are various options here: e.g. a para-virtualised interface to
describe the workarounds (but this is hard to do in an OS-agnostic way), or
virtual-ID registers describing an idealised environment where no
workarounds are required (and only hosts that have no errata affecting a
guest would be able to provide this).

Given the above complexity and the fact that Armv8.6-A standardises the
frequency to 1GHz this didn't seem worth continuing with. So LPT was dropped
from the spec and patches to avoid holding up the stolen time support.

However, if you have a use case which doesn't require such a generic
migration (e.g. perhaps old and new platforms are based on the same IP) then
it might be worth looking at bringing this back. But to make the problem
solvable it either needs to be restricted to platforms which are
substantially the same (so the errata list will be identical), or there's
work to be done in preparation to deal with migrating a guest successfully
between hosts with potentially different errata requirements.

Can you share more details about the hosts that you are interested in
migrating between?

Thanks,

Steve
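
The errata requirement described in this message can be modelled as a set problem. The following is only a sketch under assumptions: the erratum identifiers, the host names and the notion of an explicit "conflicts" table are all hypothetical, and nothing like this exists in KVM. It computes the workarounds a guest would need in order to run on every host in a migration pool, and rejects the pool when two required workarounds are declared mutually incompatible.

```python
from itertools import combinations


def required_workarounds(host_errata, conflicts=frozenset()):
    """Return the union of errata workarounds needed across all hosts.

    host_errata: dict mapping a host name to the set of errata affecting it.
    conflicts:   set of frozenset pairs naming workarounds that cannot be
                 enabled together (hypothetical table, for illustration).
    Raises ValueError when the pool needs two conflicting workarounds,
    which corresponds to "prevent the migration" in the text above.
    """
    needed = set()
    for errata in host_errata.values():
        needed |= set(errata)

    for a, b in combinations(sorted(needed), 2):
        if frozenset((a, b)) in conflicts:
            raise ValueError(
                f"workarounds {a} and {b} are mutually incompatible"
            )
    return frozenset(needed)


if __name__ == "__main__":
    # Made-up erratum names; real hosts would be identified via ID registers.
    pool = {"old-host": {"E1"}, "new-host": {"E2"}}
    print(sorted(required_workarounds(pool)))
```

The "idealised environment" option corresponds to requiring that this union be empty; the restricted case (platforms based on the same IP) corresponds to every host having the same errata set.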
Re: [PATCH 0/9] arm64: Stolen time support
Hi Steven,

On 2019/8/2 22:50, Steven Price wrote:
> This series adds support for paravirtualized time for arm64 guests and
> KVM hosts following the specification in Arm's document DEN 0057A:
>
> https://developer.arm.com/docs/den0057/a
>
> It implements support for stolen time, allowing the guest to
> identify time when it is forcibly not executing.
>
> It doesn't implement support for Live Physical Time (LPT) as there are
> some concerns about the overheads and approach in the above

Do you plan to pick up LPT support? There is demand for cross-frequency
migration (from older platform to newer platform).

I am not clear about the overheads and approach problem here, could you
please give some detailed information? Maybe we can work together to solve
these concerns. :-)

Thanks,
Keqian

> specification, and I expect an updated version of the specification to
> be released soon with just the stolen time parts.
>
> I previously posted a series including LPT (as well as stolen time):
> https://lore.kernel.org/kvmarm/20181212150226.38051-1-steven.pr...@arm.com/
>
> Patches 2, 5, 7 and 8 are cleanup patches and could be taken separately.
>
> Christoffer Dall (1):
>   KVM: arm/arm64: Factor out hypercall handling from PSCI code
>
> Steven Price (8):
>   KVM: arm64: Document PV-time interface
>   KVM: arm64: Implement PV_FEATURES call
>   KVM: arm64: Support stolen time reporting via shared structure
>   KVM: Allow kvm_device_ops to be const
>   KVM: arm64: Provide a PV_TIME device to user space
>   arm/arm64: Provide a wrapper for SMCCC 1.1 calls
>   arm/arm64: Make use of the SMCCC 1.1 wrapper
>   arm64: Retrieve stolen time as paravirtualized guest
>
>  Documentation/virtual/kvm/arm/pvtime.txt | 107 +
>  arch/arm/kvm/Makefile                    |   2 +-
>  arch/arm/kvm/handle_exit.c               |   2 +-
>  arch/arm/mm/proc-v7-bugs.c               |  13 +-
>  arch/arm64/include/asm/kvm_host.h        |  13 +-
>  arch/arm64/include/asm/kvm_mmu.h         |   2 +
>  arch/arm64/include/asm/pvclock-abi.h     |  20 +++
>  arch/arm64/include/uapi/asm/kvm.h        |   6 +
>  arch/arm64/kernel/Makefile               |   1 +
>  arch/arm64/kernel/cpu_errata.c           |  80 --
>  arch/arm64/kernel/kvm.c                  | 155 ++
>  arch/arm64/kvm/Kconfig                   |   1 +
>  arch/arm64/kvm/Makefile                  |   2 +
>  arch/arm64/kvm/handle_exit.c             |   4 +-
>  include/kvm/arm_hypercalls.h             |  44 ++
>  include/kvm/arm_psci.h                   |   2 +-
>  include/linux/arm-smccc.h                |  58 +++
>  include/linux/cpuhotplug.h               |   1 +
>  include/linux/kvm_host.h                 |   4 +-
>  include/linux/kvm_types.h                |   2 +
>  include/uapi/linux/kvm.h                 |   2 +
>  virt/kvm/arm/arm.c                       |  18 +++
>  virt/kvm/arm/hypercalls.c                | 138
>  virt/kvm/arm/mmu.c                       |  44 ++
>  virt/kvm/arm/psci.c                      |  84 +-
>  virt/kvm/arm/pvtime.c                    | 190 +++
>  virt/kvm/kvm_main.c                      |   6 +-
>  27 files changed, 848 insertions(+), 153 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/arm/pvtime.txt
>  create mode 100644 arch/arm64/include/asm/pvclock-abi.h
>  create mode 100644 arch/arm64/kernel/kvm.c
>  create mode 100644 include/kvm/arm_hypercalls.h
>  create mode 100644 virt/kvm/arm/hypercalls.c
>  create mode 100644 virt/kvm/arm/pvtime.c
>
Re: [PATCH 0/9] arm64: Stolen time support
On 05/08/2019 14:06, Steven Price wrote:
> On 03/08/2019 19:05, Marc Zyngier wrote:
>> On Fri, 2 Aug 2019 15:50:08 +0100
>> Steven Price wrote:
>>
>> Hi Steven,
>>
>>> This series adds support for paravirtualized time for arm64 guests and
>>> KVM hosts following the specification in Arm's document DEN 0057A:
>>>
>>> https://developer.arm.com/docs/den0057/a
>>>
>>> It implements support for stolen time, allowing the guest to
>>> identify time when it is forcibly not executing.
>>>
>>> It doesn't implement support for Live Physical Time (LPT) as there are
>>> some concerns about the overheads and approach in the above
>>> specification, and I expect an updated version of the specification to
>>> be released soon with just the stolen time parts.
>>
>> Thanks for posting this.
>>
>> My current concern with this series is around the fact that we allocate
>> memory from the kernel on behalf of the guest. It is the first example
>> of such a thing in the ARM port, and I can't really say I'm fond of it.
>>
>> x86 seems to get away with it by having the memory allocated from
>> userspace, which I tend to like more. Yes, put_user is more
>> expensive than a straight store, but this isn't done too often either.
>>
>> What is the rationale for your current approach?
>
> As I see it there are 3 approaches that can be taken here:
>
> 1. Hypervisor allocates memory and adds it to the virtual machine. This
> means that everything to do with the 'device' is encapsulated behind the
> KVM_CREATE_DEVICE / KVM_[GS]ET_DEVICE_ATTR ioctls. But since we want the
> stolen time structure to be fast it cannot be a trapping region and has
> to be backed by real memory - in this case allocated by the host kernel.
>
> 2. Host user space allocates memory. Similar to above, but this time
> user space needs to manage the memory region as well as the usual
> KVM_CREATE_DEVICE dance. I've no objection to this, but it means
> kvmtool/QEMU needs to be much more aware of what is going on (e.g. how
> to size the memory region).
>
> 3. Guest kernel "donates" the memory to the hypervisor for the
> structure. As far as I'm aware this is what x86 does. The problems I see
> with this approach are:
>
> a) kexec becomes much more tricky - there needs to be a disabling
> mechanism for the guest to stop the hypervisor scribbling on memory
> before starting the new kernel.
>
> b) If there is more than one entity that is interested in the
> information (e.g. firmware and kernel) then this requires some form of
> arbitration in the guest because the hypervisor doesn't want to have to
> track an arbitrary number of regions to update.
>
> c) Performance can suffer if the host kernel doesn't have a suitably
> aligned/sized area to use. As you say - put_user() is more expensive.
> The structure is updated on every return to the VM.
>
>
> Of course x86 does prove the third approach can work, but I'm not sure
> which is actually better. Avoiding the kexec cancellation requirements was
> the main driver of the current approach. Although many of the
> conversations about this were also tied up with Live Physical Time which
> adds its own complications.

My current train of thought is around (2):

- We don't need a new mechanism to track pages or deal with overlapping
  IPA ranges
- We can get rid of the save/restore interface

The drawback is that the amount of memory required per vcpu becomes ABI. I
don't think that's a huge deal, as the hypervisor has the same contract with
the guest.

We also take a small hit with put_user(), but this is only done as a
consequence of vcpu_load() (and not on every entry as you suggest above).
It'd be worth quantifying this overhead before making any decision one way
or another.

Thanks,

M.
-- 
Jazz is not dead, it just smells funny...
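
As a rough illustration of the sizing question in option (2), the sketch below computes how much memory user space would have to set aside if each vCPU gets one fixed-size record. The 64-byte record size is an assumption (it matches my reading of DEN 0057A, but is not stated in this thread), and the helper names and the 4 KiB default page size are hypothetical.

```python
STOLEN_TIME_RECORD_SIZE = 64  # assumption: bytes per vCPU record


def pvtime_region_size(nr_vcpus: int, page_size: int = 4096) -> int:
    """Bytes user space would allocate, rounded up to whole pages."""
    if nr_vcpus <= 0:
        raise ValueError("need at least one vCPU")
    raw = nr_vcpus * STOLEN_TIME_RECORD_SIZE
    pages = -(-raw // page_size)  # ceiling division
    return pages * page_size


def vcpu_record_offset(vcpu_index: int) -> int:
    """Offset of one vCPU's record inside the region (hypothetical layout)."""
    return vcpu_index * STOLEN_TIME_RECORD_SIZE


if __name__ == "__main__":
    for n in (1, 64, 65):
        print(n, pvtime_region_size(n))
```

This is the sense in which the per-vcpu memory requirement "becomes ABI": once user space hard-codes a record size like the constant above, the kernel cannot grow the structure without breaking it.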
Re: [PATCH 0/9] arm64: Stolen time support
On 03/08/2019 19:05, Marc Zyngier wrote:
> On Fri, 2 Aug 2019 15:50:08 +0100
> Steven Price wrote:
>
> Hi Steven,
>
>> This series adds support for paravirtualized time for arm64 guests and
>> KVM hosts following the specification in Arm's document DEN 0057A:
>>
>> https://developer.arm.com/docs/den0057/a
>>
>> It implements support for stolen time, allowing the guest to
>> identify time when it is forcibly not executing.
>>
>> It doesn't implement support for Live Physical Time (LPT) as there are
>> some concerns about the overheads and approach in the above
>> specification, and I expect an updated version of the specification to
>> be released soon with just the stolen time parts.
>
> Thanks for posting this.
>
> My current concern with this series is around the fact that we allocate
> memory from the kernel on behalf of the guest. It is the first example
> of such a thing in the ARM port, and I can't really say I'm fond of it.
>
> x86 seems to get away with it by having the memory allocated from
> userspace, which I tend to like more. Yes, put_user is more
> expensive than a straight store, but this isn't done too often either.
>
> What is the rationale for your current approach?

As I see it there are 3 approaches that can be taken here:

1. Hypervisor allocates memory and adds it to the virtual machine. This
means that everything to do with the 'device' is encapsulated behind the
KVM_CREATE_DEVICE / KVM_[GS]ET_DEVICE_ATTR ioctls. But since we want the
stolen time structure to be fast it cannot be a trapping region and has
to be backed by real memory - in this case allocated by the host kernel.

2. Host user space allocates memory. Similar to above, but this time
user space needs to manage the memory region as well as the usual
KVM_CREATE_DEVICE dance. I've no objection to this, but it means
kvmtool/QEMU needs to be much more aware of what is going on (e.g. how
to size the memory region).

3. Guest kernel "donates" the memory to the hypervisor for the
structure. As far as I'm aware this is what x86 does. The problems I see
with this approach are:

a) kexec becomes much more tricky - there needs to be a disabling
mechanism for the guest to stop the hypervisor scribbling on memory
before starting the new kernel.

b) If there is more than one entity that is interested in the
information (e.g. firmware and kernel) then this requires some form of
arbitration in the guest because the hypervisor doesn't want to have to
track an arbitrary number of regions to update.

c) Performance can suffer if the host kernel doesn't have a suitably
aligned/sized area to use. As you say - put_user() is more expensive.
The structure is updated on every return to the VM.


Of course x86 does prove the third approach can work, but I'm not sure
which is actually better. Avoiding the kexec cancellation requirements was
the main driver of the current approach. Although many of the
conversations about this were also tied up with Live Physical Time which
adds its own complications.

Steve
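
For readers unfamiliar with the shared structure all three approaches are arguing about, here is a minimal sketch of decoding it from a raw buffer. The layout used (32-bit revision, 32-bit attributes, 64-bit stolen time in nanoseconds, padded to 64 bytes, little-endian) is my understanding of DEN 0057A and is an assumption here, since the thread itself does not spell it out; the function and type names are hypothetical.

```python
import struct
from typing import NamedTuple

# Assumed layout: u32 revision, u32 attributes, u64 stolen time (ns),
# then 48 padding bytes so the record is 64 bytes in total.
STOLEN_TIME = struct.Struct("<IIQ48x")


class StolenTime(NamedTuple):
    revision: int
    attributes: int
    stolen_ns: int


def parse_stolen_time(buf: bytes, offset: int = 0) -> StolenTime:
    """Decode one record; raises struct.error if the buffer is too short."""
    return StolenTime(*STOLEN_TIME.unpack_from(buf, offset))


if __name__ == "__main__":
    # Fabricated sample record: 1.5 ms of stolen time.
    sample = STOLEN_TIME.pack(0, 0, 1_500_000)
    print(len(sample), parse_stolen_time(sample))
```

Because the record is small and plain data, a guest can read it with ordinary loads, which is why the thread insists it must be backed by real memory and not a trapping region.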