Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
Hi Thomas, On Fri, Oct 1, 2021 at 12:59 PM Thomas Gleixner wrote: > > Marcelo, > > On Fri, Oct 01 2021 at 09:05, Marcelo Tosatti wrote: > > On Fri, Oct 01, 2021 at 01:02:23AM +0200, Thomas Gleixner wrote: > >> But even if that would be the case, then what prevents the stale time > >> stamps to be visible? Nothing: > >> > >> T0:t = now(); > >> -> pause > >> -> resume > >> -> magic "fixup" > >> T1:dostuff(t); > > > > Yes. > > > > BTW, you could have a userspace notification (then applications > > could handle this if desired). > > Well, we have that via timerfd with TFD_TIMER_CANCEL_ON_SET for > CLOCK_REALTIME. That's what applications which are sensitive to clock > REALTIME jumps use today. > > >> Now the proposed change is creating exactly the same problem: > >> > >> >> > + if (data.flags & KVM_CLOCK_REALTIME) { > >> >> > + u64 now_real_ns = ktime_get_real_ns(); > >> >> > + > >> >> > + /* > >> >> > + * Avoid stepping the kvmclock backwards. > >> >> > + */ > >> >> > + if (now_real_ns > data.realtime) > >> >> > + data.clock += now_real_ns - data.realtime; > >> >> > + } > >> > >> IOW, it takes the time between pause and resume into account and > >> forwards the underlying base clock which makes CLOCK_MONOTONIC > >> jump forward by exactly that amount of time. > > > > Well, it is assuming that the > > > > T0:t = now(); > > T1:pause vm() > > T2: finish vm migration() > > T3:dostuff(t); > > > > Interval between T1 and T2 is small (and that the guest > > clocks are synchronized up to a given boundary). 
> > Yes, I understand that, but it's an assumption and there is no boundary > for the time jump in the proposed patches, which rings my alarm bells :) > > > But i suppose adding a limit to the forward clock advance > > (in the migration case) is useful: > > > > 1) If migration (well actually, only the final steps > > to finish migration, the time between when guest is paused > > on source and is resumed on destination) takes too long, > > then too bad: fix it to be shorter if you want the clocks > > to have close to zero change to realtime on migration. > > > > 2) Avoid the other bugs in case of large forward advance. > > > > Maybe having it configurable, with a say, 1 minute maximum by default > > is a good choice? > > Don't know what 1 minute does in terms of applications etc. You have to > do some experiments on that. I debated quite a bit on what the absolute limit should be for advancing the KVM clock, and settled on doing no checks in the kernel besides the monotonicity invariant. End of the day, userspace can ignore all of the rules that KVM will try to enforce on the kvm clock/TSC and jump it as it sees fit (both are already directly writable). But I agree that there has to be some reason around what is acceptable. We have an absolute limit on how far forward we will yank the KVM clock and TSC in our userspace, but of course it has a TOCTOU problem for whatever madness can come in between userspace and the time the kernel actually services the ioctl. -- Thanks, Oliver > > An alternative would be to advance only the guests REALTIME clock, from > > data about how long steps T1-T2 took. > > Yes, that's what would happen in the cooperative S2IDLE or S3 case when > the guest resumes. > > >> So now virt came along and created a hard to solve circular dependency > >> problem: > >> > >>- If CLOCK_MONOTONIC stops for too long then NTP/PTP gets out of > >> sync, but everything else is happy. 
> >> > >>- If CLOCK_MONOTONIC jumps too far forward, then all hell breaks > >> lose, but NTP/PTP is happy. > > > > One must handle the > > > > T0:t = now(); > > -> pause > > -> resume > > -> magic "fixup" > > T1:dostuff(t); > > > > fact if one is going to use savevm/restorevm anyway, so... > > (it is kind of unfixable, unless you modify your application > > to accept notifications to redo any computation based on t, isnt it?). > > Well yes, but what applications can deal with is CLOCK_REALTIME jumping > because that's a property of it. Not so much the CLOCK_MONOTONIC part. > > >> If you decide that correctness is overrated, then please document it > >> clearly instead of trying to pretend being correct. > > > > Based on the above, advancing only CLOCK_REALTIME (and not CLOCK_MONOTONIC) > > would be correct, right? And its probably not very hard to do. > > Time _is_ hard to get right. > > So you might experiment with something like this as a stop gap: > > Provide the guest something like this: > > u64 migration_seq; > u64 realtime_delta_ns; > > in the shared clock page > > Do not forward jump clock MONOTONIC. > > On resume kick an IPI where the guest handler does: > > if (clock_data->migration_
Re: [PATCH v8 7/7] KVM: x86: Expose TSC offset controls to userspace
Marcelo, On Fri, Oct 1, 2021 at 12:11 PM Marcelo Tosatti wrote: > > On Fri, Oct 01, 2021 at 05:12:20PM +0200, Paolo Bonzini wrote: > > On 01/10/21 12:32, Marcelo Tosatti wrote: > > > > +1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0), + > > > > kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0). + [...] > > > > +4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock > > > > nanoseconds + (k_0) and realtime nanoseconds (r_0) in their > > > > respective fields. + Ensure that the KVM_CLOCK_REALTIME flag is > > > > set in the provided + structure. KVM will advance the VM's > > > > kvmclock to account for elapsed + time since recording the clock > > > > values. > > > > > > You can't advance both kvmclock (kvmclock_offset variable) and the > > > TSCs, which would be double counting. > > > > > > So you have to either add the elapsed realtime (1) between > > > KVM_GET_CLOCK to kvmclock (which this patch is doing), or to the > > > TSCs. If you do both, there is double counting. Am i missing > > > something? > > > > Probably one of these two (but it's worth pointing out both of them): > > > > 1) the attribute that's introduced here *replaces* > > KVM_SET_MSR(MSR_IA32_TSC), so the TSC is not added. > > > > 2) the adjustment formula later in the algorithm does not care about how > > much time passed between step 1 and step 4. It just takes two well > > known (TSC, kvmclock) pairs, and uses them to ensure the guest TSC is > > the same on the destination as if the guest was still running on the > > source. It is irrelevant that one of them is before migration and one > > is after, all it matters is that one is on the source and one is on the > > destination. > > OK, so it still relies on NTPd daemon to fix the CLOCK_REALTIME delay > which is introduced during migration (which is what i would guess is > the lower hanging fruit) (for guests using TSC). 
The series gives userspace the ability to modify the guest's perception of the TSC in whatever way it sees fit. The algorithm in the documentation provides a suggestion to userspace on how to do exactly that. I kept that advancement logic out of the kernel because IMO it is an implementation detail: users have differing opinions on how clocks should behave across a migration and KVM shouldn't have any baked-in rules around it. At the same time, userspace can choose to _not_ jump the TSC and use the available interfaces to just migrate the existing state of the TSCs. When I had initially proposed this series upstream, Paolo astutely pointed out that there was no good way to get a (CLOCK_REALTIME, TSC) pairing, which is critical for the TSC advancement algorithm in the documentation. Google's best way to get (CLOCK_REALTIME, TSC) exists in userspace [1], hence the missing kvm clock changes. So, in all, the spirit of the KVM clock changes is to provide missing UAPI around the clock/TSC, with the side effect of changing the guest-visible value. [1] https://cloud.google.com/spanner/docs/true-time-external-consistency > My point was that, by advancing the _TSC value_ by: > > T0. stop guest vcpus(source) > T1. KVM_GET_CLOCK (source) > T2. KVM_SET_CLOCK (destination) > T3. Write guest TSCs(destination) > T4. resume guest(destination) > > new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1 > > t_0:host TSC at KVM_GET_CLOCK time. > off_n: TSC offset at vcpu-n (as long as no guest TSC writes are performed, > TSC offset is fixed). > ... > > +4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds > + (k_0) and realtime nanoseconds (r_0) in their respective fields. > + Ensure that the KVM_CLOCK_REALTIME flag is set in the provided > + structure. KVM will advance the VM's kvmclock to account for elapsed > + time since recording the clock values. > > Only kvmclock is advanced (by passing r_0). 
But a guest might not use kvmclock > (hopefully modern guests on modern hosts will use TSC clocksource, > whose clock_gettime is faster... some people are using that already). > Hopefully the above explanation made it clearer how the TSCs are supposed to get advanced, and why it isn't done in the kernel. > At some point QEMU should enable invariant TSC flag by default? > > That said, the point is: why not advance the _TSC_ values > (instead of kvmclock nanoseconds), as doing so would reduce > the "the CLOCK_REALTIME delay which is introduced during migration" > for both kvmclock users and modern tsc clocksource users. > > So yes, i also like this patchset, but would like it even more > if it fixed the case above as well (and not sure whether adding > the migration delta to KVMCLOCK makes it harder to fix TSC case > later). > > > Perhaps we can add to step 6 something like: > > > > > +6. Adjust the guest TSC offsets for every vCPU to account for (1) > > > time + elapsed since recording state and (2) difference in TSCs > > > between the + source and destination machine: + + new_off_n = t_0 > > > +
Re: [PATCH v2 03/11] KVM: arm64: Encapsulate reset request logic in a helper function
On Thu, Sep 30, 2021 at 11:05 PM Reiji Watanabe wrote: > > On Thu, Sep 23, 2021 at 12:16 PM Oliver Upton wrote: > > > > In its implementation of the PSCI function, KVM needs to request that a > > target vCPU resets before its next entry into the guest. Wrap the logic > > for requesting a reset in a function for later use by other implemented > > PSCI calls. > > > > No functional change intended. > > > > Signed-off-by: Oliver Upton > > --- > > arch/arm64/kvm/psci.c | 59 +-- > > 1 file changed, 35 insertions(+), 24 deletions(-) > > > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c > > index 310b9cb2b32b..bb59b692998b 100644 > > --- a/arch/arm64/kvm/psci.c > > +++ b/arch/arm64/kvm/psci.c > > @@ -64,9 +64,40 @@ static inline bool kvm_psci_valid_affinity(unsigned long > > affinity) > > return !(affinity & ~MPIDR_HWID_BITMASK); > > } > > > > -static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu) > > +static void kvm_psci_vcpu_request_reset(struct kvm_vcpu *vcpu, > > + unsigned long entry_addr, > > + unsigned long context_id, > > + bool big_endian) > > { > > struct vcpu_reset_state *reset_state; > > + > > + lockdep_assert_held(&vcpu->kvm->lock); > > + > > + reset_state = &vcpu->arch.reset_state; > > + reset_state->pc = entry_addr; > > + > > + /* Propagate caller endianness */ > > + reset_state->be = big_endian; > > + > > + /* > > +* NOTE: We always update r0 (or x0) because for PSCI v0.1 > > +* the general purpose registers are undefined upon CPU_ON. > > +*/ > > + reset_state->r0 = context_id; > > + > > + WRITE_ONCE(reset_state->reset, true); > > + kvm_make_request(KVM_REQ_VCPU_RESET, vcpu); > > + > > + /* > > +* Make sure the reset request is observed if the change to > > +* power_state is observed. 
> > +*/ > > + smp_wmb(); > > + vcpu->arch.power_off = false; > > +} > > + > > +static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu) > > +{ > > struct kvm *kvm = source_vcpu->kvm; > > struct kvm_vcpu *vcpu = NULL; > > unsigned long cpu_id; > > @@ -90,29 +121,9 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu > > *source_vcpu) > > return PSCI_RET_INVALID_PARAMS; > > } > > > > - reset_state = &vcpu->arch.reset_state; > > - > > - reset_state->pc = smccc_get_arg2(source_vcpu); > > - > > - /* Propagate caller endianness */ > > - reset_state->be = kvm_vcpu_is_be(source_vcpu); > > - > > - /* > > -* NOTE: We always update r0 (or x0) because for PSCI v0.1 > > -* the general purpose registers are undefined upon CPU_ON. > > -*/ > > - reset_state->r0 = smccc_get_arg3(source_vcpu); > > - > > - WRITE_ONCE(reset_state->reset, true); > > - kvm_make_request(KVM_REQ_VCPU_RESET, vcpu); > > - > > - /* > > -* Make sure the reset request is observed if the change to > > -* power_state is observed. > > -*/ > > - smp_wmb(); > > - > > - vcpu->arch.power_off = false; > > + kvm_psci_vcpu_request_reset(vcpu, smccc_get_arg2(source_vcpu), > > + smccc_get_arg3(source_vcpu), > > + kvm_vcpu_is_be(source_vcpu)); > > kvm_vcpu_wake_up(vcpu); > > > > return PSCI_RET_SUCCESS; > > -- > > 2.33.0.685.g46640cef36-goog > > Reviewed-by: Reiji Watanabe > > Not directly related to the patch, but the (original) code doesn't > do any sanity checking for the entry address although the PSCI spec says: > > "INVALID_ADDRESS is returned when the entry point address is known > by the implementation to be invalid, because it is in a range that > is known not to be available to the caller." Right, I had noticed the same but was a tad too lazy to address in this series :) Thanks for the review, Reji! -- Best, Oliver ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
Marcelo, On Fri, Oct 01 2021 at 09:05, Marcelo Tosatti wrote: > On Fri, Oct 01, 2021 at 01:02:23AM +0200, Thomas Gleixner wrote: >> But even if that would be the case, then what prevents the stale time >> stamps to be visible? Nothing: >> >> T0:t = now(); >> -> pause >> -> resume >> -> magic "fixup" >> T1:dostuff(t); > > Yes. > > BTW, you could have a userspace notification (then applications > could handle this if desired). Well, we have that via timerfd with TFD_TIMER_CANCEL_ON_SET for CLOCK_REALTIME. That's what applications which are sensitive to clock REALTIME jumps use today. >> Now the proposed change is creating exactly the same problem: >> >> >> > + if (data.flags & KVM_CLOCK_REALTIME) { >> >> > + u64 now_real_ns = ktime_get_real_ns(); >> >> > + >> >> > + /* >> >> > + * Avoid stepping the kvmclock backwards. >> >> > + */ >> >> > + if (now_real_ns > data.realtime) >> >> > + data.clock += now_real_ns - data.realtime; >> >> > + } >> >> IOW, it takes the time between pause and resume into account and >> forwards the underlying base clock which makes CLOCK_MONOTONIC >> jump forward by exactly that amount of time. > > Well, it is assuming that the > > T0:t = now(); > T1:pause vm() > T2: finish vm migration() > T3:dostuff(t); > > Interval between T1 and T2 is small (and that the guest > clocks are synchronized up to a given boundary). Yes, I understand that, but it's an assumption and there is no boundary for the time jump in the proposed patches, which rings my alarm bells :) > But i suppose adding a limit to the forward clock advance > (in the migration case) is useful: > > 1) If migration (well actually, only the final steps > to finish migration, the time between when guest is paused > on source and is resumed on destination) takes too long, > then too bad: fix it to be shorter if you want the clocks > to have close to zero change to realtime on migration. > > 2) Avoid the other bugs in case of large forward advance. 
> > Maybe having it configurable, with a say, 1 minute maximum by default > is a good choice? Don't know what 1 minute does in terms of applications etc. You have to do some experiments on that. > An alternative would be to advance only the guests REALTIME clock, from > data about how long steps T1-T2 took. Yes, that's what would happen in the cooperative S2IDLE or S3 case when the guest resumes. >> So now virt came along and created a hard to solve circular dependency >> problem: >> >>- If CLOCK_MONOTONIC stops for too long then NTP/PTP gets out of >> sync, but everything else is happy. >> >>- If CLOCK_MONOTONIC jumps too far forward, then all hell breaks >> lose, but NTP/PTP is happy. > > One must handle the > > T0:t = now(); > -> pause > -> resume > -> magic "fixup" > T1:dostuff(t); > > fact if one is going to use savevm/restorevm anyway, so... > (it is kind of unfixable, unless you modify your application > to accept notifications to redo any computation based on t, isnt it?). Well yes, but what applications can deal with is CLOCK_REALTIME jumping because that's a property of it. Not so much the CLOCK_MONOTONIC part. >> If you decide that correctness is overrated, then please document it >> clearly instead of trying to pretend being correct. > > Based on the above, advancing only CLOCK_REALTIME (and not CLOCK_MONOTONIC) > would be correct, right? And its probably not very hard to do. Time _is_ hard to get right. So you might experiment with something like this as a stop gap: Provide the guest something like this: u64 migration_seq; u64 realtime_delta_ns; in the shared clock page Do not forward jump clock MONOTONIC. 
On resume kick an IPI where the guest handler does: if (clock_data->migration_seq == migration_seq) return; migration_seq = clock_data->migration_seq; ts64 = { 0, 0 }; timespec64_add_ns(&ts64, clock_data->realtime_delta_ns); timekeeping_inject_sleeptime64(&ts64); Make sure that the IPI completes before you migrate the guest another time or implement it slightly smarter, but you get the idea :) That's what we use for suspend time injection, but it should just work without frozen tasks as well. It will forward clock REALTIME by the amount of time spent during migration. It'll also modify the BOOTTIME offset by the same amount, but that's not a tragedy. The function will also reset NTP state so the NTP/PTP daemon knows that there was a kernel initiated time jump and it can work out easily what to do like it does on resume from an actual suspend. It will also invoke clock_was_set() which makes all the other time related updates trigger and wakeup tasks w
Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
On Fri, Oct 1, 2021 at 7:41 AM Paolo Bonzini wrote:
>
> On 01/10/21 16:39, Paolo Bonzini wrote:
> > On 16/09/21 20:15, Oliver Upton wrote:
> >> +	if (data.flags & ~KVM_CLOCK_REALTIME)
> >> 		return -EINVAL;
> >
> > Let's accept KVM_CLOCK_HOST_TSC here even though it's not used; there
> > may be programs that expect to send back to KVM_SET_CLOCK whatever they
> > got from KVM_GET_CLOCK.
>
> Nevermind, KVM_SET_CLOCK is already rejecting KVM_CLOCK_TSC_STABLE so no
> need to do that!

Yeah, I don't know the story on the interface but it is really odd that
userspace needs to blow away flags to successfully write the clock
structure.

--
Thanks,
Oliver
Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
On Fri, Oct 1, 2021 at 4:43 AM Marc Zyngier wrote: > > On Thu, 30 Sep 2021 18:24:23 +0100, > Oliver Upton wrote: > > > > Hey Marc, > > > > On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier wrote: > > > > > > Hi Oliver, > > > > > > On Wed, 29 Sep 2021 19:22:05 +0100, > > > Oliver Upton wrote: > > > > > > > > I have some lingering thoughts on this subject since we last spoke and > > > > wanted to discuss. > > > > > > > > I'm having a hard time figuring out how a VMM should handle a new > > > > hypercall identity register introduced on a newer kernel. In order to > > > > maintain guest ABI, the VMM would need to know about that register and > > > > zero it when restoring an older guest. > > > > > > Just as it would need to be able to discover any new system register > > > exposed by default, as it happens at all times. Which is why we have a > > > way to discover all the registers, architected or not. > > > > > > > Perhaps instead we could reserve a range of firmware registers as the > > > > 'hypercall identity' registers. Implement all of them as RAZ/WI by > > > > default, encouraging userspace to zero these registers away for older > > > > VMs but still allowing an old userspace to pick up new KVM features. > > > > Doing so would align the hypercall identity registers with the feature > > > > ID registers from the architecture. > > > > > > The range already exists in the form of the "coprocessor" 0x14. I > > > don't see the need to expose it as RAZ/WI, however. If userspace > > > doesn't know about how this pseudo-register works, it won't be able to > > > program it anyway. > > > > > > I don't buy the parallel with the ID-regs either. They are RAZ/WI by > > > default so that they don't UNDEF at runtime. The meaning of a RAZ > > > id-register is also well defined (feature not implemented), and the > > > CPU cannot write to them. In a way, the ID-regs *are* the enumeration > > > mechanism. > > > > > > Our firmware registers don't follow the same rules. 
Userspace can > > > write to them, and there is no such "not implemented" rule (case in > > > point, PSCI). We also have a separate enumeration mechanism > > > (GET_ONE_REG), which is (more or less) designed for userspace to find > > > what is implemented. > > > > > > For these reasons, I don't immediately see the point of advertising a > > > set of registers ahead of time, before userspace grows an > > > understanding of what these registers mean. > > > > Supposing we don't preallocate some hypercall ID registers, how can we > > safely migrate a guest from an older kernel to newer kernel? Ideally, > > we would preserve the hypercall feature set across the migration which > > could work for a while with the first set of registers that get > > defined, but whenever a new hypercall firmware register comes along > > then the VMM will be clueless to the new ABI. > > My expectations were that whenever userspace writes a set of firmware > register, all the defaults are invalidated. For example say that > host-A know about a single hypercall register, while host-B knows > about two. Upon writing to the first register, the host would clear > the set of available services in the second one. If userspace > eventually writes there, the value would stick if valid. > > Also, remember that pseudo-registers don't have to be 64bit. We could > define a new class of hypercall-specific registers that would be much > wider, and thus have a larger write granularity (KVM supports anything > from 8 to 2048 bits). This would make it pretty easy to implement the > above. > > > Fundamentally, I don't think userspace should need a patch to preserve > > ABI on a newer kernel. Despite that, it would seem that userspace will > > need to learn of any firmware registers that control hypercall > > features which come after the initial set that gets proposed. 
If > > KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall, > > architectural feature ID registers) from other parts of the vCPU > > state, it would be clear to what registers to zero on a newer kernel. > > Apologies if it is distracting to mention the feature ID registers > > here, but both are on my mind currently and want to make sure there is > > some consistency in how features get handled on newer kernels, > > architected or not. > > The problem I see is that we will always need to grow the HC register > space one way or another, no matter how many we reserve. Your approach > only works if we don't exceed that particular range. Maybe it will > never be a problem, but it remains that this is not scalable. > > If we wanted to be safe, we'd reserve the whole of the possible space > as defined by the SMCCC spec. Given that we currently have two HC > spaces (the ARM-controlled one, and the KVM-specific one), the > function space being 16bits in each case, that's 16kB worth of zeroes > that userspace has to save/restore at all times... I'm not overly > enthusiastic. Yeah, that's definitely overkill. I agree with your po
Re: [PATCH v8 7/7] KVM: x86: Expose TSC offset controls to userspace
On Fri, Oct 01, 2021 at 05:12:20PM +0200, Paolo Bonzini wrote: > On 01/10/21 12:32, Marcelo Tosatti wrote: > > > +1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0), + > > > kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0). + [...] > > > +4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock > > > nanoseconds + (k_0) and realtime nanoseconds (r_0) in their > > > respective fields. + Ensure that the KVM_CLOCK_REALTIME flag is > > > set in the provided + structure. KVM will advance the VM's > > > kvmclock to account for elapsed + time since recording the clock > > > values. > > > > You can't advance both kvmclock (kvmclock_offset variable) and the > > TSCs, which would be double counting. > > > > So you have to either add the elapsed realtime (1) between > > KVM_GET_CLOCK to kvmclock (which this patch is doing), or to the > > TSCs. If you do both, there is double counting. Am i missing > > something? > > Probably one of these two (but it's worth pointing out both of them): > > 1) the attribute that's introduced here *replaces* > KVM_SET_MSR(MSR_IA32_TSC), so the TSC is not added. > > 2) the adjustment formula later in the algorithm does not care about how > much time passed between step 1 and step 4. It just takes two well > known (TSC, kvmclock) pairs, and uses them to ensure the guest TSC is > the same on the destination as if the guest was still running on the > source. It is irrelevant that one of them is before migration and one > is after, all it matters is that one is on the source and one is on the > destination. OK, so it still relies on NTPd daemon to fix the CLOCK_REALTIME delay which is introduced during migration (which is what i would guess is the lower hanging fruit) (for guests using TSC). My point was that, by advancing the _TSC value_ by: T0. stop guest vcpus(source) T1. KVM_GET_CLOCK (source) T2. KVM_SET_CLOCK (destination) T3. Write guest TSCs(destination) T4. 
resume guest(destination) new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1 t_0:host TSC at KVM_GET_CLOCK time. off_n: TSC offset at vcpu-n (as long as no guest TSC writes are performed, TSC offset is fixed). ... +4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds + (k_0) and realtime nanoseconds (r_0) in their respective fields. + Ensure that the KVM_CLOCK_REALTIME flag is set in the provided + structure. KVM will advance the VM's kvmclock to account for elapsed + time since recording the clock values. Only kvmclock is advanced (by passing r_0). But a guest might not use kvmclock (hopefully modern guests on modern hosts will use TSC clocksource, whose clock_gettime is faster... some people are using that already). At some point QEMU should enable invariant TSC flag by default? That said, the point is: why not advance the _TSC_ values (instead of kvmclock nanoseconds), as doing so would reduce the "the CLOCK_REALTIME delay which is introduced during migration" for both kvmclock users and modern tsc clocksource users. So yes, i also like this patchset, but would like it even more if it fixed the case above as well (and not sure whether adding the migration delta to KVMCLOCK makes it harder to fix TSC case later). > Perhaps we can add to step 6 something like: > > > +6. Adjust the guest TSC offsets for every vCPU to account for (1) > > time + elapsed since recording state and (2) difference in TSCs > > between the + source and destination machine: + + new_off_n = t_0 > > + off_n + (k_1 - k_0) * freq - t_1 + > > "off + t - k * freq" is the guest TSC value corresponding to a time of 0 > in kvmclock. The above formula ensures that it is the same on the > destination as it was on the source. > > Also, the names are a bit hard to follow. 
> Perhaps
>
>	t_0		tsc_src
>	t_1		tsc_dest
>	k_0		guest_src
>	k_1		guest_dest
>	r_0		host_src
>	off_n		ofs_src[i]
>	new_off_n	ofs_dest[i]
>
> Paolo
[PATCH v2] KVM: arm64: Allow KVM to be disabled from the command line
Although KVM can be compiled out of the kernel, it cannot be disabled at
runtime. Allow this possibility by introducing a new mode that will
prevent KVM from initialising.

This is useful in the (limited) circumstances where you don't want KVM
to be available (what is wrong with you?), or when you want to install
another hypervisor instead (good luck with that).

Reviewed-by: David Brazdil
Acked-by: Will Deacon
Acked-by: Suzuki K Poulose
Signed-off-by: Marc Zyngier
---

Notes:
    v2: Dropped the id_aa64mmfr1_vh=0 setting so that KVM can be
        disabled and yet stay in VHE mode on platforms that require it.
        I kept the AB/RB's, but please shout if you disagree!

 Documentation/admin-guide/kernel-parameters.txt |  2 ++
 arch/arm64/include/asm/kvm_host.h               |  1 +
 arch/arm64/kvm/arm.c                            | 14 +-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 91ba391f9b32..f268731a3d4d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2365,6 +2365,8 @@
 	kvm-arm.mode=	[KVM,ARM] Select one of KVM/arm64's modes of
 			operation.
 
+			none: Forcefully disable KVM.
+
 			nvhe: Standard nVHE-based mode, without support
 			for protected guests.
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f8be56d5342b..019490c67976 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -58,6 +58,7 @@ enum kvm_mode {
 	KVM_MODE_DEFAULT,
 	KVM_MODE_PROTECTED,
+	KVM_MODE_NONE,
 };
 
 enum kvm_mode kvm_get_mode(void);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fe102cd2e518..658171231af9 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2064,6 +2064,11 @@ int kvm_arch_init(void *opaque)
 		return -ENODEV;
 	}
 
+	if (kvm_get_mode() == KVM_MODE_NONE) {
+		kvm_info("KVM disabled from command line\n");
+		return -ENODEV;
+	}
+
 	in_hyp_mode = is_kernel_in_hyp_mode();
 
 	if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) ||
@@ -2137,8 +2142,15 @@ static int __init early_kvm_mode_cfg(char *arg)
 		return 0;
 	}
 
-	if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode()))
+	if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) {
+		kvm_mode = KVM_MODE_DEFAULT;
 		return 0;
+	}
+
+	if (strcmp(arg, "none") == 0) {
+		kvm_mode = KVM_MODE_NONE;
+		return 0;
+	}
 
 	return -EINVAL;
 }
-- 
2.30.2
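With the patch applied, disabling KVM becomes a kernel command line matter, e.g. (illustrative fragment; the other parameters are placeholders):

```
console=ttyAMA0 root=/dev/vda2 kvm-arm.mode=none
```

Booting with this, kvm_arch_init() bails out with -ENODEV after logging "KVM disabled from command line".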
Re: [PATCH v8 5/7] kvm: x86: protect masterclock with a seqcount
On 16/09/21 20:15, Oliver Upton wrote:
> +	seq = read_seqcount_begin(&ka->pvclock_sc);
> +	do {
> +		use_master_clock = ka->use_master_clock;

Oops, the "seq" assignment should be inside the "do". With that fixed, my
tests seem to work. I will shortly push the result to kvm/queue.

Paolo

___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
On 01/10/21 17:39, Oliver Upton wrote:
>> Nevermind, KVM_SET_CLOCK is already rejecting KVM_CLOCK_TSC_STABLE so no
>> need to do that!
>
> Yeah, I don't know the story on the interface but it is really odd that
> userspace needs to blow away flags to successfully write the clock
> structure. Yeah, let's fix it now and accept all three flags

I would like that, even though it cannot be fixed in existing kernels.

Paolo
Re: [PATCH v8 7/7] KVM: x86: Expose TSC offset controls to userspace
On 01/10/21 12:32, Marcelo Tosatti wrote: +1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0), + kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0). + [...] +4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds + (k_0) and realtime nanoseconds (r_0) in their respective fields. + Ensure that the KVM_CLOCK_REALTIME flag is set in the provided + structure. KVM will advance the VM's kvmclock to account for elapsed + time since recording the clock values. You can't advance both kvmclock (kvmclock_offset variable) and the TSCs, which would be double counting. So you have to either add the elapsed realtime (1) between KVM_GET_CLOCK to kvmclock (which this patch is doing), or to the TSCs. If you do both, there is double counting. Am i missing something? Probably one of these two (but it's worth pointing out both of them): 1) the attribute that's introduced here *replaces* KVM_SET_MSR(MSR_IA32_TSC), so the TSC is not added. 2) the adjustment formula later in the algorithm does not care about how much time passed between step 1 and step 4. It just takes two well known (TSC, kvmclock) pairs, and uses them to ensure the guest TSC is the same on the destination as if the guest was still running on the source. It is irrelevant that one of them is before migration and one is after, all it matters is that one is on the source and one is on the destination. Perhaps we can add to step 6 something like: +6. Adjust the guest TSC offsets for every vCPU to account for (1) time + elapsed since recording state and (2) difference in TSCs between the + source and destination machine: + + new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1 + "off + t - k * freq" is the guest TSC value corresponding to a time of 0 in kvmclock. The above formula ensures that it is the same on the destination as it was on the source. Also, the names are a bit hard to follow. 
Perhaps:

  t_0        tsc_src
  t_1        tsc_dest
  k_0        guest_src
  k_1        guest_dest
  r_0        host_src
  off_n      ofs_src[i]
  new_off_n  ofs_dest[i]

Paolo
Re: [PATCH] KVM: arm64: Allow KVM to be disabled from the command line
On Fri, 01 Oct 2021 10:27:18 +0100, Suzuki K Poulose wrote:
>
> On 30/09/2021 11:29, Marc Zyngier wrote:
> > On Wed, 29 Sep 2021 11:35:46 +0100, Suzuki K Poulose wrote:
> >>
> >>> +	if (strcmp(arg, "none") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) {
> >>
> >> nit: Does this really need to WARN here? Unlike the "nvhe" case, if the
> >> user wants to keep KVM out of the picture for, say, debugging
> >> something, it is perfectly OK to allow the kernel to be running at EL2
> >> without having to change the firmware to alter the landing EL for the
> >> kernel?
> >
> > Well, the doc says "run in nVHE mode" and the option forces
> > id_aa64mmfr1.vh=0. The WARN_ON() will only fire on broken^Wfruity HW
> > that is VHE only. Note that this doesn't rely on any firmware change
> > (we drop from EL2 to EL1 and stay there).
>
> Ah, ok. So the "none" is in fact "nvhe + no-kvm". That's the bit I
> missed. TBH, that name to me sounds like "no KVM" at all, which is what
> we want. The question is, do we really need "none" to force vh == 0? I
> understand this is only a problem on a rare set of HWs. But the generic
> option looks deceiving.
>
> That said, I am happy to leave this as is and the doc says so.

I think you have a point here. Conflating the two things is a bit odd, and we might as well let the user pick the configuration they want (they can always pass 'id_aa64mmfr1.vh=0' themselves).

I'll respin the patch with this change.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
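Assuming the respun patch keeps the existing option names and merely decouples the two effects, a user who wants both behaviours (KVM disabled *and* the kernel forced to nVHE) would then combine the options explicitly on the kernel command line, e.g.:

```
kvm-arm.mode=none id_aa64mmfr1.vh=0
```

With just `kvm-arm.mode=none`, KVM would be disabled but a VHE-capable kernel would be left free to stay at EL2.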
Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
On 01/10/21 16:39, Paolo Bonzini wrote:
> On 16/09/21 20:15, Oliver Upton wrote:
>> +	if (data.flags & ~KVM_CLOCK_REALTIME)
>> +		return -EINVAL;
>
> Let's accept KVM_CLOCK_HOST_TSC here even though it's not used; there
> may be programs that expect to send back to KVM_SET_CLOCK whatever they
> got from KVM_GET_CLOCK.

Nevermind, KVM_SET_CLOCK is already rejecting KVM_CLOCK_TSC_STABLE so no need to do that!

Paolo
Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
On 16/09/21 20:15, Oliver Upton wrote:
> +	if (data.flags & ~KVM_CLOCK_REALTIME)
> +		return -EINVAL;

Let's accept KVM_CLOCK_HOST_TSC here even though it's not used; there may be programs that expect to send back to KVM_SET_CLOCK whatever they got from KVM_GET_CLOCK.
Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
On 01/10/21 01:02, Thomas Gleixner wrote:
> Now the proposed change is creating exactly the same problem:
>
> +	if (data.flags & KVM_CLOCK_REALTIME) {
> +		u64 now_real_ns = ktime_get_real_ns();
> +
> +		/*
> +		 * Avoid stepping the kvmclock backwards.
> +		 */
> +		if (now_real_ns > data.realtime)
> +			data.clock += now_real_ns - data.realtime;
> +	}

Indeed, though it's opt-in (you can always not pass KVM_CLOCK_REALTIME and then the kernel will not muck with the value you gave it).

> virt came along and created a hard to solve circular dependency
> problem:
>
> - If CLOCK_MONOTONIC stops for too long then NTP/PTP gets out of sync,
>   but everything else is happy.
>
> - If CLOCK_MONOTONIC jumps too far forward, then all hell breaks loose,
>   but NTP/PTP is happy.

Yes, I agree that this sums it up. For example QEMU (meaning: Marcelo :)) has gone for the former, "hoping" that NTP/PTP sorts it out sooner or later. The clock in nanoseconds is sent out to the destination and restored.

Google's userspace instead went for the latter. The reason is that they've always started running on the destination before finishing the memory copy[1], therefore it's much easier to bound the CLOCK_MONOTONIC jump.

I do like very much the cooperative S2IDLE or even S3 way to handle the brownout during live migration. However, if your stopping time is bounded, these patches are nice because, on current processors that have TSC scaling, they make it possible to keep the illusion of the TSC running. Of course that's a big "if"; however, you can always bound the stopping time by aborting the restart on the destination machine once you get close enough to the limit.

Paolo

[1] see https://dl.acm.org/doi/pdf/10.1145/3296975.3186415, figure 3
Re: [PATCH v2 06/11] KVM: arm64: Add support for SYSTEM_SUSPEND PSCI call
On Thu, 30 Sep 2021 18:40:43 +0100, Oliver Upton wrote: > > On Thu, Sep 30, 2021 at 5:29 AM Marc Zyngier wrote: > > > > I think there is an additional feature we need, which is to give > > control back to userspace every time a wake-up condition occurs (I did > > touch on that in [1]). This would give the opportunity to the VMM to > > do whatever it needs to perform. > > > > A typical use case would be to implement wake-up from certain > > interrupts only (mask non-wake-up IRQs on suspend request, return to > > the guest for WFI, wake-up returns to the VMM to restore the previous > > interrupt configuration, resume). > > > > Userspace would be free to clear the suspend state and resume the > > guest, or to reenter WFI. > > Yeah, this is definitely needed for the series. > > Just to make sure it's clear, what should the default behavior be if > userspace doesn't touch state at all and it calls KVM_RUN again? It > would seem to me that we should resume the guest by default, much like > we would do for the SUSPEND event exit. My mental model is that the suspend state is "sticky". Once set, it stays and the guest doesn't execute any instruction until cleared by the VMM. It has the advantage that the VMM always knows the state the vcpu is in, because that's what it asked for, and the vcpu can't change state on its own. It means that the VMM would have to set the vcpu state to 'RUNNABLE' once it is satisfied that it can be run. Thanks, M. -- Without deviation from the norm, progress is not possible. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2 05/11] KVM: arm64: Defer WFI emulation as a requested event
On Thu, 30 Sep 2021 18:09:07 +0100, Sean Christopherson wrote: > > On Thu, Sep 30, 2021, Marc Zyngier wrote: > > On Thu, 23 Sep 2021 20:16:04 +0100, Oliver Upton wrote: > > > @@ -681,6 +687,9 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu) > > > if (kvm_check_request(KVM_REQ_SLEEP, vcpu)) > > > kvm_vcpu_sleep(vcpu); > > > > > > + if (kvm_check_request(KVM_REQ_SUSPEND, vcpu)) > > > + kvm_vcpu_suspend(vcpu); > > > + > > ... > > > > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c > > > index 275a27368a04..5e5ef9ff4fba 100644 > > > --- a/arch/arm64/kvm/handle_exit.c > > > +++ b/arch/arm64/kvm/handle_exit.c > > > @@ -95,8 +95,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu) > > > } else { > > > trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false); > > > vcpu->stat.wfi_exit_stat++; > > > - kvm_vcpu_block(vcpu); > > > - kvm_clear_request(KVM_REQ_UNHALT, vcpu); > > > + kvm_make_request(KVM_REQ_SUSPEND, vcpu); > > > } > > > > > > kvm_incr_pc(vcpu); > > > > This is a change in behaviour. At the point where the blocking > > happens, PC will have already been incremented. I'd rather you don't > > do that. Instead, make the helper available and call into it directly, > > preserving the current semantics. > > Is there architectural behavior that KVM can emulate? E.g. if you > were to probe a physical CPU while it's waiting, would you observe > the pre-WFI PC, or the post-WFI PC? Following arch behavior would > be ideal because it eliminates subjectivity. The architecture doesn't really say (that's one of the many IMPDEF behaviours). However, there are at least some extensions (such as statistical profiling) that do require the PC to be accurately recorded together with the effects of the instructions at that PC. If an implementation was to pre-increment the PC, the corresponding trace would be inaccurate. > Regardless of the architectural behavior, changing KVM's behavior > should be done explicitly in a separate patch. 
> Irrespective of PC behavior, I would caution against using a request
> for handling WFI. Deferring the WFI opens up the possibility for all
> sorts of ordering oddities, e.g. if KVM exits to userspace between here
> and check_vcpu_requests(), then KVM can end up with a "spurious"
> pending KVM_REQ_SUSPEND if userspace manipulates vCPU state. I highly
> doubt that userspace VMMs would actually do that, as it would basically
> require a signal from userspace, but it's not impossible, and at the
> very least the pending request is yet another thing to worry about in
> the future.

+1. It really isn't worth the complexity.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
Re: [PATCH v3 02/10] KVM: arm64: vgic-v3: Check redist region is not above the VM IPA size
On Tue, 28 Sep 2021 19:47:56 +0100, Ricardo Koller wrote: > > Verify that the redistributor regions do not extend beyond the > VM-specified IPA range (phys_size). This can happen when using > KVM_VGIC_V3_ADDR_TYPE_REDIST or KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS > with: > > base + size > phys_size AND base < phys_size > > Add the missing check into vgic_v3_alloc_redist_region() which is called > when setting the regions, and into vgic_v3_check_base() which is called > when attempting the first vcpu-run. The vcpu-run check does not apply to > KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS because the regions size is known > before the first vcpu-run. Note that using the REDIST_REGIONS API > results in a different check, which already exists, at first vcpu run: > that the number of redist regions is enough for all vcpus. > > Finally, this patch also enables some extra tests in > vgic_v3_alloc_redist_region() by calculating "size" early for the legacy > redist api: like checking that the REDIST region can fit all the already > created vcpus. > > Signed-off-by: Ricardo Koller > --- > arch/arm64/kvm/vgic/vgic-mmio-v3.c | 6 -- > arch/arm64/kvm/vgic/vgic-v3.c | 4 > 2 files changed, 8 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c > b/arch/arm64/kvm/vgic/vgic-mmio-v3.c > index a09cdc0b953c..9be02bf7865e 100644 > --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c > +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c > @@ -796,7 +796,9 @@ static int vgic_v3_alloc_redist_region(struct kvm *kvm, > uint32_t index, > struct vgic_dist *d = &kvm->arch.vgic; > struct vgic_redist_region *rdreg; > struct list_head *rd_regions = &d->rd_regions; > - size_t size = count * KVM_VGIC_V3_REDIST_SIZE; > + int nr_vcpus = atomic_read(&kvm->online_vcpus); > + size_t size = count ? count * KVM_VGIC_V3_REDIST_SIZE > + : nr_vcpus * KVM_VGIC_V3_REDIST_SIZE; > int ret; > > /* cross the end of memory ? 
*/ > @@ -840,7 +842,7 @@ static int vgic_v3_alloc_redist_region(struct kvm *kvm, > uint32_t index, > > rdreg->base = VGIC_ADDR_UNDEF; > > - ret = vgic_check_ioaddr(kvm, &rdreg->base, base, SZ_64K); > + ret = vgic_check_iorange(kvm, &rdreg->base, base, SZ_64K, size); > if (ret) > goto free; > > diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c > index 21a6207fb2ee..27ee674631b3 100644 > --- a/arch/arm64/kvm/vgic/vgic-v3.c > +++ b/arch/arm64/kvm/vgic/vgic-v3.c > @@ -486,6 +486,10 @@ bool vgic_v3_check_base(struct kvm *kvm) > if (rdreg->base + vgic_v3_rd_region_size(kvm, rdreg) < > rdreg->base) > return false; > + > + if (rdreg->base + vgic_v3_rd_region_size(kvm, rdreg) > > + kvm_phys_size(kvm)) > + return false; Why can't we replace these two checks with a single call to your new fancy helper? Thanks, M. -- Without deviation from the norm, progress is not possible. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 01/10] kvm: arm64: vgic: Introduce vgic_check_iorange
On Thu, 30 Sep 2021 22:19:16 +0100, Ricardo Koller wrote: > > On Thu, Sep 30, 2021 at 09:02:12AM +0200, Eric Auger wrote: > > Hi, > > > > On 9/29/21 11:17 PM, Ricardo Koller wrote: > > > On Wed, Sep 29, 2021 at 06:29:21PM +0200, Eric Auger wrote: > > >> Hi Ricardo, > > >> > > >> On 9/28/21 8:47 PM, Ricardo Koller wrote: > > >>> Add the new vgic_check_iorange helper that checks that an iorange is > > >>> sane: the start address and size have valid alignments, the range is > > >>> within the addressable PA range, start+size doesn't overflow, and the > > >>> start wasn't already defined. > > >>> > > >>> No functional change. > > >>> > > >>> Signed-off-by: Ricardo Koller > > >>> --- > > >>> arch/arm64/kvm/vgic/vgic-kvm-device.c | 22 ++ > > >>> arch/arm64/kvm/vgic/vgic.h| 4 > > >>> 2 files changed, 26 insertions(+) > > >>> > > >>> diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c > > >>> b/arch/arm64/kvm/vgic/vgic-kvm-device.c > > >>> index 7740995de982..f714aded67b2 100644 > > >>> --- a/arch/arm64/kvm/vgic/vgic-kvm-device.c > > >>> +++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c > > >>> @@ -29,6 +29,28 @@ int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t > > >>> *ioaddr, > > >>> return 0; > > >>> } > > >>> > > >>> +int vgic_check_iorange(struct kvm *kvm, phys_addr_t *ioaddr, > > >>> + phys_addr_t addr, phys_addr_t alignment, > > >>> + phys_addr_t size) > > >>> +{ > > >>> + int ret; > > >>> + > > >>> + ret = vgic_check_ioaddr(kvm, ioaddr, addr, alignment); > > >> nit: not related to this patch but I am just wondering why we are > > >> passing phys_addr_t *ioaddr downto vgic_check_ioaddr and thus to > > >> > > >> vgic_check_iorange()? This must be a leftover of some old code? > > >> > > > It's used to check that the base of a region is not already set. 
> > > kvm_vgic_addr() uses it to make that check;
> > > vgic_v3_alloc_redist_region() does not:
> > >
> > >	rdreg->base = VGIC_ADDR_UNDEF; // so the "not already defined" check passes
> > >	ret = vgic_check_ioaddr(kvm, &rdreg->base, base, SZ_64K);
> >
> > Yes but I meant why a pointer?
>
> I can't think of any good reason. It must be some leftover as you said.

It definitely is. Please have a patch to fix that.

Also, it doesn't look like vgic_check_ioaddr() has any other user at the end of the series. Worth getting rid of.

M.

-- 
Without deviation from the norm, progress is not possible.
Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
On Fri, Oct 01, 2021 at 09:05:27AM -0300, Marcelo Tosatti wrote: > On Fri, Oct 01, 2021 at 01:02:23AM +0200, Thomas Gleixner wrote: > > Marcelo, > > > > On Thu, Sep 30 2021 at 16:21, Marcelo Tosatti wrote: > > > On Wed, Sep 29, 2021 at 03:56:29PM -0300, Marcelo Tosatti wrote: > > >> On Thu, Sep 16, 2021 at 06:15:35PM +, Oliver Upton wrote: > > >> > > >> Thomas, CC'ed, has deeper understanding of problems with > > >> forward time jumps than I do. Thomas, any comments? > > > > > > Based on the earlier discussion about the problems of synchronizing > > > the guests clock via a notification to the NTP/Chrony daemon > > > (where there is a window where applications can read the stale > > > value of the clock), a possible solution would be triggering > > > an NMI on the destination (so that it runs ASAP, with higher > > > priority than application/kernel). > > > > > > What would this NMI do, exactly? > > > > Nothing. You cannot do anything time related in an NMI. > > > > You might queue irq work which handles that, but that would still not > > prevent user space or kernel space from observing the stale time stamp > > depending on the execution state from where it resumes. > > Yes. > > > >> As a note: this makes it not OK to use KVM_CLOCK_REALTIME flag > > >> for either vm pause / vm resume (well, if paused for long periods of > > >> time) > > >> or savevm / restorevm. > > > > > > Maybe with the NMI above, it would be possible to use > > > the realtime clock as a way to know time elapsed between > > > events and advance guest clock without the current > > > problematic window. > > > > As much duct tape you throw at the problem, it cannot be solved ever > > because it's fundamentally broken. All you can do is to make the > > observation windows smaller, but that's just curing the symptom. > > Yes. 
> > > The problem is that the guest is paused/resumed without getting any > > information about that and the execution of the guest is stopped at an > > arbitrary instruction boundary from which it resumes after migration or > > restore. So there is no way to guarantee that after resume all vCPUs are > > executing in a state which can handle that. > > > > But even if that would be the case, then what prevents the stale time > > stamps to be visible? Nothing: > > > > T0:t = now(); > > -> pause > > -> resume > > -> magic "fixup" > > T1:dostuff(t); > > Yes. > > BTW, you could have a userspace notification (then applications > could handle this if desired). > > > But that's not a fundamental problem because every preemptible or > > interruptible code has the same issue: > > > > T0:t = now(); > > -> preemption or interrupt > > T1:dostuff(t); > > > > Which is usually not a problem, but It becomes a problem when T1 - T0 is > > greater than the usual expectations which can obviously be trivially > > achieved by guest migration or a savevm, restorevm cycle. > > > > But let's go a step back and look at the clocks and their expectations: > > > > CLOCK_MONOTONIC: > > > > Monotonically increasing clock which counts unless the system > > is in suspend. On resume it continues counting without jumping > > forward. > > > > That's the reference clock for everything else and therefore it > > is important that it does _not_ jump around. > > > > The reasons why CLOCK_MONOTONIC stops during suspend is > > historical and any attempt to change that breaks the world and > > some more because making it jump forward will trigger all sorts > > of timeouts, watchdogs and whatever. The last attempt to make > > CLOCK_MONOTONIC behave like CLOCK_BOOTTIME was reverted within 3 > > weeks. It's not going to be attempted again. See a3ed0e4393d6 > > ("Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME") for > > details. 
> > > > Now the proposed change is creating exactly the same problem: > > > > >> > +if (data.flags & KVM_CLOCK_REALTIME) { > > >> > +u64 now_real_ns = ktime_get_real_ns(); > > >> > + > > >> > +/* > > >> > + * Avoid stepping the kvmclock backwards. > > >> > + */ > > >> > +if (now_real_ns > data.realtime) > > >> > +data.clock += now_real_ns - data.realtime; > > >> > +} > > > > IOW, it takes the time between pause and resume into account and > > forwards the underlying base clock which makes CLOCK_MONOTONIC > > jump forward by exactly that amount of time. > > Well, it is assuming that the > > T0:t = now(); > T1:pause vm() > T2: finish vm migration() > T3:dostuff(t); > > Interval between T1 and T2 is small (and that the guest > clocks are synchronized up to a given boundary). > > But i suppose adding a limit to the forward clock advance > (in the migration case) is useful: > > 1) If migration (well actually, only the final steps > to finish migration, the time between when guest is pause
Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
On Fri, Oct 01, 2021 at 01:02:23AM +0200, Thomas Gleixner wrote: > Marcelo, > > On Thu, Sep 30 2021 at 16:21, Marcelo Tosatti wrote: > > On Wed, Sep 29, 2021 at 03:56:29PM -0300, Marcelo Tosatti wrote: > >> On Thu, Sep 16, 2021 at 06:15:35PM +, Oliver Upton wrote: > >> > >> Thomas, CC'ed, has deeper understanding of problems with > >> forward time jumps than I do. Thomas, any comments? > > > > Based on the earlier discussion about the problems of synchronizing > > the guests clock via a notification to the NTP/Chrony daemon > > (where there is a window where applications can read the stale > > value of the clock), a possible solution would be triggering > > an NMI on the destination (so that it runs ASAP, with higher > > priority than application/kernel). > > > > What would this NMI do, exactly? > > Nothing. You cannot do anything time related in an NMI. > > You might queue irq work which handles that, but that would still not > prevent user space or kernel space from observing the stale time stamp > depending on the execution state from where it resumes. Yes. > >> As a note: this makes it not OK to use KVM_CLOCK_REALTIME flag > >> for either vm pause / vm resume (well, if paused for long periods of time) > >> or savevm / restorevm. > > > > Maybe with the NMI above, it would be possible to use > > the realtime clock as a way to know time elapsed between > > events and advance guest clock without the current > > problematic window. > > As much duct tape you throw at the problem, it cannot be solved ever > because it's fundamentally broken. All you can do is to make the > observation windows smaller, but that's just curing the symptom. Yes. > The problem is that the guest is paused/resumed without getting any > information about that and the execution of the guest is stopped at an > arbitrary instruction boundary from which it resumes after migration or > restore. 
So there is no way to guarantee that after resume all vCPUs are > executing in a state which can handle that. > > But even if that would be the case, then what prevents the stale time > stamps to be visible? Nothing: > > T0:t = now(); > -> pause > -> resume > -> magic "fixup" > T1:dostuff(t); Yes. BTW, you could have a userspace notification (then applications could handle this if desired). > But that's not a fundamental problem because every preemptible or > interruptible code has the same issue: > > T0:t = now(); > -> preemption or interrupt > T1:dostuff(t); > > Which is usually not a problem, but It becomes a problem when T1 - T0 is > greater than the usual expectations which can obviously be trivially > achieved by guest migration or a savevm, restorevm cycle. > > But let's go a step back and look at the clocks and their expectations: > > CLOCK_MONOTONIC: > > Monotonically increasing clock which counts unless the system > is in suspend. On resume it continues counting without jumping > forward. > > That's the reference clock for everything else and therefore it > is important that it does _not_ jump around. > > The reasons why CLOCK_MONOTONIC stops during suspend is > historical and any attempt to change that breaks the world and > some more because making it jump forward will trigger all sorts > of timeouts, watchdogs and whatever. The last attempt to make > CLOCK_MONOTONIC behave like CLOCK_BOOTTIME was reverted within 3 > weeks. It's not going to be attempted again. See a3ed0e4393d6 > ("Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME") for > details. > > Now the proposed change is creating exactly the same problem: > > >> > + if (data.flags & KVM_CLOCK_REALTIME) { > >> > + u64 now_real_ns = ktime_get_real_ns(); > >> > + > >> > + /* > >> > + * Avoid stepping the kvmclock backwards. 
> >> > + */ > >> > + if (now_real_ns > data.realtime) > >> > + data.clock += now_real_ns - data.realtime; > >> > + } > > IOW, it takes the time between pause and resume into account and > forwards the underlying base clock which makes CLOCK_MONOTONIC > jump forward by exactly that amount of time. Well, it is assuming that the T0:t = now(); T1:pause vm() T2:finish vm migration() T3:dostuff(t); Interval between T1 and T2 is small (and that the guest clocks are synchronized up to a given boundary). But i suppose adding a limit to the forward clock advance (in the migration case) is useful: 1) If migration (well actually, only the final steps to finish migration, the time between when guest is paused on source and is resumed on destination) takes too long, then too bad: fix it to be shorter if you want the clocks to have close to zero change to realtime on migration. 2) Avoid the other bugs in case of large forward advance. Maybe having it configurab
Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
On Thu, 30 Sep 2021 18:24:23 +0100, Oliver Upton wrote: > > Hey Marc, > > On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier wrote: > > > > Hi Oliver, > > > > On Wed, 29 Sep 2021 19:22:05 +0100, > > Oliver Upton wrote: > > > > > > I have some lingering thoughts on this subject since we last spoke and > > > wanted to discuss. > > > > > > I'm having a hard time figuring out how a VMM should handle a new > > > hypercall identity register introduced on a newer kernel. In order to > > > maintain guest ABI, the VMM would need to know about that register and > > > zero it when restoring an older guest. > > > > Just as it would need to be able to discover any new system register > > exposed by default, as it happens at all times. Which is why we have a > > way to discover all the registers, architected or not. > > > > > Perhaps instead we could reserve a range of firmware registers as the > > > 'hypercall identity' registers. Implement all of them as RAZ/WI by > > > default, encouraging userspace to zero these registers away for older > > > VMs but still allowing an old userspace to pick up new KVM features. > > > Doing so would align the hypercall identity registers with the feature > > > ID registers from the architecture. > > > > The range already exists in the form of the "coprocessor" 0x14. I > > don't see the need to expose it as RAZ/WI, however. If userspace > > doesn't know about how this pseudo-register works, it won't be able to > > program it anyway. > > > > I don't buy the parallel with the ID-regs either. They are RAZ/WI by > > default so that they don't UNDEF at runtime. The meaning of a RAZ > > id-register is also well defined (feature not implemented), and the > > CPU cannot write to them. In a way, the ID-regs *are* the enumeration > > mechanism. > > > > Our firmware registers don't follow the same rules. Userspace can > > write to them, and there is no such "not implemented" rule (case in > > point, PSCI). 
We also have a separate enumeration mechanism > > (GET_ONE_REG), which is (more or less) designed for userspace to find > > what is implemented. > > > > For these reasons, I don't immediately see the point of advertising a > > set of registers ahead of time, before userspace grows an > > understanding of what these registers mean. > > Supposing we don't preallocate some hypercall ID registers, how can we > safely migrate a guest from an older kernel to newer kernel? Ideally, > we would preserve the hypercall feature set across the migration which > could work for a while with the first set of registers that get > defined, but whenever a new hypercall firmware register comes along > then the VMM will be clueless to the new ABI. My expectations were that whenever userspace writes a set of firmware register, all the defaults are invalidated. For example say that host-A know about a single hypercall register, while host-B knows about two. Upon writing to the first register, the host would clear the set of available services in the second one. If userspace eventually writes there, the value would stick if valid. Also, remember that pseudo-registers don't have to be 64bit. We could define a new class of hypercall-specific registers that would be much wider, and thus have a larger write granularity (KVM supports anything from 8 to 2048 bits). This would make it pretty easy to implement the above. > Fundamentally, I don't think userspace should need a patch to preserve > ABI on a newer kernel. Despite that, it would seem that userspace will > need to learn of any firmware registers that control hypercall > features which come after the initial set that gets proposed. If > KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall, > architectural feature ID registers) from other parts of the vCPU > state, it would be clear to what registers to zero on a newer kernel. 
> Apologies if it is distracting to mention the feature ID registers > here, but both are on my mind currently and want to make sure there is > some consistency in how features get handled on newer kernels, > architected or not. The problem I see is that we will always need to grow the HC register space one way or another, no matter how many we reserve. Your approach only works if we don't exceed that particular range. Maybe it will never be a problem, but it remains that this is not scalable. If we wanted to be safe, we'd reserve the whole of the possible space as defined by the SMCCC spec. Given that we currently have two HC spaces (the ARM-controlled one, and the KVM-specific one), the function space being 16bits in each case, that's 16kB worth of zeroes that userspace has to save/restore at all times... I'm not overly enthusiastic. Thanks, M. -- Without deviation from the norm, progress is not possible. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v8 7/7] KVM: x86: Expose TSC offset controls to userspace
On Fri, Oct 01, 2021 at 11:17:33AM +0200, Paolo Bonzini wrote:
> On 30/09/21 21:14, Marcelo Tosatti wrote:
> > > +	new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1
> >
> > Hi Oliver,
> >
> > This won't advance the TSC values themselves, right?
>
> Why not? It affects the TSC offset in the vmcs, so the TSC in the VM is
> advanced too.
>
> Paolo

+4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds
+   (k_0) and realtime nanoseconds (r_0) in their respective fields.
+   Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
+   structure. KVM will advance the VM's kvmclock to account for elapsed
+   time since recording the clock values.

You can't advance both kvmclock (kvmclock_offset variable) and the TSCs, which would be double counting. So you have to either add the elapsed realtime (1) between KVM_GET_CLOCK and KVM_SET_CLOCK to kvmclock (which this patch is doing), or to the TSCs. If you do both, there is double counting. Am I missing something?

To make it clearer: the TSC clocksource is faster than the kvmclock source, so we'd rather use it when possible, which is achievable with TSC scaling support on HW.

1: As mentioned earlier, just using the realtime clock delta between hosts can introduce problems. So we need a scheme to:

- Find the offset between host clocks, with upper and lower bounds on error.
- Take appropriate actions based on that (for example, do not use the KVM_CLOCK_REALTIME flag on KVM_SET_CLOCK if the delta between hosts is large).

Which can be done in userspace or kernel space... (hum, but maybe delegating this to userspace will introduce different solutions to the same problem?).

> > This (advancing the TSC values by the realtime elapsed time) would be
> > awesome because TSC clock_gettime() vdso is faster, and some
> > applications prefer to just read from TSC directly.
>
> See "x86: kvmguest: use TSC clocksource if invariant TSC is exposed".

> > The advancement with this patchset only applies to kvmclock.
Re: [PATCH v2 03/11] KVM: arm64: Encapsulate reset request logic in a helper function
On Thu, Sep 23, 2021 at 12:16 PM Oliver Upton wrote: > > In its implementation of the PSCI function, KVM needs to request that a > target vCPU resets before its next entry into the guest. Wrap the logic > for requesting a reset in a function for later use by other implemented > PSCI calls. > > No functional change intended. > > Signed-off-by: Oliver Upton > --- > arch/arm64/kvm/psci.c | 59 +-- > 1 file changed, 35 insertions(+), 24 deletions(-) > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c > index 310b9cb2b32b..bb59b692998b 100644 > --- a/arch/arm64/kvm/psci.c > +++ b/arch/arm64/kvm/psci.c > @@ -64,9 +64,40 @@ static inline bool kvm_psci_valid_affinity(unsigned long > affinity) > return !(affinity & ~MPIDR_HWID_BITMASK); > } > > -static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu) > +static void kvm_psci_vcpu_request_reset(struct kvm_vcpu *vcpu, > + unsigned long entry_addr, > + unsigned long context_id, > + bool big_endian) > { > struct vcpu_reset_state *reset_state; > + > + lockdep_assert_held(&vcpu->kvm->lock); > + > + reset_state = &vcpu->arch.reset_state; > + reset_state->pc = entry_addr; > + > + /* Propagate caller endianness */ > + reset_state->be = big_endian; > + > + /* > +* NOTE: We always update r0 (or x0) because for PSCI v0.1 > +* the general purpose registers are undefined upon CPU_ON. > +*/ > + reset_state->r0 = context_id; > + > + WRITE_ONCE(reset_state->reset, true); > + kvm_make_request(KVM_REQ_VCPU_RESET, vcpu); > + > + /* > +* Make sure the reset request is observed if the change to > +* power_state is observed. 
> +*/ > + smp_wmb(); > + vcpu->arch.power_off = false; > +} > + > +static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu) > +{ > struct kvm *kvm = source_vcpu->kvm; > struct kvm_vcpu *vcpu = NULL; > unsigned long cpu_id; > @@ -90,29 +121,9 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu > *source_vcpu) > return PSCI_RET_INVALID_PARAMS; > } > > - reset_state = &vcpu->arch.reset_state; > - > - reset_state->pc = smccc_get_arg2(source_vcpu); > - > - /* Propagate caller endianness */ > - reset_state->be = kvm_vcpu_is_be(source_vcpu); > - > - /* > -* NOTE: We always update r0 (or x0) because for PSCI v0.1 > -* the general purpose registers are undefined upon CPU_ON. > -*/ > - reset_state->r0 = smccc_get_arg3(source_vcpu); > - > - WRITE_ONCE(reset_state->reset, true); > - kvm_make_request(KVM_REQ_VCPU_RESET, vcpu); > - > - /* > -* Make sure the reset request is observed if the change to > -* power_state is observed. > -*/ > - smp_wmb(); > - > - vcpu->arch.power_off = false; > + kvm_psci_vcpu_request_reset(vcpu, smccc_get_arg2(source_vcpu), > + smccc_get_arg3(source_vcpu), > + kvm_vcpu_is_be(source_vcpu)); > kvm_vcpu_wake_up(vcpu); > > return PSCI_RET_SUCCESS; > -- > 2.33.0.685.g46640cef36-goog Reviewed-by: Reiji Watanabe Not directly related to the patch, but the (original) code doesn't do any sanity checking for the entry address although the PSCI spec says: "INVALID_ADDRESS is returned when the entry point address is known by the implementation to be invalid, because it is in a range that is known not to be available to the caller." Thanks, Reiji
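The ordering the helper preserves (the `smp_wmb()` between setting `reset_state->reset` and clearing `power_off`) can be illustrated with plain C11 atomics. This is an analogue for illustration, not kernel code: the kernel uses `smp_wmb()`/`smp_rmb()` pairs rather than release/acquire stores, but the guarantee sketched here is the same one the comment in the patch describes.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Analogue of the reset/power_off ordering in the patch: the writer
 * publishes the reset request before clearing power_off with release
 * semantics, so any reader that observes power_off == false with
 * acquire semantics is guaranteed to also observe reset_pending == true.
 */
static _Atomic bool reset_pending;
static _Atomic bool power_off = true;

static void request_reset(void)
{
    atomic_store_explicit(&reset_pending, true, memory_order_relaxed);
    /* release: orders the reset store before the power_off store */
    atomic_store_explicit(&power_off, false, memory_order_release);
}

static bool powered_on_implies_reset(void)
{
    if (!atomic_load_explicit(&power_off, memory_order_acquire))
        return atomic_load_explicit(&reset_pending, memory_order_relaxed);
    return true; /* still powered off: nothing to check yet */
}
```

Without the release ordering (or the kernel's `smp_wmb()`), a vCPU could be observed as powered on before its pending-reset flag is visible, which is exactly the race the comment in the patch warns about.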
Re: [PATCH] KVM: arm64: Allow KVM to be disabled from the command line
On 30/09/2021 11:29, Marc Zyngier wrote: On Wed, 29 Sep 2021 11:35:46 +0100, Suzuki K Poulose wrote: On 03/09/2021 10:16, Marc Zyngier wrote: Although KVM can be compiled out of the kernel, it cannot be disabled at runtime. Allow this possibility by introducing a new mode that will prevent KVM from initialising. This is useful in the (limited) circumstances where you don't want KVM to be available (what is wrong with you?), or when you want to install another hypervisor instead (good luck with that). Signed-off-by: Marc Zyngier --- Documentation/admin-guide/kernel-parameters.txt | 3 +++ arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kernel/idreg-override.c | 1 + arch/arm64/kvm/arm.c| 14 +- 4 files changed, 18 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 91ba391f9b32..cc5f68846434 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2365,6 +2365,9 @@ kvm-arm.mode= [KVM,ARM] Select one of KVM/arm64's modes of operation. +none: Forcefully disable KVM and run in nVHE mode, + preventing KVM from ever initialising. + nvhe: Standard nVHE-based mode, without support for protected guests. 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index f8be56d5342b..019490c67976 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -58,6 +58,7 @@ enum kvm_mode { KVM_MODE_DEFAULT, KVM_MODE_PROTECTED, + KVM_MODE_NONE, }; enum kvm_mode kvm_get_mode(void); diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index d8e606fe3c21..57013c1b6552 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -95,6 +95,7 @@ static const struct { charalias[FTR_ALIAS_NAME_LEN]; charfeature[FTR_ALIAS_OPTION_LEN]; } aliases[] __initconst = { + { "kvm-arm.mode=none","id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=nvhe","id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, { "arm64.nobti", "id_aa64pfr1.bt=0" }, diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index fe102cd2e518..cdc70e238316 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -2064,6 +2064,11 @@ int kvm_arch_init(void *opaque) return -ENODEV; } +if (kvm_get_mode() == KVM_MODE_NONE) { + kvm_info("KVM disabled from command line\n"); + return -ENODEV; + } + in_hyp_mode = is_kernel_in_hyp_mode(); if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) || @@ -2137,8 +2142,15 @@ static int __init early_kvm_mode_cfg(char *arg) return 0; } -if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) + if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) { + kvm_mode = KVM_MODE_DEFAULT; return 0; + } + + if (strcmp(arg, "none") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) { nit: Does this really need to WARN here ? Unlike the "nvhe" case, if the user wants to keep the KVM out of the picture for, say debugging something, it is perfectly Ok to allow the kernel to be running at EL2 without having to change the Firmware to alter the landing EL for the kernel ? Well, the doc says "run in nVHE mode" and the option forces id_aa64mmfr1.vh=0. 
The WARN_ON() will only fire on broken^Wfruity HW that is VHE only. Note that this doesn't rely on any firmware change (we drop from EL2 to EL1 and stay there).

Ah, ok. So "none" is in fact "nvhe + no-kvm". That's the bit I missed. TBH, that name to me sounds like "no KVM" at all, which is what we want. The question is, do we really need "none" to force vh == 0? I understand this is only a problem on a rare set of HWs, but the generic option looks deceiving. That said, I am happy to leave this as is, and the doc says so.

We could add another option (none-vhe?) that stays at EL2 and still disables KVM if there is an appetite for it.

Na, don't think that is necessary. Otherwise:

Acked-by: Suzuki K Poulose

Suzuki
Re: [PATCH v8 7/7] KVM: x86: Expose TSC offset controls to userspace
On 30/09/21 21:14, Marcelo Tosatti wrote:
> > + new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1
>
> Hi Oliver,
>
> This won't advance the TSC values themselves, right?

Why not? It affects the TSC offset in the vmcs, so the TSC in the VM is
advanced too.

Paolo

> This (advancing the TSC values by the realtime elapsed time) would be
> awesome because TSC clock_gettime() vdso is faster, and some
> applications prefer to just read from TSC directly.

See "x86: kvmguest: use TSC clocksource if invariant TSC is exposed".

> The advancement with this patchset only applies to kvmclock.
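The offset rule quoted above can be sketched as a small function. Naming follows the formula in the patch's documentation: t_0/t_1 are host TSC reads before and after the blackout, k_0/k_1 the corresponding kvmclock readings in nanoseconds, off_n the old TSC offset, and freq the guest TSC frequency — expressed here, for simplicity, as whole ticks per nanosecond (the real KVM code scales with a fractional multiplier, so this is an illustration only).

```c
#include <stdint.h>

/*
 * new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1
 *
 * Chosen so that the guest TSC at t_1 under the new offset equals the
 * guest TSC at t_0 under the old offset, advanced by the elapsed
 * kvmclock time converted to TSC ticks.
 */
static int64_t new_tsc_offset(int64_t t0, int64_t t1, int64_t off_n,
                              int64_t k0, int64_t k1, int64_t freq)
{
    return t0 + off_n + (k1 - k0) * freq - t1;
}
```

The defining property: `t1 + new_off == (t0 + off_n) + (k1 - k0) * freq`, i.e. the guest-visible TSC advances by exactly the elapsed kvmclock time in ticks, which is why applying this *and* advancing kvmclock would double count.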
Re: [PATCH v2 02/11] KVM: arm64: Clean up SMC64 PSCI filtering for AArch32 guests
On Thu, Sep 23, 2021 at 12:16 PM Oliver Upton wrote: > > The only valid SMC calling convention from an AArch32 state is > SMC32. Disallow any PSCI function that sets the SMC64 function ID bit > when called from AArch32 rather than comparing against known SMC64 PSCI > functions. > > Signed-off-by: Oliver Upton Looks nice. Reviewed-by: Reiji Watanabe
Re: [PATCH v2 01/11] KVM: arm64: Drop unused vcpu param to kvm_psci_valid_affinity()
On Thu, Sep 23, 2021 at 12:16 PM Oliver Upton wrote: > > The helper function does not need a pointer to the vCPU, as it only > consults a constant mask; drop the unused vcpu parameter. > > Signed-off-by: Oliver Upton Reviewed-by: Reiji Watanabe