Re: [please test] pvclock(4): fix several bugs
On Thu, Sep 08, 2022 at 10:17:12AM -0500, Scott Cheloha wrote: > > On Sep 8, 2022, at 9:05 AM, Mike Larkin wrote: > > [...] > > > > You could compile this and then objdump -D it and see for yourself... > > I can't make heads or tails of it. Please explain what I am looking > at and why it is, or is not, atomic. Or I guess I can just wing it. Okay, so this C code: volatile uint64_t pvclock_lastcount; /* [...] */ void pvclock_get_timecount(struct timecounter *tc) { uint64_t ctr, last; /* [...] */ do { last = pvclock_lastcount; if (ctr < last) return (last); } while (atomic_cas_64(_lastcount, last, ctr) != last); return (ctr); } ... yields this amd64 disassembly (from the linked bsd binary): 81de242b: 75 2b jne81de2458 81de242d: eb 01 jmp81de2430 81de242f: cc int3 81de2430: 48 8b 0d 91 fc 6c 00mov7142545(%rip),%rcx # 824b20c8 81de2437: 48 87 d1xchg %rdx,%rcx 81de243a: 48 39 d1cmp%rdx,%rcx 81de243d: 48 87 d1xchg %rdx,%rcx 81de2440: 72 13 jb 81de2455 81de2442: 48 89 c8mov%rcx,%rax 81de2445: f0 48 0f b1 15 7a fclock cmpxchg %rdx,7142522(%rip) # 824b20c8 81de244c: 6c 00 81de244e: 48 39 c8cmp%rcx,%rax 81de2451: 75 dd jne81de2430 81de2453: eb 03 jmp81de2458 81de2455: 48 8b d1mov%rcx,%rdx 81de2458: 89 d0 mov%edx,%eax 81de245a: 48 83 c4 08 add$0x8,%rsp 81de245e: 41 5e pop%r14 81de2460: c9 leaveq 81de2461: c3 retq ... and also yields this i386 disassembly (from the pvclock.o object): 2c7: 75 2e jne2f7 2c9: eb 05 jmp2d0 2cb: cc int3 2cc: cc int3 2cd: cc int3 2ce: cc int3 2cf: cc int3 2d0: 8b 15 04 00 00 00 mov0x4,%edx 2d6: a1 00 00 00 00 mov0x0,%eax 2db: 87 d8 xchg %ebx,%eax 2dd: 39 d8 cmp%ebx,%eax 2df: 87 d8 xchg %ebx,%eax 2e1: 89 f1 mov%esi,%ecx 2e3: 19 d1 sbb%edx,%ecx 2e5: 72 0e jb 2f5 2e7: 89 f1 mov%esi,%ecx 2e9: f0 0f c7 0d 00 00 00lock cmpxchg8b 0x0 2f0: 00 2f1: 75 dd jne2d0 2f3: eb 02 jmp2f7 2f5: 8b d8 mov%eax,%ebx 2f7: 89 d8 mov%ebx,%eax 2f9: 83 c4 1cadd$0x1c,%esp 2fc: 5e pop%esi 2fd: 5f pop%edi 2fe: 5b pop%ebx 2ff: 5d pop%ebp 300: c3 ret If we isolate the pvclock_lastcount loads, on amd64 we have: 81de2430: 48 8b 0d 91 fc 6c 00mov7142545(%rip),%rcx # 824b20c8 and on i386 we have: 2d0: 8b 15 04 00 00 00 mov0x4,%edx 2d6: a1 00 00 00 00 mov0x0,%eax so the 8-byte load is atomic on amd64 (one load) and non-atomic on i386 (two loads). I don't know what jsg@ meant when he said the ifdefs "seemed unnecessary", but near as I can tell they are necessary. I need an atomic 8-byte load and i386 can't, or won't, do it. So I guess we go back to my original patch. This resolves kettenis@'s atomic_cas_64() objections because we no longer need it. So, once again, the patch in brief: - Add rdtsc_lfence() to i386/include/cpufunc.h - Make pvclock_lastcount volatile uint64_t to fix the non-PVCLOCK_FLAG_TSC_STABLE case (see sub.). - Check for SSE2 support in pvclock_match(), we need it for LFENCE in pvclock_get_timecount(). - Do RDTSC as soon as possible in the lockless read loop to get a better timestamp. - Use rdtsc_lfence() instead of rdtsc() to get a better timestamp. - Check whether our TSC lags ti->ti_tsc_timestamp so we don't produce a bogus delta. - Fix the non-PVCLOCK_FLAG_TSC_STABLE case: + On amd64 we can do this with an atomic_cas_ulong(9) loop. We need to cast the pointer to (unsigned long *) or the compiler complains. This is safe because sizeof(long) equals sizeof(uint64_t) on amd64. + On i386 we need a mutex, so add pvclock_mtx. Set
Re: [please test] pvclock(4): fix several bugs
> On Sep 8, 2022, at 9:05 AM, Mike Larkin wrote: > > On Thu, Sep 08, 2022 at 08:32:27AM -0500, Scott Cheloha wrote: >> On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote: >>> On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote: >>>> On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote: >>>>> On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote: >>>>>> On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote: >>>>>>>> On Sep 3, 2022, at 02:22, Jonathan Gray wrote: >>>>>>>> >>>>>>>> ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: >>>>>>>>> dv@ suggested coming to the list to request testing for the pvclock(4) >>>>>>>>> driver. Attached is a patch that corrects several bugs. Most of >>>>>>>>> these changes will only matter in the non-TSC_STABLE case on a >>>>>>>>> multiprocessor VM. >>>>>>>>> >>>>>>>>> Ideally, nothing should break. >>>>>>>>> >>>>>>>>> - pvclock yields a 64-bit value. The BSD timecounter layer can only >>>>>>>>> use the lower 32 bits, but internally we need to track the full >>>>>>>>> 64-bit value to allow comparisons with the full value in the >>>>>>>>> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. >>>>>>>>> >>>>>>>>> - In pvclock_get_timecount(), move rdtsc() up into the lockless read >>>>>>>>> loop to get a more accurate timestamp. >>>>>>>>> >>>>>>>>> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). >>>>>>>>> >>>>>>>>> - In pvclock_get_timecount(), check that our TSC value doesn't predate >>>>>>>>> ti->ti_tsc_timestamp, otherwise we will produce an enormous value. >>>>>>>>> >>>>>>>>> - In pvclock_get_timecount(), update pvclock_lastcount in the >>>>>>>>> non-TSC_STABLE case with more care. On amd64 we can do this with an >>>>>>>>> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we need >>>>>>>>> to introduce a mutex to protect our comparison and read/write. >>>>>>>> >>>>>>>> i386 has cmpxchg8b, no need to disable interrupts >>>>>>>> the ifdefs seem excessive >>>>>>> >>>>>>> How do I make use of CMPXCHG8B on i386 >>>>>>> in this context? >>>>>>> >>>>>>> atomic_cas_ulong(9) is a 32-bit CAS on >>>>>>> i386. >>>>>> >>>>>> static inline uint64_t >>>>>> atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n) >>>>>> { >>>>>> return __sync_val_compare_and_swap(p, o, n); >>>>>> } >>>>>> >>>>>> Or md atomic.h files could have an equivalent. >>>>>> Not possible on all 32-bit archs. >>>>>> >>>>>>> >>>>>>> We can't use FP registers in the kernel, no? >>>>>> >>>>>> What do FP registers have to do with it? >>>>>> >>>>>>> >>>>>>> Am I missing some other avenue? >>>>>> >>>>>> There is no rdtsc_lfence() on i386. Initial diff doesn't build. >>>>> >>>>> LFENCE is an SSE2 extension. As is MFENCE. I don't think I can just >>>>> drop rdtsc_lfence() into cpufunc.h and proceed without causing some >>>>> kind of fault on an older CPU. >>>>> >>>>> What are my options on a 586-class CPU for forcing RDTSC to complete >>>>> before later instructions? >>>> >>>> "3.3.2. Serializing Operations >>>> After executing certain instructions the Pentium processor serializes >>>> instruction execution. This means that any modifications to flags, >>>> registers, and memory for previous instructions are completed before >>>> the next instruction is fetched and executed. The prefetch queue >>>> is flushed as a result of serializing operations. >>>> >>
Re: [please test] pvclock(4): fix several bugs
> On Sep 8, 2022, at 9:27 AM, Mark Kettenis wrote: > >> Date: Thu, 8 Sep 2022 08:32:27 -0500 >> From: Scott Cheloha >> >> On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote: >>> On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote: >>>> On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote: >>>>> On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote: >>>>>> On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote: >>>>>>>> On Sep 3, 2022, at 02:22, Jonathan Gray wrote: >>>>>>>> >>>>>>>> ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: >>>>>>>>> dv@ suggested coming to the list to request testing for the pvclock(4) >>>>>>>>> driver. Attached is a patch that corrects several bugs. Most of >>>>>>>>> these changes will only matter in the non-TSC_STABLE case on a >>>>>>>>> multiprocessor VM. >>>>>>>>> >>>>>>>>> Ideally, nothing should break. >>>>>>>>> >>>>>>>>> - pvclock yields a 64-bit value. The BSD timecounter layer can only >>>>>>>>> use the lower 32 bits, but internally we need to track the full >>>>>>>>> 64-bit value to allow comparisons with the full value in the >>>>>>>>> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. >>>>>>>>> >>>>>>>>> - In pvclock_get_timecount(), move rdtsc() up into the lockless read >>>>>>>>> loop to get a more accurate timestamp. >>>>>>>>> >>>>>>>>> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). >>>>>>>>> >>>>>>>>> - In pvclock_get_timecount(), check that our TSC value doesn't predate >>>>>>>>> ti->ti_tsc_timestamp, otherwise we will produce an enormous value. >>>>>>>>> >>>>>>>>> - In pvclock_get_timecount(), update pvclock_lastcount in the >>>>>>>>> non-TSC_STABLE case with more care. On amd64 we can do this with an >>>>>>>>> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we need >>>>>>>>> to introduce a mutex to protect our comparison and read/write. >>>>>>>> >>>>>>>> i386 has cmpxchg8b, no need to disable interrupts >>>>>>>> the ifdefs seem excessive >>>>>>> >>>>>>> How do I make use of CMPXCHG8B on i386 >>>>>>> in this context? >>>>>>> >>>>>>> atomic_cas_ulong(9) is a 32-bit CAS on >>>>>>> i386. >>>>>> >>>>>> static inline uint64_t >>>>>> atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n) >>>>>> { >>>>>> return __sync_val_compare_and_swap(p, o, n); >>>>>> } >>>>>> >>>>>> Or md atomic.h files could have an equivalent. >>>>>> Not possible on all 32-bit archs. >>>>>> >>>>>>> >>>>>>> We can't use FP registers in the kernel, no? >>>>>> >>>>>> What do FP registers have to do with it? >>>>>> >>>>>>> >>>>>>> Am I missing some other avenue? >>>>>> >>>>>> There is no rdtsc_lfence() on i386. Initial diff doesn't build. >>>>> >>>>> LFENCE is an SSE2 extension. As is MFENCE. I don't think I can just >>>>> drop rdtsc_lfence() into cpufunc.h and proceed without causing some >>>>> kind of fault on an older CPU. >>>>> >>>>> What are my options on a 586-class CPU for forcing RDTSC to complete >>>>> before later instructions? >>>> >>>> "3.3.2. Serializing Operations >>>> After executing certain instructions the Pentium processor serializes >>>> instruction execution. This means that any modifications to flags, >>>> registers, and memory for previous instructions are completed before >>>> the next instruction is fetched and executed. The prefetch queue >>>> is flushed as a result of serializing operations. >>&g
Re: [please test] pvclock(4): fix several bugs
> Date: Thu, 8 Sep 2022 08:32:27 -0500 > From: Scott Cheloha > > On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote: > > On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote: > > > On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote: > > > > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote: > > > > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote: > > > > > > > On Sep 3, 2022, at 02:22, Jonathan Gray wrote: > > > > > > > > > > > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: > > > > > > >> dv@ suggested coming to the list to request testing for the > > > > > > >> pvclock(4) > > > > > > >> driver. Attached is a patch that corrects several bugs. Most of > > > > > > >> these changes will only matter in the non-TSC_STABLE case on a > > > > > > >> multiprocessor VM. > > > > > > >> > > > > > > >> Ideally, nothing should break. > > > > > > >> > > > > > > >> - pvclock yields a 64-bit value. The BSD timecounter layer can > > > > > > >> only > > > > > > >> use the lower 32 bits, but internally we need to track the full > > > > > > >> 64-bit value to allow comparisons with the full value in the > > > > > > >> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit > > > > > > >> quantity. > > > > > > >> > > > > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless > > > > > > >> read > > > > > > >> loop to get a more accurate timestamp. > > > > > > >> > > > > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). > > > > > > >> > > > > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't > > > > > > >> predate > > > > > > >> ti->ti_tsc_timestamp, otherwise we will produce an enormous > > > > > > >> value. > > > > > > >> > > > > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the > > > > > > >> non-TSC_STABLE case with more care. On amd64 we can do this > > > > > > >> with an > > > > > > >> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we > > > > > > >> need > > > > > > >> to introduce a mutex to protect our comparison and read/write. > > > > > > > > > > > > > > i386 has cmpxchg8b, no need to disable interrupts > > > > > > > the ifdefs seem excessive > > > > > > > > > > > > How do I make use of CMPXCHG8B on i386 > > > > > > in this context? > > > > > > > > > > > > atomic_cas_ulong(9) is a 32-bit CAS on > > > > > > i386. > > > > > > > > > > static inline uint64_t > > > > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n) > > > > > { > > > > > return __sync_val_compare_and_swap(p, o, n); > > > > > } > > > > > > > > > > Or md atomic.h files could have an equivalent. > > > > > Not possible on all 32-bit archs. > > > > > > > > > > > > > > > > > We can't use FP registers in the kernel, no? > > > > > > > > > > What do FP registers have to do with it? > > > > > > > > > > > > > > > > > Am I missing some other avenue? > > > > > > > > > > There is no rdtsc_lfence() on i386. Initial diff doesn't build. > > > > > > > > LFENCE is an SSE2 extension. As is MFENCE. I don't think I can just > > > > drop rdtsc_lfence() into cpufunc.h and proceed without causing some > > > > kind of fault on an older CPU. > > > > > > > > What are my options on a 586-class CPU for forcing RDTSC to complete > > > > before later instructions? > > > > > > "3.3.2. Serializing Operations > > > After executing certain instructions the Pentium processor serializes > > > instruction execution. This mean
Re: [please test] pvclock(4): fix several bugs
On Thu, Sep 08, 2022 at 08:32:27AM -0500, Scott Cheloha wrote: > On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote: > > On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote: > > > On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote: > > > > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote: > > > > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote: > > > > > > > On Sep 3, 2022, at 02:22, Jonathan Gray wrote: > > > > > > > > > > > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: > > > > > > >> dv@ suggested coming to the list to request testing for the > > > > > > >> pvclock(4) > > > > > > >> driver. Attached is a patch that corrects several bugs. Most of > > > > > > >> these changes will only matter in the non-TSC_STABLE case on a > > > > > > >> multiprocessor VM. > > > > > > >> > > > > > > >> Ideally, nothing should break. > > > > > > >> > > > > > > >> - pvclock yields a 64-bit value. The BSD timecounter layer can > > > > > > >> only > > > > > > >> use the lower 32 bits, but internally we need to track the full > > > > > > >> 64-bit value to allow comparisons with the full value in the > > > > > > >> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit > > > > > > >> quantity. > > > > > > >> > > > > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless > > > > > > >> read > > > > > > >> loop to get a more accurate timestamp. > > > > > > >> > > > > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). > > > > > > >> > > > > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't > > > > > > >> predate > > > > > > >> ti->ti_tsc_timestamp, otherwise we will produce an enormous > > > > > > >> value. > > > > > > >> > > > > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the > > > > > > >> non-TSC_STABLE case with more care. On amd64 we can do this > > > > > > >> with an > > > > > > >> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we > > > > > > >> need > > > > > > >> to introduce a mutex to protect our comparison and read/write. > > > > > > > > > > > > > > i386 has cmpxchg8b, no need to disable interrupts > > > > > > > the ifdefs seem excessive > > > > > > > > > > > > How do I make use of CMPXCHG8B on i386 > > > > > > in this context? > > > > > > > > > > > > atomic_cas_ulong(9) is a 32-bit CAS on > > > > > > i386. > > > > > > > > > > static inline uint64_t > > > > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n) > > > > > { > > > > > return __sync_val_compare_and_swap(p, o, n); > > > > > } > > > > > > > > > > Or md atomic.h files could have an equivalent. > > > > > Not possible on all 32-bit archs. > > > > > > > > > > > > > > > > > We can't use FP registers in the kernel, no? > > > > > > > > > > What do FP registers have to do with it? > > > > > > > > > > > > > > > > > Am I missing some other avenue? > > > > > > > > > > There is no rdtsc_lfence() on i386. Initial diff doesn't build. > > > > > > > > LFENCE is an SSE2 extension. As is MFENCE. I don't think I can just > > > > drop rdtsc_lfence() into cpufunc.h and proceed without causing some > > > > kind of fault on an older CPU. > > > > > > > > What are my options on a 586-class CPU for forcing RDTSC to complete > > > > before later instructions? > > > > > > "3.3.2. Serializing Operations > > > After executing certain instructions the Pentium processor serializes > > > instruction execution. This means that an
Re: [please test] pvclock(4): fix several bugs
On Thu, Sep 08, 2022 at 08:32:27AM -0500, Scott Cheloha wrote: > On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote: > > On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote: > > > On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote: > > > > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote: > > > > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote: > > > > > > > On Sep 3, 2022, at 02:22, Jonathan Gray wrote: > > > > > > > > > > > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: > > > > > > >> dv@ suggested coming to the list to request testing for the > > > > > > >> pvclock(4) > > > > > > >> driver. Attached is a patch that corrects several bugs. Most of > > > > > > >> these changes will only matter in the non-TSC_STABLE case on a > > > > > > >> multiprocessor VM. > > > > > > >> > > > > > > >> Ideally, nothing should break. > > > > > > >> > > > > > > >> - pvclock yields a 64-bit value. The BSD timecounter layer can > > > > > > >> only > > > > > > >> use the lower 32 bits, but internally we need to track the full > > > > > > >> 64-bit value to allow comparisons with the full value in the > > > > > > >> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit > > > > > > >> quantity. > > > > > > >> > > > > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless > > > > > > >> read > > > > > > >> loop to get a more accurate timestamp. > > > > > > >> > > > > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). > > > > > > >> > > > > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't > > > > > > >> predate > > > > > > >> ti->ti_tsc_timestamp, otherwise we will produce an enormous > > > > > > >> value. > > > > > > >> > > > > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the > > > > > > >> non-TSC_STABLE case with more care. On amd64 we can do this > > > > > > >> with an > > > > > > >> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we > > > > > > >> need > > > > > > >> to introduce a mutex to protect our comparison and read/write. > > > > > > > > > > > > > > i386 has cmpxchg8b, no need to disable interrupts > > > > > > > the ifdefs seem excessive > > > > > > > > > > > > How do I make use of CMPXCHG8B on i386 > > > > > > in this context? > > > > > > > > > > > > atomic_cas_ulong(9) is a 32-bit CAS on > > > > > > i386. > > > > > > > > > > static inline uint64_t > > > > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n) > > > > > { > > > > > return __sync_val_compare_and_swap(p, o, n); > > > > > } > > > > > > > > > > Or md atomic.h files could have an equivalent. > > > > > Not possible on all 32-bit archs. > > > > > > > > > > > > > > > > > We can't use FP registers in the kernel, no? > > > > > > > > > > What do FP registers have to do with it? > > > > > > > > > > > > > > > > > Am I missing some other avenue? > > > > > > > > > > There is no rdtsc_lfence() on i386. Initial diff doesn't build. > > > > > > > > LFENCE is an SSE2 extension. As is MFENCE. I don't think I can just > > > > drop rdtsc_lfence() into cpufunc.h and proceed without causing some > > > > kind of fault on an older CPU. > > > > > > > > What are my options on a 586-class CPU for forcing RDTSC to complete > > > > before later instructions? > > > > > > "3.3.2. Serializing Operations > > > After executing certain instructions the Pentium processor serializes > > > instruction execution. This means t
Re: [please test] pvclock(4): fix several bugs
On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote: > On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote: > > On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote: > > > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote: > > > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote: > > > > > > On Sep 3, 2022, at 02:22, Jonathan Gray wrote: > > > > > > > > > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: > > > > > >> dv@ suggested coming to the list to request testing for the > > > > > >> pvclock(4) > > > > > >> driver. Attached is a patch that corrects several bugs. Most of > > > > > >> these changes will only matter in the non-TSC_STABLE case on a > > > > > >> multiprocessor VM. > > > > > >> > > > > > >> Ideally, nothing should break. > > > > > >> > > > > > >> - pvclock yields a 64-bit value. The BSD timecounter layer can > > > > > >> only > > > > > >> use the lower 32 bits, but internally we need to track the full > > > > > >> 64-bit value to allow comparisons with the full value in the > > > > > >> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. > > > > > >> > > > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless > > > > > >> read > > > > > >> loop to get a more accurate timestamp. > > > > > >> > > > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). > > > > > >> > > > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't > > > > > >> predate > > > > > >> ti->ti_tsc_timestamp, otherwise we will produce an enormous value. > > > > > >> > > > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the > > > > > >> non-TSC_STABLE case with more care. On amd64 we can do this with > > > > > >> an > > > > > >> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we > > > > > >> need > > > > > >> to introduce a mutex to protect our comparison and read/write. > > > > > > > > > > > > i386 has cmpxchg8b, no need to disable interrupts > > > > > > the ifdefs seem excessive > > > > > > > > > > How do I make use of CMPXCHG8B on i386 > > > > > in this context? > > > > > > > > > > atomic_cas_ulong(9) is a 32-bit CAS on > > > > > i386. > > > > > > > > static inline uint64_t > > > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n) > > > > { > > > > return __sync_val_compare_and_swap(p, o, n); > > > > } > > > > > > > > Or md atomic.h files could have an equivalent. > > > > Not possible on all 32-bit archs. > > > > > > > > > > > > > > We can't use FP registers in the kernel, no? > > > > > > > > What do FP registers have to do with it? > > > > > > > > > > > > > > Am I missing some other avenue? > > > > > > > > There is no rdtsc_lfence() on i386. Initial diff doesn't build. > > > > > > LFENCE is an SSE2 extension. As is MFENCE. I don't think I can just > > > drop rdtsc_lfence() into cpufunc.h and proceed without causing some > > > kind of fault on an older CPU. > > > > > > What are my options on a 586-class CPU for forcing RDTSC to complete > > > before later instructions? > > > > "3.3.2. Serializing Operations > > After executing certain instructions the Pentium processor serializes > > instruction execution. This means that any modifications to flags, > > registers, and memory for previous instructions are completed before > > the next instruction is fetched and executed. The prefetch queue > > is flushed as a result of serializing operations. > > > > The Pentium processor serializes instruction execution after executing > > one of the following instructions: Move to Special Register (except > > CRO), INVD, INVLPG, IRET, IRETD, LGDT, LLDT, LIDT, LTR, WBINVD, > > CPUID, RSM and WRMSR." >
Re: [please test] pvclock(4): fix several bugs
On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote: > On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote: > > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote: > > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote: > > > > > On Sep 3, 2022, at 02:22, Jonathan Gray wrote: > > > > > > > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: > > > > >> dv@ suggested coming to the list to request testing for the > > > > >> pvclock(4) > > > > >> driver. Attached is a patch that corrects several bugs. Most of > > > > >> these changes will only matter in the non-TSC_STABLE case on a > > > > >> multiprocessor VM. > > > > >> > > > > >> Ideally, nothing should break. > > > > >> > > > > >> - pvclock yields a 64-bit value. The BSD timecounter layer can only > > > > >> use the lower 32 bits, but internally we need to track the full > > > > >> 64-bit value to allow comparisons with the full value in the > > > > >> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. > > > > >> > > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless read > > > > >> loop to get a more accurate timestamp. > > > > >> > > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). > > > > >> > > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't > > > > >> predate > > > > >> ti->ti_tsc_timestamp, otherwise we will produce an enormous value. > > > > >> > > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the > > > > >> non-TSC_STABLE case with more care. On amd64 we can do this with an > > > > >> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we need > > > > >> to introduce a mutex to protect our comparison and read/write. > > > > > > > > > > i386 has cmpxchg8b, no need to disable interrupts > > > > > the ifdefs seem excessive > > > > > > > > How do I make use of CMPXCHG8B on i386 > > > > in this context? > > > > > > > > atomic_cas_ulong(9) is a 32-bit CAS on > > > > i386. > > > > > > static inline uint64_t > > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n) > > > { > > > return __sync_val_compare_and_swap(p, o, n); > > > } > > > > > > Or md atomic.h files could have an equivalent. > > > Not possible on all 32-bit archs. > > > > > > > > > > > We can't use FP registers in the kernel, no? > > > > > > What do FP registers have to do with it? > > > > > > > > > > > Am I missing some other avenue? > > > > > > There is no rdtsc_lfence() on i386. Initial diff doesn't build. > > > > LFENCE is an SSE2 extension. As is MFENCE. I don't think I can just > > drop rdtsc_lfence() into cpufunc.h and proceed without causing some > > kind of fault on an older CPU. > > > > What are my options on a 586-class CPU for forcing RDTSC to complete > > before later instructions? > > "3.3.2. Serializing Operations > After executing certain instructions the Pentium processor serializes > instruction execution. This means that any modifications to flags, > registers, and memory for previous instructions are completed before > the next instruction is fetched and executed. The prefetch queue > is flushed as a result of serializing operations. > > The Pentium processor serializes instruction execution after executing > one of the following instructions: Move to Special Register (except > CRO), INVD, INVLPG, IRET, IRETD, LGDT, LLDT, LIDT, LTR, WBINVD, > CPUID, RSM and WRMSR." > > from: > Pentium Processor User's Manual > Volume 1: Pentium Processor Data Book > Order Number 241428 > > http://bitsavers.org/components/intel/pentium/1993_Intel_Pentium_Processor_Users_Manual_Volume_1.pdf > > So it could be rdtsc ; cpuid. > lfence; rdtsc should still be preferred. > > It could be tested during boot and set a function pointer. > Or the codepatch bits could be used. > > In the specific case of pvclock, can it be assumed that the host > has hardware virt and would then have lfence? > I think this is a fair assumption. -ml
Re: [please test] pvclock(4): fix several bugs
On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote: > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote: > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote: > > > > On Sep 3, 2022, at 02:22, Jonathan Gray wrote: > > > > > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: > > > >> dv@ suggested coming to the list to request testing for the pvclock(4) > > > >> driver. Attached is a patch that corrects several bugs. Most of > > > >> these changes will only matter in the non-TSC_STABLE case on a > > > >> multiprocessor VM. > > > >> > > > >> Ideally, nothing should break. > > > >> > > > >> - pvclock yields a 64-bit value. The BSD timecounter layer can only > > > >> use the lower 32 bits, but internally we need to track the full > > > >> 64-bit value to allow comparisons with the full value in the > > > >> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. > > > >> > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless read > > > >> loop to get a more accurate timestamp. > > > >> > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). > > > >> > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't predate > > > >> ti->ti_tsc_timestamp, otherwise we will produce an enormous value. > > > >> > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the > > > >> non-TSC_STABLE case with more care. On amd64 we can do this with an > > > >> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we need > > > >> to introduce a mutex to protect our comparison and read/write. > > > > > > > > i386 has cmpxchg8b, no need to disable interrupts > > > > the ifdefs seem excessive > > > > > > How do I make use of CMPXCHG8B on i386 > > > in this context? > > > > > > atomic_cas_ulong(9) is a 32-bit CAS on > > > i386. > > > > static inline uint64_t > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n) > > { > > return __sync_val_compare_and_swap(p, o, n); > > } > > > > Or md atomic.h files could have an equivalent. > > Not possible on all 32-bit archs. > > > > > > > > We can't use FP registers in the kernel, no? > > > > What do FP registers have to do with it? > > > > > > > > Am I missing some other avenue? > > > > There is no rdtsc_lfence() on i386. Initial diff doesn't build. > > LFENCE is an SSE2 extension. As is MFENCE. I don't think I can just > drop rdtsc_lfence() into cpufunc.h and proceed without causing some > kind of fault on an older CPU. > > What are my options on a 586-class CPU for forcing RDTSC to complete > before later instructions? "3.3.2. Serializing Operations After executing certain instructions the Pentium processor serializes instruction execution. This means that any modifications to flags, registers, and memory for previous instructions are completed before the next instruction is fetched and executed. The prefetch queue is flushed as a result of serializing operations. The Pentium processor serializes instruction execution after executing one of the following instructions: Move to Special Register (except CRO), INVD, INVLPG, IRET, IRETD, LGDT, LLDT, LIDT, LTR, WBINVD, CPUID, RSM and WRMSR." from: Pentium Processor User's Manual Volume 1: Pentium Processor Data Book Order Number 241428 http://bitsavers.org/components/intel/pentium/1993_Intel_Pentium_Processor_Users_Manual_Volume_1.pdf So it could be rdtsc ; cpuid. lfence; rdtsc should still be preferred. It could be tested during boot and set a function pointer. Or the codepatch bits could be used. In the specific case of pvclock, can it be assumed that the host has hardware virt and would then have lfence?
Re: [please test] pvclock(4): fix several bugs
On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote: > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote: > > > On Sep 3, 2022, at 02:22, Jonathan Gray wrote: > > > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: > > >> dv@ suggested coming to the list to request testing for the pvclock(4) > > >> driver. Attached is a patch that corrects several bugs. Most of > > >> these changes will only matter in the non-TSC_STABLE case on a > > >> multiprocessor VM. > > >> > > >> Ideally, nothing should break. > > >> > > >> - pvclock yields a 64-bit value. The BSD timecounter layer can only > > >> use the lower 32 bits, but internally we need to track the full > > >> 64-bit value to allow comparisons with the full value in the > > >> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. > > >> > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless read > > >> loop to get a more accurate timestamp. > > >> > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). > > >> > > >> - In pvclock_get_timecount(), check that our TSC value doesn't predate > > >> ti->ti_tsc_timestamp, otherwise we will produce an enormous value. > > >> > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the > > >> non-TSC_STABLE case with more care. On amd64 we can do this with an > > >> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we need > > >> to introduce a mutex to protect our comparison and read/write. > > > > > > i386 has cmpxchg8b, no need to disable interrupts > > > the ifdefs seem excessive > > > > How do I make use of CMPXCHG8B on i386 > > in this context? > > > > atomic_cas_ulong(9) is a 32-bit CAS on > > i386. > > static inline uint64_t > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n) > { > return __sync_val_compare_and_swap(p, o, n); > } > > Or md atomic.h files could have an equivalent. > Not possible on all 32-bit archs. > > > > > We can't use FP registers in the kernel, no? > > What do FP registers have to do with it? > > > > > Am I missing some other avenue? > > There is no rdtsc_lfence() on i386. Initial diff doesn't build. LFENCE is an SSE2 extension. As is MFENCE. I don't think I can just drop rdtsc_lfence() into cpufunc.h and proceed without causing some kind of fault on an older CPU. What are my options on a 586-class CPU for forcing RDTSC to complete before later instructions?
Re: [please test] pvclock(4): fix several bugs
> On Sep 3, 2022, at 07:37, Jonathan Gray wrote: > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote: >>>> On Sep 3, 2022, at 02:22, Jonathan Gray wrote: >>> >>> On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: >>>> dv@ suggested coming to the list to request testing for the pvclock(4) >>>> driver. Attached is a patch that corrects several bugs. Most of >>>> these changes will only matter in the non-TSC_STABLE case on a >>>> multiprocessor VM. >>>> >>>> Ideally, nothing should break. >>>> >>>> - pvclock yields a 64-bit value. The BSD timecounter layer can only >>>> use the lower 32 bits, but internally we need to track the full >>>> 64-bit value to allow comparisons with the full value in the >>>> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. >>>> >>>> - In pvclock_get_timecount(), move rdtsc() up into the lockless read >>>> loop to get a more accurate timestamp. >>>> >>>> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). >>>> >>>> - In pvclock_get_timecount(), check that our TSC value doesn't predate >>>> ti->ti_tsc_timestamp, otherwise we will produce an enormous value. >>>> >>>> - In pvclock_get_timecount(), update pvclock_lastcount in the >>>> non-TSC_STABLE case with more care. On amd64 we can do this with an >>>> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we need >>>> to introduce a mutex to protect our comparison and read/write. >>> >>> i386 has cmpxchg8b, no need to disable interrupts >>> the ifdefs seem excessive >> >> How do I make use of CMPXCHG8B on i386 >> in this context? >> >> atomic_cas_ulong(9) is a 32-bit CAS on >> i386. > > static inline uint64_t > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n) > { >return __sync_val_compare_and_swap(p, o, n); > } > > Or md atomic.h files could have an equivalent. > Not possible on all 32-bit archs. Gotcha. >> We can't use FP registers in the kernel, no? > > What do FP registers have to do with it? I read someplace that using FP registers was a quick and dirty way to take advantage of the 64-bit-aligned atomic access guarantees of the Pentium. >> Am I missing some other avenue? > > There is no rdtsc_lfence() on i386. Initial diff doesn't build. I will come back with a fuller patch in a bit.
Re: [please test] pvclock(4): fix several bugs
On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote: > > On Sep 3, 2022, at 02:22, Jonathan Gray wrote: > > > > On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: > >> dv@ suggested coming to the list to request testing for the pvclock(4) > >> driver. Attached is a patch that corrects several bugs. Most of > >> these changes will only matter in the non-TSC_STABLE case on a > >> multiprocessor VM. > >> > >> Ideally, nothing should break. > >> > >> - pvclock yields a 64-bit value. The BSD timecounter layer can only > >> use the lower 32 bits, but internally we need to track the full > >> 64-bit value to allow comparisons with the full value in the > >> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. > >> > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless read > >> loop to get a more accurate timestamp. > >> > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). > >> > >> - In pvclock_get_timecount(), check that our TSC value doesn't predate > >> ti->ti_tsc_timestamp, otherwise we will produce an enormous value. > >> > >> - In pvclock_get_timecount(), update pvclock_lastcount in the > >> non-TSC_STABLE case with more care. On amd64 we can do this with an > >> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we need > >> to introduce a mutex to protect our comparison and read/write. > > > > i386 has cmpxchg8b, no need to disable interrupts > > the ifdefs seem excessive > > How do I make use of CMPXCHG8B on i386 > in this context? > > atomic_cas_ulong(9) is a 32-bit CAS on > i386. static inline uint64_t atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n) { return __sync_val_compare_and_swap(p, o, n); } Or md atomic.h files could have an equivalent. Not possible on all 32-bit archs. > > We can't use FP registers in the kernel, no? What do FP registers have to do with it? > > Am I missing some other avenue? There is no rdtsc_lfence() on i386. Initial diff doesn't build.
Re: [please test] pvclock(4): fix several bugs
> On Sep 3, 2022, at 02:22, Jonathan Gray wrote: > > On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: >> dv@ suggested coming to the list to request testing for the pvclock(4) >> driver. Attached is a patch that corrects several bugs. Most of >> these changes will only matter in the non-TSC_STABLE case on a >> multiprocessor VM. >> >> Ideally, nothing should break. >> >> - pvclock yields a 64-bit value. The BSD timecounter layer can only >> use the lower 32 bits, but internally we need to track the full >> 64-bit value to allow comparisons with the full value in the >> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. >> >> - In pvclock_get_timecount(), move rdtsc() up into the lockless read >> loop to get a more accurate timestamp. >> >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). >> >> - In pvclock_get_timecount(), check that our TSC value doesn't predate >> ti->ti_tsc_timestamp, otherwise we will produce an enormous value. >> >> - In pvclock_get_timecount(), update pvclock_lastcount in the >> non-TSC_STABLE case with more care. On amd64 we can do this with an >> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we need >> to introduce a mutex to protect our comparison and read/write. > > i386 has cmpxchg8b, no need to disable interrupts > the ifdefs seem excessive How do I make use of CMPXCHG8B on i386 in this context? atomic_cas_ulong(9) is a 32-bit CAS on i386. We can't use FP registers in the kernel, no? Am I missing some other avenue?
Re: [please test] pvclock(4): fix several bugs
On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: > dv@ suggested coming to the list to request testing for the pvclock(4) > driver. Attached is a patch that corrects several bugs. Most of > these changes will only matter in the non-TSC_STABLE case on a > multiprocessor VM. > > Ideally, nothing should break. > I've put this on an OpenBSD VM hosted on a proxmox VE 7 hypervisor. It uses pvclock as the time source both before and after the patch : kern.timecounter.hardware=pvclock0 kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000) acpitimer0(1000) OpenBSD 7.2-beta (GENERIC.MP) #300: Sat Sep 3 09:34:42 CEST 2022 matth...@zuma.herrb.net:/usr/obj/GENERIC.MP real mem = 4278005760 (4079MB) avail mem = 4130959360 (3939MB) random: good seed from bootblocks mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf59a0 (10 entries) bios0: vendor SeaBIOS version "rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org" date 04/01/2014 bios0: QEMU Standard PC (Q35 + ICH9, 2009) acpi0 at bios0: ACPI 3.0 acpi0: sleep states S3 S4 S5 acpi0: tables DSDT FACP APIC SSDT HPET MCFG WAET acpi0: wakeup devices acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Common KVM processor, 2000.60 MHz, 0f-06-01 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16,x2APIC,HV,NXE,LONG,LAHF,MELTDOWN cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 4MB 64b/line 16-way L2 cache, 16MB 64b/line 16-way L3 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 1000MHz ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins acpihpet0 at acpi0: 1 Hz acpimcfg0 at acpi0 acpimcfg0: addr 0xb000, bus 0-255 acpiprt0 at acpi0: bus 0 (PCI0) "ACPI0006" at acpi0 not configured acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x acpicmos0 at acpi0 "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "QEMU0002" at acpi0 not configured "ACPI0010" at acpi0 not configured "QEMUVGID" at acpi0 not configured acpicpu0 at acpi0: C1(@1 halt!) pvbus0 at mainbus0: KVM pvclock0 at pvbus0 pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel 82G33 Host" rev 0x00 vga1 at pci0 dev 1 function 0 "VMware SVGA II" rev 0x00 wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation) wsdisplay0: screen 1-5 added (80x25, vt100 emulation) uhci0 at pci0 dev 26 function 0 "Intel 82801I USB" rev 0x03: apic 0 int 16 uhci1 at pci0 dev 26 function 1 "Intel 82801I USB" rev 0x03: apic 0 int 17 uhci2 at pci0 dev 26 function 2 "Intel 82801I USB" rev 0x03: apic 0 int 18 ehci0 at pci0 dev 26 function 7 "Intel 82801I USB" rev 0x03: apic 0 int 19 usb0 at ehci0: USB revision 2.0 uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1 azalia0 at pci0 dev 27 function 0 "Intel 82801I HD Audio" rev 0x03: msi azalia0: no HD-Audio codecs ppb0 at pci0 dev 28 function 0 vendor "Red Hat", unknown product 0x000c rev 0x00: apic 0 int 16 pci1 at ppb0 bus 1 ppb1 at pci0 dev 28 function 1 vendor "Red Hat", unknown product 0x000c rev 0x00: apic 0 int 16 pci2 at ppb1 bus 2 ppb2 at pci0 dev 28 function 2 vendor "Red Hat", unknown product 0x000c rev 0x00: apic 0 int 16 pci3 at ppb2 bus 3 ppb3 at pci0 dev 28 function 3 vendor "Red Hat", unknown product 0x000c rev 0x00: apic 0 int 16 pci4 at ppb3 bus 4 uhci3 at pci0 dev 29 function 0 "Intel 82801I USB" rev 0x03: apic 0 int 16 uhci4 at pci0 dev 29 function 1 "Intel 82801I USB" rev 0x03: apic 0 int 17 uhci5 at pci0 dev 29 function 2 "Intel 82801I USB" rev 0x03: apic 0 int 18 ehci1 at pci0 dev 29 function 7 "Intel 82801I USB" rev 0x03: apic 0 int 19 usb1 at ehci1: USB revision 2.0 uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1 ppb4 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0x92 pci5 at ppb4 bus 5 ppb5 at pci5 dev 1 function 0 "Red Hat Qemu PCI-PCI" rev 0x00 pci6 at ppb5 bus 6 virtio0 at pci6 dev 3 function 0 "Qumranet Virtio Memory Balloon" rev 0x00 viomb0 at virtio0 virtio0: apic 0 int 20 virtio1 at pci6 dev 10 function 0 "Qumranet Virtio Storage" rev 0x00 vioblk0 at virtio1 scsibus1 at vioblk0: 1 targets sd0 at scsibus1 targ 0 lun 0: sd0: 40960MB, 512 bytes/sector, 83886080 sectors virtio1: msix shared virtio2 at pci6 dev 18 function 0 "Qumranet Virtio Network" rev 0x00 vio0 at virtio2: address 82:da:9e:53:3e:6b virtio2: msix shared ppb6 at pci5 dev 2 function 0 "Red Hat Qemu
Re: [please test] pvclock(4): fix several bugs
On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: > dv@ suggested coming to the list to request testing for the pvclock(4) > driver. Attached is a patch that corrects several bugs. Most of > these changes will only matter in the non-TSC_STABLE case on a > multiprocessor VM. > > Ideally, nothing should break. > > - pvclock yields a 64-bit value. The BSD timecounter layer can only > use the lower 32 bits, but internally we need to track the full > 64-bit value to allow comparisons with the full value in the > non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. > > - In pvclock_get_timecount(), move rdtsc() up into the lockless read > loop to get a more accurate timestamp. > > - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). > > - In pvclock_get_timecount(), check that our TSC value doesn't predate > ti->ti_tsc_timestamp, otherwise we will produce an enormous value. > > - In pvclock_get_timecount(), update pvclock_lastcount in the > non-TSC_STABLE case with more care. On amd64 we can do this with an > atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we need > to introduce a mutex to protect our comparison and read/write. i386 has cmpxchg8b, no need to disable interrupts the ifdefs seem excessive > > Index: pvclock.c > === > RCS file: /cvs/src/sys/dev/pv/pvclock.c,v > retrieving revision 1.8 > diff -u -p -r1.8 pvclock.c > --- pvclock.c 5 Nov 2021 11:38:29 - 1.8 > +++ pvclock.c 2 Sep 2022 22:54:08 - > @@ -27,6 +27,10 @@ > #include > #include > #include > +#include > +#if defined(__i386__) > +#include > +#endif > > #include > #include > @@ -35,7 +39,12 @@ > #include > #include > > -uint pvclock_lastcount; > +#if defined(__amd64__) > +volatile u_long pvclock_lastcount; > +#elif defined(__i386__) > +struct mutex pvclock_mtx = MUTEX_INITIALIZER(IPL_HIGH); > +uint64_t pvclock_lastcount; > +#endif > > struct pvclock_softc { > struct devicesc_dev; > @@ -212,7 +221,7 @@ pvclock_get_timecount(struct timecounter > { > struct pvclock_softc*sc = tc->tc_priv; > struct pvclock_time_info*ti; > - uint64_t tsc_timestamp, system_time, delta, ctr; > + uint64_t system_time, delta, ctr, tsc; > uint32_t version, mul_frac; > int8_t shift; > uint8_t flags; > @@ -220,8 +229,12 @@ pvclock_get_timecount(struct timecounter > ti = sc->sc_time; > do { > version = pvclock_read_begin(ti); > + tsc = rdtsc_lfence(); > + if (ti->ti_tsc_timestamp < tsc) > + delta = tsc - ti->ti_tsc_timestamp; > + else > + delta = 0; > system_time = ti->ti_system_time; > - tsc_timestamp = ti->ti_tsc_timestamp; > mul_frac = ti->ti_tsc_to_system_mul; > shift = ti->ti_tsc_shift; > flags = ti->ti_flags; > @@ -231,7 +244,6 @@ pvclock_get_timecount(struct timecounter >* The algorithm is described in >* linux/Documentation/virtual/kvm/msr.txt >*/ > - delta = rdtsc() - tsc_timestamp; > if (shift < 0) > delta >>= -shift; > else > @@ -241,10 +253,20 @@ pvclock_get_timecount(struct timecounter > if ((flags & PVCLOCK_FLAG_TSC_STABLE) != 0) > return (ctr); > > - if (ctr < pvclock_lastcount) > - return (pvclock_lastcount); > - > - atomic_swap_uint(_lastcount, ctr); > - > +#if defined(__amd64__) > + u_long last; > + do { > + last = pvclock_lastcount; > + if (ctr < last) > + return last; > + } while (atomic_cas_ulong(_lastcount, last, ctr) != last); > +#elif defined(__i386__) > + mtx_enter(_mtx); > + if (pvclock_lastcount < ctr) > + pvclock_lastcount = ctr; > + else > + ctr = pvclock_lastcount; > + mtx_leave(_mtx); > +#endif > return (ctr); > } > >
Re: [please test] pvclock(4): fix several bugs
On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: > dv@ suggested coming to the list to request testing for the pvclock(4) > driver. Attached is a patch that corrects several bugs. Most of > these changes will only matter in the non-TSC_STABLE case on a > multiprocessor VM. > > Ideally, nothing should break. > > - pvclock yields a 64-bit value. The BSD timecounter layer can only > use the lower 32 bits, but internally we need to track the full > 64-bit value to allow comparisons with the full value in the > non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. > > - In pvclock_get_timecount(), move rdtsc() up into the lockless read > loop to get a more accurate timestamp. > > - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). > > - In pvclock_get_timecount(), check that our TSC value doesn't predate > ti->ti_tsc_timestamp, otherwise we will produce an enormous value. > > - In pvclock_get_timecount(), update pvclock_lastcount in the > non-TSC_STABLE case with more care. On amd64 we can do this with an > atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we need > to introduce a mutex to protect our comparison and read/write. > I tested on an 8 core ESXi VM, nothing broke. But it doesn't even have pvclock as a timesource, so I'm not sure the test is meaningful or useful. -ml > Index: pvclock.c > === > RCS file: /cvs/src/sys/dev/pv/pvclock.c,v > retrieving revision 1.8 > diff -u -p -r1.8 pvclock.c > --- pvclock.c 5 Nov 2021 11:38:29 - 1.8 > +++ pvclock.c 2 Sep 2022 22:54:08 - > @@ -27,6 +27,10 @@ > #include > #include > #include > +#include > +#if defined(__i386__) > +#include > +#endif > > #include > #include > @@ -35,7 +39,12 @@ > #include > #include > > -uint pvclock_lastcount; > +#if defined(__amd64__) > +volatile u_long pvclock_lastcount; > +#elif defined(__i386__) > +struct mutex pvclock_mtx = MUTEX_INITIALIZER(IPL_HIGH); > +uint64_t pvclock_lastcount; > +#endif > > struct pvclock_softc { > struct devicesc_dev; > @@ -212,7 +221,7 @@ pvclock_get_timecount(struct timecounter > { > struct pvclock_softc*sc = tc->tc_priv; > struct pvclock_time_info*ti; > - uint64_t tsc_timestamp, system_time, delta, ctr; > + uint64_t system_time, delta, ctr, tsc; > uint32_t version, mul_frac; > int8_t shift; > uint8_t flags; > @@ -220,8 +229,12 @@ pvclock_get_timecount(struct timecounter > ti = sc->sc_time; > do { > version = pvclock_read_begin(ti); > + tsc = rdtsc_lfence(); > + if (ti->ti_tsc_timestamp < tsc) > + delta = tsc - ti->ti_tsc_timestamp; > + else > + delta = 0; > system_time = ti->ti_system_time; > - tsc_timestamp = ti->ti_tsc_timestamp; > mul_frac = ti->ti_tsc_to_system_mul; > shift = ti->ti_tsc_shift; > flags = ti->ti_flags; > @@ -231,7 +244,6 @@ pvclock_get_timecount(struct timecounter >* The algorithm is described in >* linux/Documentation/virtual/kvm/msr.txt >*/ > - delta = rdtsc() - tsc_timestamp; > if (shift < 0) > delta >>= -shift; > else > @@ -241,10 +253,20 @@ pvclock_get_timecount(struct timecounter > if ((flags & PVCLOCK_FLAG_TSC_STABLE) != 0) > return (ctr); > > - if (ctr < pvclock_lastcount) > - return (pvclock_lastcount); > - > - atomic_swap_uint(_lastcount, ctr); > - > +#if defined(__amd64__) > + u_long last; > + do { > + last = pvclock_lastcount; > + if (ctr < last) > + return last; > + } while (atomic_cas_ulong(_lastcount, last, ctr) != last); > +#elif defined(__i386__) > + mtx_enter(_mtx); > + if (pvclock_lastcount < ctr) > + pvclock_lastcount = ctr; > + else > + ctr = pvclock_lastcount; > + mtx_leave(_mtx); > +#endif > return (ctr); > }
[please test] pvclock(4): fix several bugs
dv@ suggested coming to the list to request testing for the pvclock(4) driver. Attached is a patch that corrects several bugs. Most of these changes will only matter in the non-TSC_STABLE case on a multiprocessor VM. Ideally, nothing should break. - pvclock yields a 64-bit value. The BSD timecounter layer can only use the lower 32 bits, but internally we need to track the full 64-bit value to allow comparisons with the full value in the non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. - In pvclock_get_timecount(), move rdtsc() up into the lockless read loop to get a more accurate timestamp. - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). - In pvclock_get_timecount(), check that our TSC value doesn't predate ti->ti_tsc_timestamp, otherwise we will produce an enormous value. - In pvclock_get_timecount(), update pvclock_lastcount in the non-TSC_STABLE case with more care. On amd64 we can do this with an atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we need to introduce a mutex to protect our comparison and read/write. Index: pvclock.c === RCS file: /cvs/src/sys/dev/pv/pvclock.c,v retrieving revision 1.8 diff -u -p -r1.8 pvclock.c --- pvclock.c 5 Nov 2021 11:38:29 - 1.8 +++ pvclock.c 2 Sep 2022 22:54:08 - @@ -27,6 +27,10 @@ #include #include #include +#include +#if defined(__i386__) +#include +#endif #include #include @@ -35,7 +39,12 @@ #include #include -uint pvclock_lastcount; +#if defined(__amd64__) +volatile u_long pvclock_lastcount; +#elif defined(__i386__) +struct mutex pvclock_mtx = MUTEX_INITIALIZER(IPL_HIGH); +uint64_t pvclock_lastcount; +#endif struct pvclock_softc { struct devicesc_dev; @@ -212,7 +221,7 @@ pvclock_get_timecount(struct timecounter { struct pvclock_softc*sc = tc->tc_priv; struct pvclock_time_info*ti; - uint64_t tsc_timestamp, system_time, delta, ctr; + uint64_t system_time, delta, ctr, tsc; uint32_t version, mul_frac; int8_t shift; uint8_t flags; @@ -220,8 +229,12 @@ pvclock_get_timecount(struct timecounter ti = sc->sc_time; do { version = pvclock_read_begin(ti); + tsc = rdtsc_lfence(); + if (ti->ti_tsc_timestamp < tsc) + delta = tsc - ti->ti_tsc_timestamp; + else + delta = 0; system_time = ti->ti_system_time; - tsc_timestamp = ti->ti_tsc_timestamp; mul_frac = ti->ti_tsc_to_system_mul; shift = ti->ti_tsc_shift; flags = ti->ti_flags; @@ -231,7 +244,6 @@ pvclock_get_timecount(struct timecounter * The algorithm is described in * linux/Documentation/virtual/kvm/msr.txt */ - delta = rdtsc() - tsc_timestamp; if (shift < 0) delta >>= -shift; else @@ -241,10 +253,20 @@ pvclock_get_timecount(struct timecounter if ((flags & PVCLOCK_FLAG_TSC_STABLE) != 0) return (ctr); - if (ctr < pvclock_lastcount) - return (pvclock_lastcount); - - atomic_swap_uint(_lastcount, ctr); - +#if defined(__amd64__) + u_long last; + do { + last = pvclock_lastcount; + if (ctr < last) + return last; + } while (atomic_cas_ulong(_lastcount, last, ctr) != last); +#elif defined(__i386__) + mtx_enter(_mtx); + if (pvclock_lastcount < ctr) + pvclock_lastcount = ctr; + else + ctr = pvclock_lastcount; + mtx_leave(_mtx); +#endif return (ctr); }
Re: pvclock(4)
On 2018 Dec 04 (Tue) at 15:14:51 +0100 (+0100), Reyk Floeter wrote: :On Tue, Dec 04, 2018 at 05:43:48AM -0800, Chris Cappuccio wrote: :> Of course printf instead of panic for testers :> : :Oh, right, thanks! : :@john: Does this "slightly less simple" diff work for you? : :@phessler, Chris: Maybe we should get this fix tested and in, wait for :reports, and I can use the time to think about my other option. What :do you think? : Yea, I think this diff would help avoid panics for systems where that bit is never set, and should go in. OK :Reyk : :Index: sys/dev/pv/pvclock.c :=== :RCS file: /cvs/src/sys/dev/pv/pvclock.c,v :retrieving revision 1.2 :diff -u -p -u -p -r1.2 pvclock.c :--- sys/dev/pv/pvclock.c 24 Nov 2018 13:12:29 - 1.2 :+++ sys/dev/pv/pvclock.c 4 Dec 2018 14:03:57 - :@@ -70,6 +70,11 @@ uint pvclock_get_timecount(struct timec : void pvclock_read_time_info(struct pvclock_softc *, : struct pvclock_time_info *); : :+static inline uint32_t :+ pvclock_read_begin(const struct pvclock_time_info *); :+static inline int :+ pvclock_read_done(const struct pvclock_time_info *, uint32_t); :+ : struct cfattach pvclock_ca = { : sizeof(struct pvclock_softc), : pvclock_match, :@@ -127,8 +132,11 @@ pvclock_match(struct device *parent, voi : void : pvclock_attach(struct device *parent, struct device *self, void *aux) : { :- struct pvclock_softc*sc = (struct pvclock_softc *)self; :- paddr_t pa; :+ struct pvclock_softc*sc = (struct pvclock_softc *)self; :+ struct pvclock_time_info*ti; :+ paddr_t pa; :+ uint32_t version; :+ uint8_t flags; : : if ((sc->sc_time = km_alloc(PAGE_SIZE, : _any, _zero, _nowait)) == NULL) { :@@ -143,6 +151,19 @@ pvclock_attach(struct device *parent, st : : wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE); : sc->sc_paddr = pa; :+ :+ ti = sc->sc_time; :+ do { :+ version = pvclock_read_begin(ti); :+ flags = ti->ti_flags; :+ } while (!pvclock_read_done(ti, version)); :+ :+ if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) { :+ wrmsr(KVM_MSR_SYSTEM_TIME, pa & ~PVCLOCK_SYSTEM_TIME_ENABLE); :+ km_free(sc->sc_time, PAGE_SIZE, _any, _zero); :+ printf(": unstable clock\n"); :+ return; :+ } : : sc->sc_tc = _timecounter; : sc->sc_tc->tc_name = DEVNAME(sc); : -- Reality is an obstacle to hallucination.
Re: pvclock(4)
Reyk Floeter 於 2018-12-04 22:14 寫到: On Tue, Dec 04, 2018 at 05:43:48AM -0800, Chris Cappuccio wrote: Of course printf instead of panic for testers Oh, right, thanks! @john: Does this "slightly less simple" diff work for you? Hi Reyk, Yes, your patch also work for me, I can boot without manually disable pvclock. Attached dmesg, thanks. -- Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC OpenBSD 6.4-current (GENERIC.MP) #490: Thu Nov 29 16:00:20 MST 2018 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 788389888 (751MB) avail mem = 755261440 (720MB) User Kernel Config UKC> disable 59 59 pvclock0 disabled UKC> quit Continuing... mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5b80 (10 entries) bios0: vendor SeaBIOS version "1.11.1-1" date 04/01/2014 bios0: QEMU Standard PC (i440FX + PIIX, 1996) acpi0 at bios0: rev 0 acpi0: sleep states S3 S4 S5 acpi0: tables DSDT FACP APIC HPET acpi0: wakeup devices acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 118.79 MHz, 06-17-0a cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 1039MHz cpu1 at mainbus0: apid 1 (application processor) cpu1: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 322.28 MHz, 06-17-0a cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu1: smt 0, core 0, package 1 ioapic0 at mainbus0: apid 0 pa 0xfec0, version 20, 24 pins acpihpet0 at acpi0: 1 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpicpu0 at acpi0: C1(@1 halt!) acpicpu1 at acpi0: C1(@1 halt!) "ACPI0006" at acpi0 not configured acpipci0 at acpi0 PCI0: _OSC failed acpicmos0 at acpi0 "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "QEMU0002" at acpi0 not configured "ACPI0010" at acpi0 not configured pvbus0 at mainbus0: KVM pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02 pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00 pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility pciide0: channel 0 disabled (no drives) atapiscsi0 at pciide0 channel 1 drive 0 scsibus1 at atapiscsi0: 2 targets cd0 at scsibus1 targ 0 lun 0: ATAPI 5/cdrom removable cd0(pciide0:1:0): using PIO mode 4, DMA mode 2 piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9 iic0 at piixpm0 vga1 at pci0 dev 2 function 0 "Bochs VGA" rev 0x02 wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation) wsdisplay0: screen 1-5 added (80x25, vt100 emulation) virtio0 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00 vio0 at virtio0: address aa:aa:aa:aa:aa:01 virtio0: msix shared virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Storage" rev 0x00 vioblk0 at virtio1 scsibus2 at vioblk0: 2 targets sd0 at scsibus2 targ 0 lun 0: SCSI3 0/direct fixed sd0: 20480MB, 512 bytes/sector, 41943040 sectors virtio1: msix shared isa0 at pcib0 isadma0 at isa0 fdc0 at isa0 port 0x3f0/6 irq 6 drq 2 com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo pckbc0 at isa0 port 0x60/5 irq 1 irq 12 pckbd0 at pckbc0 (kbd slot) wskbd0 at pckbd0: console keyboard, using wsdisplay0 pms0 at pckbc0 (aux slot) wsmouse0 at pms0 mux 0 pcppi0 at isa0 port 0x61 spkr0 at pcppi0 lpt0 at isa0 port 0x378/4 irq 7 vscsi0 at root scsibus3 at vscsi0: 256 targets softraid0 at root scsibus4 at softraid0: 256 targets sd1 at scsibus4 targ 1 lun 0: SCSI2 0/direct fixed sd1: 20473MB, 512 bytes/sector, 41929058 sectors root on sd1a (fa1d7de7776af250.a) swap on sd1b dump on sd1b fd0 at fdc0 drive 1: density unknown syncing disks... done rebooting... OpenBSD 6.4-current (GENERIC.MP) #490: Thu Nov 29 16:00:20 MST 2018 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 788389888 (751MB) avail mem = 755261440 (720MB) User Kernel Config UKC> disable 59 59 pvclock0 disabled UKC> quit Continuing... mpath0 at root scsibus0 at mpath0: 256
Re: pvclock(4)
On Tue, Dec 04, 2018 at 05:43:48AM -0800, Chris Cappuccio wrote: > Of course printf instead of panic for testers > Oh, right, thanks! @john: Does this "slightly less simple" diff work for you? @phessler, Chris: Maybe we should get this fix tested and in, wait for reports, and I can use the time to think about my other option. What do you think? Reyk Index: sys/dev/pv/pvclock.c === RCS file: /cvs/src/sys/dev/pv/pvclock.c,v retrieving revision 1.2 diff -u -p -u -p -r1.2 pvclock.c --- sys/dev/pv/pvclock.c24 Nov 2018 13:12:29 - 1.2 +++ sys/dev/pv/pvclock.c4 Dec 2018 14:03:57 - @@ -70,6 +70,11 @@ uint pvclock_get_timecount(struct timec voidpvclock_read_time_info(struct pvclock_softc *, struct pvclock_time_info *); +static inline uint32_t +pvclock_read_begin(const struct pvclock_time_info *); +static inline int +pvclock_read_done(const struct pvclock_time_info *, uint32_t); + struct cfattach pvclock_ca = { sizeof(struct pvclock_softc), pvclock_match, @@ -127,8 +132,11 @@ pvclock_match(struct device *parent, voi void pvclock_attach(struct device *parent, struct device *self, void *aux) { - struct pvclock_softc*sc = (struct pvclock_softc *)self; - paddr_t pa; + struct pvclock_softc*sc = (struct pvclock_softc *)self; + struct pvclock_time_info*ti; + paddr_t pa; + uint32_t version; + uint8_t flags; if ((sc->sc_time = km_alloc(PAGE_SIZE, _any, _zero, _nowait)) == NULL) { @@ -143,6 +151,19 @@ pvclock_attach(struct device *parent, st wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE); sc->sc_paddr = pa; + + ti = sc->sc_time; + do { + version = pvclock_read_begin(ti); + flags = ti->ti_flags; + } while (!pvclock_read_done(ti, version)); + + if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) { + wrmsr(KVM_MSR_SYSTEM_TIME, pa & ~PVCLOCK_SYSTEM_TIME_ENABLE); + km_free(sc->sc_time, PAGE_SIZE, _any, _zero); + printf(": unstable clock\n"); + return; + } sc->sc_tc = _timecounter; sc->sc_tc->tc_name = DEVNAME(sc);
Re: pvclock(4)
On Tue, Dec 04, 2018 at 12:46:06PM +0100, Peter Hessler wrote: > On 2018 Dec 03 (Mon) at 16:56:10 -0800 (-0800), Chris Cappuccio wrote: > :Reyk Floeter [r...@openbsd.org] wrote: > :> > :> Yes, KVM???s stable bit is not a reliable indication as it is seems to > depend on the capabilities of the KVM version and not the actual availability > of the feature on the particular hardware. How annoying. > :> > :> As mentioned before: I???d like to disable pvclock for now and I can do > that in the morning CET if nobody beats me to it. > :> > :> I have an idea how to deal with old platforms afterwards but this needs > some more tests and thoughts. > :> > : > :Perhaps the solution is as "simple" as checking the status of the bit > :after the presence of the bit is established ? > : > > This makes sense, OK > See my other mail, this diff is not OK. Reyk > > > > :Index: pvclock.c > :=== > :RCS file: /cvs/src/sys/dev/pv/pvclock.c,v > :retrieving revision 1.2 > :diff -u -p -u -r1.2 pvclock.c > :--- pvclock.c24 Nov 2018 13:12:29 - 1.2 > :+++ pvclock.c4 Dec 2018 00:53:56 - > :@@ -127,8 +127,10 @@ pvclock_match(struct device *parent, voi > : void > : pvclock_attach(struct device *parent, struct device *self, void *aux) > : { > :-struct pvclock_softc*sc = (struct pvclock_softc *)self; > :-paddr_t pa; > :+struct pvclock_softc*sc = (struct pvclock_softc *)self; > :+struct pvclock_time_info*ti; > :+paddr_t pa; > :+uint8_t flags; > : > : if ((sc->sc_time = km_alloc(PAGE_SIZE, > : _any, _zero, _nowait)) == NULL) { > :@@ -151,6 +153,13 @@ pvclock_attach(struct device *parent, st > : > : /* Better than HPET but below TSC */ > : sc->sc_tc->tc_quality = 1500; > :+ > :+ti = sc->sc_time; > :+flags = ti->ti_flags; > :+if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) { > :+printf(": unstable timestamp counter\n"); > :+return; > :+} > : > : tc_init(sc->sc_tc); > : > : > > -- > fortune -as
Re: pvclock(4)
On Mon, Dec 03, 2018 at 04:56:10PM -0800, Chris Cappuccio wrote: > Reyk Floeter [r...@openbsd.org] wrote: > > > > Yes, KVM???s stable bit is not a reliable indication as it is seems to > > depend on the capabilities of the KVM version and not the actual > > availability of the feature on the particular hardware. How annoying. > > > > As mentioned before: I???d like to disable pvclock for now and I can do > > that in the morning CET if nobody beats me to it. > > > > I have an idea how to deal with old platforms afterwards but this needs > > some more tests and thoughts. > > > > Perhaps the solution is as "simple" as checking the status of the bit > after the presence of the bit is established ? > The approach makes sense but it is not the right way to read the value. It only works if you don't hit a refresh cycle of the host. Better diff below. But I'm not sure if this is enough and if the "stable" flag mit appear occasionally. So I'd like to disable pvclock for now until we did some more testing and tried different options. Reyk Index: sys/dev/pv/pvclock.c === RCS file: /cvs/src/sys/dev/pv/pvclock.c,v retrieving revision 1.2 diff -u -p -u -p -r1.2 pvclock.c --- sys/dev/pv/pvclock.c24 Nov 2018 13:12:29 - 1.2 +++ sys/dev/pv/pvclock.c4 Dec 2018 12:09:06 - @@ -70,6 +70,11 @@ uint pvclock_get_timecount(struct timec voidpvclock_read_time_info(struct pvclock_softc *, struct pvclock_time_info *); +static inline uint32_t +pvclock_read_begin(const struct pvclock_time_info *); +static inline int +pvclock_read_done(const struct pvclock_time_info *, uint32_t); + struct cfattach pvclock_ca = { sizeof(struct pvclock_softc), pvclock_match, @@ -127,8 +132,11 @@ pvclock_match(struct device *parent, voi void pvclock_attach(struct device *parent, struct device *self, void *aux) { - struct pvclock_softc*sc = (struct pvclock_softc *)self; - paddr_t pa; + struct pvclock_softc*sc = (struct pvclock_softc *)self; + struct pvclock_time_info*ti; + paddr_t pa; + uint32_t version; + uint8_t flags; if ((sc->sc_time = km_alloc(PAGE_SIZE, _any, _zero, _nowait)) == NULL) { @@ -143,6 +151,19 @@ pvclock_attach(struct device *parent, st wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE); sc->sc_paddr = pa; + + ti = sc->sc_time; + do { + version = pvclock_read_begin(ti); + flags = ti->ti_flags; + } while (!pvclock_read_done(ti, version)); + + if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) { + wrmsr(KVM_MSR_SYSTEM_TIME, pa & ~PVCLOCK_SYSTEM_TIME_ENABLE); + km_free(sc->sc_time, PAGE_SIZE, _any, _zero); + panic(": unstable clock\n"); + return; + } sc->sc_tc = _timecounter; sc->sc_tc->tc_name = DEVNAME(sc);
Re: pvclock(4)
On 2018 Dec 03 (Mon) at 16:56:10 -0800 (-0800), Chris Cappuccio wrote: :Reyk Floeter [r...@openbsd.org] wrote: :> :> Yes, KVM???s stable bit is not a reliable indication as it is seems to depend on the capabilities of the KVM version and not the actual availability of the feature on the particular hardware. How annoying. :> :> As mentioned before: I???d like to disable pvclock for now and I can do that in the morning CET if nobody beats me to it. :> :> I have an idea how to deal with old platforms afterwards but this needs some more tests and thoughts. :> : :Perhaps the solution is as "simple" as checking the status of the bit :after the presence of the bit is established ? : This makes sense, OK :Index: pvclock.c :=== :RCS file: /cvs/src/sys/dev/pv/pvclock.c,v :retrieving revision 1.2 :diff -u -p -u -r1.2 pvclock.c :--- pvclock.c 24 Nov 2018 13:12:29 - 1.2 :+++ pvclock.c 4 Dec 2018 00:53:56 - :@@ -127,8 +127,10 @@ pvclock_match(struct device *parent, voi : void : pvclock_attach(struct device *parent, struct device *self, void *aux) : { :- struct pvclock_softc*sc = (struct pvclock_softc *)self; :- paddr_t pa; :+ struct pvclock_softc*sc = (struct pvclock_softc *)self; :+ struct pvclock_time_info*ti; :+ paddr_t pa; :+ uint8_t flags; : : if ((sc->sc_time = km_alloc(PAGE_SIZE, : _any, _zero, _nowait)) == NULL) { :@@ -151,6 +153,13 @@ pvclock_attach(struct device *parent, st : : /* Better than HPET but below TSC */ : sc->sc_tc->tc_quality = 1500; :+ :+ ti = sc->sc_time; :+ flags = ti->ti_flags; :+ if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) { :+ printf(": unstable timestamp counter\n"); :+ return; :+ } : : tc_init(sc->sc_tc); : : -- fortune -as
Re: pvclock(4)
Chris Cappuccio 於 2018-12-04 08:56 寫到: Reyk Floeter [r...@openbsd.org] wrote: Yes, KVM???s stable bit is not a reliable indication as it is seems to depend on the capabilities of the KVM version and not the actual availability of the feature on the particular hardware. How annoying. As mentioned before: I???d like to disable pvclock for now and I can do that in the morning CET if nobody beats me to it. I have an idea how to deal with old platforms afterwards but this needs some more tests and thoughts. Perhaps the solution is as "simple" as checking the status of the bit after the presence of the bit is established ? Index: pvclock.c === RCS file: /cvs/src/sys/dev/pv/pvclock.c,v retrieving revision 1.2 diff -u -p -u -r1.2 pvclock.c --- pvclock.c 24 Nov 2018 13:12:29 - 1.2 +++ pvclock.c 4 Dec 2018 00:53:56 - @@ -127,8 +127,10 @@ pvclock_match(struct device *parent, voi void pvclock_attach(struct device *parent, struct device *self, void *aux) { - struct pvclock_softc*sc = (struct pvclock_softc *)self; - paddr_t pa; + struct pvclock_softc*sc = (struct pvclock_softc *)self; + struct pvclock_time_info*ti; + paddr_t pa; + uint8_t flags; if ((sc->sc_time = km_alloc(PAGE_SIZE, _any, _zero, _nowait)) == NULL) { @@ -151,6 +153,13 @@ pvclock_attach(struct device *parent, st /* Better than HPET but below TSC */ sc->sc_tc->tc_quality = 1500; + + ti = sc->sc_time; + flags = ti->ti_flags; + if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) { + printf(": unstable timestamp counter\n"); + return; + } tc_init(sc->sc_tc); Hi, your patch work for me, the system boot without manually disable pvclock, thanks. Attached dmesg. -- Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC OpenBSD 6.4-current (GENERIC.MP) #490: Thu Nov 29 16:00:20 MST 2018 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 788389888 (751MB) avail mem = 755261440 (720MB) User Kernel Config UKC> disable 59 59 pvclock0 disabled UKC> quit Continuing... mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5b80 (10 entries) bios0: vendor SeaBIOS version "1.11.1-1" date 04/01/2014 bios0: QEMU Standard PC (i440FX + PIIX, 1996) acpi0 at bios0: rev 0 acpi0: sleep states S3 S4 S5 acpi0: tables DSDT FACP APIC HPET acpi0: wakeup devices acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 118.79 MHz, 06-17-0a cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 1039MHz cpu1 at mainbus0: apid 1 (application processor) cpu1: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 322.28 MHz, 06-17-0a cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu1: smt 0, core 0, package 1 ioapic0 at mainbus0: apid 0 pa 0xfec0, version 20, 24 pins acpihpet0 at acpi0: 1 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpicpu0 at acpi0: C1(@1 halt!) acpicpu1 at acpi0: C1(@1 halt!) "ACPI0006" at acpi0 not configured acpipci0 at acpi0 PCI0: _OSC failed acpicmos0 at acpi0 "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "QEMU0002" at acpi0 not configured "ACPI0010" at acpi0 not configured pvbus0 at mainbus0: KVM pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02 pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00 pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility pciide0: channel 0 disabled (no drives) atapiscsi0 at pciide0 channel 1 drive 0 scsibus1 at atapiscsi0: 2 targets cd0 at scsibus1 targ 0 lun 0: ATAPI 5/cdrom removable cd0(pciide0:1:0): using PIO mode 4, DMA mode 2 piixpm0 at pci0 dev 1 function 3 "Intel 82371AB
Re: pvclock(4)
Reyk Floeter [r...@openbsd.org] wrote: > > Yes, KVM???s stable bit is not a reliable indication as it is seems to depend > on the capabilities of the KVM version and not the actual availability of the > feature on the particular hardware. How annoying. > > As mentioned before: I???d like to disable pvclock for now and I can do that > in the morning CET if nobody beats me to it. > > I have an idea how to deal with old platforms afterwards but this needs some > more tests and thoughts. > Perhaps the solution is as "simple" as checking the status of the bit after the presence of the bit is established ? Index: pvclock.c === RCS file: /cvs/src/sys/dev/pv/pvclock.c,v retrieving revision 1.2 diff -u -p -u -r1.2 pvclock.c --- pvclock.c 24 Nov 2018 13:12:29 - 1.2 +++ pvclock.c 4 Dec 2018 00:53:56 - @@ -127,8 +127,10 @@ pvclock_match(struct device *parent, voi void pvclock_attach(struct device *parent, struct device *self, void *aux) { - struct pvclock_softc*sc = (struct pvclock_softc *)self; - paddr_t pa; + struct pvclock_softc*sc = (struct pvclock_softc *)self; + struct pvclock_time_info*ti; + paddr_t pa; + uint8_t flags; if ((sc->sc_time = km_alloc(PAGE_SIZE, _any, _zero, _nowait)) == NULL) { @@ -151,6 +153,13 @@ pvclock_attach(struct device *parent, st /* Better than HPET but below TSC */ sc->sc_tc->tc_quality = 1500; + + ti = sc->sc_time; + flags = ti->ti_flags; + if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) { + printf(": unstable timestamp counter\n"); + return; + } tc_init(sc->sc_tc);
Re: pvclock(4)
> Am 04.12.2018 um 00:52 schrieb Chris Cappuccio : > > johnw [johnw.m...@gmail.com] wrote: >> >> Hi, after disable pvclock, it can boot with new kernel again, thanks. > ... >> cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 105.29 MHz, 06-17-0a >> cpu0: >> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN > ... > > This CPU clearly doesn't have the invariant TSC (it is required according > to Reyk Floeter's tech@ posting from Nov 19th.) So, pvclock does not handle > this situation properly, apparently checking for KVM_FEATURE_CLOCKSOURCE2 > and KVM_FEATURE_CLOCKSOURCE_STABLE_BIT is not enough. Yes, KVM’s stable bit is not a reliable indication as it is seems to depend on the capabilities of the KVM version and not the actual availability of the feature on the particular hardware. How annoying. As mentioned before: I‘d like to disable pvclock for now and I can do that in the morning CET if nobody beats me to it. I have an idea how to deal with old platforms afterwards but this needs some more tests and thoughts. Reyk
Re: pvclock(4)
johnw [johnw.m...@gmail.com] wrote: > > Hi, after disable pvclock, it can boot with new kernel again, thanks. ... > cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 105.29 MHz, 06-17-0a > cpu0: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN ... This CPU clearly doesn't have the invariant TSC (it is required according to Reyk Floeter's tech@ posting from Nov 19th.) So, pvclock does not handle this situation properly, apparently checking for KVM_FEATURE_CLOCKSOURCE2 and KVM_FEATURE_CLOCKSOURCE_STABLE_BIT is not enough. Chris
Re: pvclock(4)
johnw 於 2018-11-29 17:38 寫到: Reyk Floeter 於 2018-11-29 14:09 寫到: Do you have a full dmesg for me? Yes, attached dmesg.log, thanks. Hi, after disable pvclock, it can boot with new kernel again, thanks. -- Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC OpenBSD 6.4-current (GENERIC.MP) #447: Sun Nov 18 17:25:58 MST 2018 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 519954432 (495MB) avail mem = 494972928 (472MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5b80 (10 entries) bios0: vendor SeaBIOS version "1.11.1-1" date 04/01/2014 bios0: QEMU Standard PC (i440FX + PIIX, 1996) acpi0 at bios0: rev 0 acpi0: sleep states S3 S4 S5 acpi0: tables DSDT FACP APIC HPET acpi0: wakeup devices acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 105.29 MHz, 06-17-0a cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 1009MHz cpu1 at mainbus0: apid 1 (application processor) cpu1: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 165.45 MHz, 06-17-0a cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu1: smt 0, core 0, package 1 ioapic0 at mainbus0: apid 0 pa 0xfec0, version 20, 24 pins acpihpet0 at acpi0: 1 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpicpu0 at acpi0: C1(@1 halt!) acpicpu1 at acpi0: C1(@1 halt!) "ACPI0006" at acpi0 not configured acpipci0 at acpi0 PCI0: _OSC failed acpicmos0 at acpi0 "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "QEMU0002" at acpi0 not configured "ACPI0010" at acpi0 not configured pvbus0 at mainbus0: KVM pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02 pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00 pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility pciide0: channel 0 disabled (no drives) atapiscsi0 at pciide0 channel 1 drive 0 scsibus1 at atapiscsi0: 2 targets cd0 at scsibus1 targ 0 lun 0: ATAPI 5/cdrom removable cd0(pciide0:1:0): using PIO mode 4, DMA mode 2 piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9 iic0 at piixpm0 vga1 at pci0 dev 2 function 0 "Bochs VGA" rev 0x02 wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation) wsdisplay0: screen 1-5 added (80x25, vt100 emulation) virtio0 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00 vio0 at virtio0: address aa:aa:aa:aa:aa:01 virtio0: msix shared virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Storage" rev 0x00 vioblk0 at virtio1 scsibus2 at vioblk0: 2 targets sd0 at scsibus2 targ 0 lun 0: SCSI3 0/direct fixed sd0: 20480MB, 512 bytes/sector, 41943040 sectors virtio1: msix shared isa0 at pcib0 isadma0 at isa0 fdc0 at isa0 port 0x3f0/6 irq 6 drq 2 com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo pckbc0 at isa0 port 0x60/5 irq 1 irq 12 pckbd0 at pckbc0 (kbd slot) wskbd0 at pckbd0: console keyboard, using wsdisplay0 pms0 at pckbc0 (aux slot) wsmouse0 at pms0 mux 0 pcppi0 at isa0 port 0x61 spkr0 at pcppi0 lpt0 at isa0 port 0x378/4 irq 7 vscsi0 at root scsibus3 at vscsi0: 256 targets softraid0 at root scsibus4 at softraid0: 256 targets sd1 at scsibus4 targ 1 lun 0: SCSI2 0/direct fixed sd1: 20473MB, 512 bytes/sector, 41929058 sectors root on sd1a (fa1d7de7776af250.a) swap on sd1b dump on sd1b fd0 at fdc0 drive 1: density unknown syncing disks... done rebooting... OpenBSD 6.4-current (GENERIC.MP) #447: Sun Nov 18 17:25:58 MST 2018 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 519954432 (495MB) avail mem = 494972928 (472MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5b80 (10 entries) bios0: vendor SeaBIOS version "1.11.1-1" date 04/01/2014 bios0: QEMU Standard PC (i440FX + PIIX, 1996) acpi0 at bios0: rev 0 acpi0: sleep states S3 S4 S5 acpi0: tables DSDT FACP APIC HPET acpi0: wakeup devices acpitimer0 at
Re: pvclock(4)
Reyk Floeter 於 2018-11-29 14:09 寫到: Do you have a full dmesg for me? Yes, attached dmesg.log, thanks. -- Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC OpenBSD 6.4-current (GENERIC.MP) #447: Sun Nov 18 17:25:58 MST 2018 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 519954432 (495MB) avail mem = 494972928 (472MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5b80 (10 entries) bios0: vendor SeaBIOS version "1.11.1-1" date 04/01/2014 bios0: QEMU Standard PC (i440FX + PIIX, 1996) acpi0 at bios0: rev 0 acpi0: sleep states S3 S4 S5 acpi0: tables DSDT FACP APIC HPET acpi0: wakeup devices acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 105.29 MHz, 06-17-0a cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 1009MHz cpu1 at mainbus0: apid 1 (application processor) cpu1: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 165.45 MHz, 06-17-0a cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu1: smt 0, core 0, package 1 ioapic0 at mainbus0: apid 0 pa 0xfec0, version 20, 24 pins acpihpet0 at acpi0: 1 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpicpu0 at acpi0: C1(@1 halt!) acpicpu1 at acpi0: C1(@1 halt!) "ACPI0006" at acpi0 not configured acpipci0 at acpi0 PCI0: _OSC failed acpicmos0 at acpi0 "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "QEMU0002" at acpi0 not configured "ACPI0010" at acpi0 not configured pvbus0 at mainbus0: KVM pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02 pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00 pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility pciide0: channel 0 disabled (no drives) atapiscsi0 at pciide0 channel 1 drive 0 scsibus1 at atapiscsi0: 2 targets cd0 at scsibus1 targ 0 lun 0: ATAPI 5/cdrom removable cd0(pciide0:1:0): using PIO mode 4, DMA mode 2 piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9 iic0 at piixpm0 vga1 at pci0 dev 2 function 0 "Bochs VGA" rev 0x02 wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation) wsdisplay0: screen 1-5 added (80x25, vt100 emulation) virtio0 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00 vio0 at virtio0: address aa:aa:aa:aa:aa:01 virtio0: msix shared virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Storage" rev 0x00 vioblk0 at virtio1 scsibus2 at vioblk0: 2 targets sd0 at scsibus2 targ 0 lun 0: SCSI3 0/direct fixed sd0: 20480MB, 512 bytes/sector, 41943040 sectors virtio1: msix shared isa0 at pcib0 isadma0 at isa0 fdc0 at isa0 port 0x3f0/6 irq 6 drq 2 com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo pckbc0 at isa0 port 0x60/5 irq 1 irq 12 pckbd0 at pckbc0 (kbd slot) wskbd0 at pckbd0: console keyboard, using wsdisplay0 pms0 at pckbc0 (aux slot) wsmouse0 at pms0 mux 0 pcppi0 at isa0 port 0x61 spkr0 at pcppi0 lpt0 at isa0 port 0x378/4 irq 7 vscsi0 at root scsibus3 at vscsi0: 256 targets softraid0 at root scsibus4 at softraid0: 256 targets sd1 at scsibus4 targ 1 lun 0: SCSI2 0/direct fixed sd1: 20473MB, 512 bytes/sector, 41929058 sectors root on sd1a (fa1d7de7776af250.a) swap on sd1b dump on sd1b fd0 at fdc0 drive 1: density unknown
Re: pvclock(4)
Hi, > Am 29.11.2018 um 05:27 schrieb johnw : > > >> So far I only got positive reports. Where are the problems? ;) > >> Otherwise: OK? > >> Reyk > > Hi, my kvm/quest/openbsd-amd64 can not boot, after upgrade to today current > (28-nov-2018). > thanks for reporting. Do you have a full dmesg for me? Reyk > It work before upgrade (OpenBSD 6.4-current (GENERIC.MP) #447: Sun Nov 18 > 17:25:58 MST 2018) > > Host: 4.18.0-3-amd64 #1 SMP Debian 4.18.20-2 (2018-11-23) x86_64 GNU/Linux > qemu: QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3+b1) > panic photo: https://ibb.co/gjWBhL4 > panic photo: https://ibb.co/HDrJ7Q0 > panic photo: https://ibb.co/g96J5s7 > > Thanks. > > -- > Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC
Re: pvclock(4)
So far I only got positive reports. Where are the problems? ;) Otherwise: OK? Reyk Hi, my kvm/quest/openbsd-amd64 can not boot, after upgrade to today current (28-nov-2018). It work before upgrade (OpenBSD 6.4-current (GENERIC.MP) #447: Sun Nov 18 17:25:58 MST 2018) Host: 4.18.0-3-amd64 #1 SMP Debian 4.18.20-2 (2018-11-23) x86_64 GNU/Linux qemu: QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3+b1) panic photo: https://ibb.co/gjWBhL4 panic photo: https://ibb.co/HDrJ7Q0 panic photo: https://ibb.co/g96J5s7 Thanks. -- Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC
Re: pvclock(4)
> Am 25.11.2018 um 05:02 schrieb Greg Steuck : > > I realize this report is practically useless, but better out than in > (according to Shrek). > I found this in the logs of my GCE VM running syzkaller bot. No further > details were preserved... > > 2018/11/24 09:53:48 ci-openbsd-main: poll: > 94bf4886dbb69e9fbf0f92f975fc23f16fc5c80f > 2018/11/24 09:53:48 ci-openbsd-main: building kernel... > 2018/11/24 09:54:03 ci-openbsd-main: testing image... > 2018/11/24 10:04:07 ci-openbsd-main: VM boot failed with: panic: pvclock0: > unstable result on stable clock > > The host is running > OpenBSD 6.4-current (GENERIC.MP) #456: Tue Nov 20 08:46:59 MST 2018 > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > The VM kernel at the time was built at "zap 10 tab leading whitespace before > 'struct evp_pkey_ctx_st {'", so maybe "only attach pvclock(4) inside a KVM > guest" would've fixed it? Yes, correct. Sorry for that glitch. Reyk
Re: pvclock(4)
I realize this report is practically useless, but better out than in (according to Shrek). I found this in the logs of my GCE VM running syzkaller bot. No further details were preserved... 2018/11/24 09:53:48 ci-openbsd-main: poll: 94bf4886dbb69e9fbf0f92f975fc23f16fc5c80f 2018/11/24 09:53:48 ci-openbsd-main: building kernel... 2018/11/24 09:54:03 ci-openbsd-main: testing image... 2018/11/24 10:04:07 ci-openbsd-main: VM boot failed with: panic: pvclock0: unstable result on stable clock The host is running OpenBSD 6.4-current (GENERIC.MP) #456: Tue Nov 20 08:46:59 MST 2018 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP The VM kernel at the time was built at "zap 10 tab leading whitespace before 'struct evp_pkey_ctx_st {'", so maybe "only attach pvclock(4) inside a KVM guest" would've fixed it? Thanks Greg
Re: pvclock(4)
On Thu, Nov 22, 2018 at 05:24:10PM +0100, Landry Breuil wrote: > On Thu, Nov 22, 2018 at 07:44:01AM -0800, Mike Larkin wrote: > > On Thu, Nov 22, 2018 at 04:37:49PM +0100, Reyk Floeter wrote: > > > On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote: > > > > the attached diff is another attempt at implementing a pvclock(4) > > > > guest driver. This improves the clock on KVM and replaces the need > > > > for using the VM-expensive acpihpet(4). > > > > > > > > > > So far I only got positive reports. Where are the problems? ;) > > > > > > Otherwise: OK? > > > > > > Reyk > > > > > > > Reads ok. One question - you mention in pvclock.c that this is supported > > on i386 and amd64 but I only see GENERIC changes for amd64? > > > > ok mlarkin in any case, but I'd either add GENERIC changes for i386 or > > make this for sure amd64 only. > > I'm giving it a shot on my i386 vm. Also seems to work fine there, i'm stressing the vm (well, unpacking tarball/building firefox) and ntpd still manages to reduce the drift: [17:49] c32:~/ $while sleep 10 ; do ntpctl -sa|grep clock ; date ; done 4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 404.382ms Thu Nov 22 17:49:49 CET 2018 4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 354.382ms Thu Nov 22 17:49:59 CET 2018 4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 304.382ms Thu Nov 22 17:50:09 CET 2018 4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 254.382ms Thu Nov 22 17:50:19 CET 2018 4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 204.382ms Thu Nov 22 17:50:29 CET 2018 4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 154.382ms Thu Nov 22 17:50:39 CET 2018 4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 104.382ms Thu Nov 22 17:50:49 CET 2018 4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 54.382ms Thu Nov 22 17:50:59 CET 2018 4/4 peers valid, constraint offset -1s, clock unsynced Thu Nov 22 17:51:09 CET 2018 4/4 peers valid, constraint offset -1s, clock unsynced Thu Nov 22 17:51:19 CET 2018 4/4 peers valid, constraint offset -1s, clock synced, stratum 3 Thu Nov 22 17:51:29 CET 2018 dunno how valid a testcase this is dmesg below, this time emulating a QEMU Standard PC (i440FX + PIIX, 1996) Landry OpenBSD 6.4-current (GENERIC.MP) #4: Thu Nov 22 17:40:05 CET 2018 landry@c32.proxmox2:/usr/src/sys/arch/i386/compile/GENERIC.MP real mem = 3220561920 (3071MB) avail mem = 3146719232 (3000MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: date 06/23/99, BIOS32 rev. 0 @ 0xfd2b1, SMBIOS rev. 2.8 @ 0xf5980 (10 entries) bios0: vendor SeaBIOS version "rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org" date 04/01/2014 bios0: QEMU Standard PC (i440FX + PIIX, 1996) acpi0 at bios0: rev 0 acpi0: sleep states S3 S4 S5 acpi0: tables DSDT FACP APIC HPET acpi0: wakeup devices acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Common KVM processor ("GenuineIntel" 686-class) 2.94 GHz, 0f-06-01 cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,x2APIC,HV,NXE,LONG,LAHF,MELTDOWN mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 999MHz cpu1 at mainbus0: apid 1 (application processor) cpu1: Common KVM processor ("GenuineIntel" 686-class) 2.94 GHz, 0f-06-01 cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,x2APIC,HV,NXE,LONG,LAHF,MELTDOWN ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins acpihpet0 at acpi0: 1 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpicpu0 at acpi0: C1(@1 halt!) acpicpu1 at acpi0: C1(@1 halt!) "ACPI0006" at acpi0 not configured "PNP0A03" at acpi0 not configured acpicmos0 at acpi0 "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "QEMU0002" at acpi0 not configured "ACPI0010" at acpi0 not configured bios0: ROM list: 0xc
Re: pvclock(4)
On Thu, Nov 22, 2018 at 07:44:01AM -0800, Mike Larkin wrote: > On Thu, Nov 22, 2018 at 04:37:49PM +0100, Reyk Floeter wrote: > > On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote: > > > the attached diff is another attempt at implementing a pvclock(4) > > > guest driver. This improves the clock on KVM and replaces the need > > > for using the VM-expensive acpihpet(4). > > > > > > > So far I only got positive reports. Where are the problems? ;) > > > > Otherwise: OK? > > > > Reyk > > > > Reads ok. One question - you mention in pvclock.c that this is supported > on i386 and amd64 but I only see GENERIC changes for amd64? > > ok mlarkin in any case, but I'd either add GENERIC changes for i386 or > make this for sure amd64 only. I'm giving it a shot on my i386 vm.
Re: pvclock(4)
On Thu, Nov 22, 2018 at 04:37:49PM +0100, Reyk Floeter wrote: > On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote: > > the attached diff is another attempt at implementing a pvclock(4) > > guest driver. This improves the clock on KVM and replaces the need > > for using the VM-expensive acpihpet(4). > > > > So far I only got positive reports. Where are the problems? ;) > > Otherwise: OK? > > Reyk > Reads ok. One question - you mention in pvclock.c that this is supported on i386 and amd64 but I only see GENERIC changes for amd64? ok mlarkin in any case, but I'd either add GENERIC changes for i386 or make this for sure amd64 only. -ml > > Index: share/man/man4/pvclock.4 > > ======= > > RCS file: share/man/man4/pvclock.4 > > diff -N share/man/man4/pvclock.4 > > --- /dev/null 1 Jan 1970 00:00:00 - > > +++ share/man/man4/pvclock.419 Nov 2018 11:48:33 - > > @@ -0,0 +1,45 @@ > > +.\"$OpenBSD$ > > +.\" > > +.\" Copyright (c) 2018 Reyk Floeter > > +.\" > > +.\" Permission to use, copy, modify, and distribute this software for any > > +.\" purpose with or without fee is hereby granted, provided that the above > > +.\" copyright notice and this permission notice appear in all copies. > > +.\" > > +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL > > WARRANTIES > > +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF > > +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR > > +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES > > +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN > > +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF > > +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. > > +.\" > > +.Dd $Mdocdate$ > > +.Dt PVCLOCK 4 > > +.Os > > +.Sh NAME > > +.Nm pvclock > > +.Nd paravirtual clock driver > > +.Sh SYNOPSIS > > +.Cd "pvclock* at pvbus? > > +.Sh DESCRIPTION > > +The > > +.Nm > > +driver supports the paravirtual clock that is available in KVM and > > +other hypervisors. > > +.Nm > > +uses a shared page between the host and the hypervisor to synchronize > > +the TSC clock in an efficient way. > > +.Sh SEE ALSO > > +.Xr pvbus 4 > > +.Sh HISTORY > > +The > > +.Nm > > +driver first appeared in > > +.Ox 6.5 . > > +.Sh AUTHORS > > +.An -nosplit > > +The > > +.Nm > > +driver was written by > > +.An Reyk Floeter Aq Mt r...@openbsd.org . > > Index: sys/arch/amd64/conf/GENERIC > > === > > RCS file: /cvs/src/sys/arch/amd64/conf/GENERIC,v > > retrieving revision 1.464 > > diff -u -p -u -p -r1.464 GENERIC > > --- sys/arch/amd64/conf/GENERIC 26 Oct 2018 20:26:19 - 1.464 > > +++ sys/arch/amd64/conf/GENERIC 19 Nov 2018 11:48:33 - > > @@ -79,6 +79,8 @@ ipmi0 at mainbus? disable # IPMI > > > > vmt0 at pvbus? # VMware Tools > > > > +pvclock0 at pvbus? # KVM pvclock > > + > > xen0 at pvbus? # Xen HVM domU > > xnf* at xen? # Xen Netfront > > xbf* at xen? # Xen Blkfront > > Index: sys/dev/pv/files.pv > > === > > RCS file: /cvs/src/sys/dev/pv/files.pv,v > > retrieving revision 1.14 > > diff -u -p -u -p -r1.14 files.pv > > --- sys/dev/pv/files.pv 24 Aug 2018 16:07:01 - 1.14 > > +++ sys/dev/pv/files.pv 19 Nov 2018 11:48:33 - > > @@ -8,6 +8,11 @@ device pvbus > > attach pvbus at mainbus > > file dev/pv/pvbus.c pvbus needs-flag > > > > +# KVM clock > > +device pvclock > > +attach pvclock at pvbus > > +file dev/pv/pvclock.cpvclock needs-flag > > + > > # VMware Tools > > device vmt > > attach vmt at pvbus > > Index: sys/dev/pv/pvclock.c > > === > > RCS file: sys/dev/pv/pvclock.c > > diff -N sys/dev/pv/pvclock.c > > --- /dev/null 1 Jan 1970 00:00:00 - > > +++ sys/dev/pv/pvclock.c19 Nov 2018 11:48:33 - > > @@ -0,0 +1,229 @@
Re: pvclock(4)
On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote: > the attached diff is another attempt at implementing a pvclock(4) > guest driver. This improves the clock on KVM and replaces the need > for using the VM-expensive acpihpet(4). > So far I only got positive reports. Where are the problems? ;) Otherwise: OK? Reyk > Index: share/man/man4/pvclock.4 > === > RCS file: share/man/man4/pvclock.4 > diff -N share/man/man4/pvclock.4 > --- /dev/null 1 Jan 1970 00:00:00 - > +++ share/man/man4/pvclock.4 19 Nov 2018 11:48:33 - > @@ -0,0 +1,45 @@ > +.\" $OpenBSD$ > +.\" > +.\" Copyright (c) 2018 Reyk Floeter > +.\" > +.\" Permission to use, copy, modify, and distribute this software for any > +.\" purpose with or without fee is hereby granted, provided that the above > +.\" copyright notice and this permission notice appear in all copies. > +.\" > +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES > +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF > +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR > +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES > +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN > +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF > +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. > +.\" > +.Dd $Mdocdate$ > +.Dt PVCLOCK 4 > +.Os > +.Sh NAME > +.Nm pvclock > +.Nd paravirtual clock driver > +.Sh SYNOPSIS > +.Cd "pvclock* at pvbus? > +.Sh DESCRIPTION > +The > +.Nm > +driver supports the paravirtual clock that is available in KVM and > +other hypervisors. > +.Nm > +uses a shared page between the host and the hypervisor to synchronize > +the TSC clock in an efficient way. > +.Sh SEE ALSO > +.Xr pvbus 4 > +.Sh HISTORY > +The > +.Nm > +driver first appeared in > +.Ox 6.5 . > +.Sh AUTHORS > +.An -nosplit > +The > +.Nm > +driver was written by > +.An Reyk Floeter Aq Mt r...@openbsd.org . > Index: sys/arch/amd64/conf/GENERIC > === > RCS file: /cvs/src/sys/arch/amd64/conf/GENERIC,v > retrieving revision 1.464 > diff -u -p -u -p -r1.464 GENERIC > --- sys/arch/amd64/conf/GENERIC 26 Oct 2018 20:26:19 - 1.464 > +++ sys/arch/amd64/conf/GENERIC 19 Nov 2018 11:48:33 - > @@ -79,6 +79,8 @@ ipmi0 at mainbus? disable # IPMI > > vmt0 at pvbus? # VMware Tools > > +pvclock0 at pvbus? # KVM pvclock > + > xen0 at pvbus? # Xen HVM domU > xnf* at xen? # Xen Netfront > xbf* at xen? # Xen Blkfront > Index: sys/dev/pv/files.pv > === > RCS file: /cvs/src/sys/dev/pv/files.pv,v > retrieving revision 1.14 > diff -u -p -u -p -r1.14 files.pv > --- sys/dev/pv/files.pv 24 Aug 2018 16:07:01 - 1.14 > +++ sys/dev/pv/files.pv 19 Nov 2018 11:48:33 - > @@ -8,6 +8,11 @@ device pvbus > attach pvbus at mainbus > file dev/pv/pvbus.c pvbus needs-flag > > +# KVM clock > +device pvclock > +attach pvclock at pvbus > +file dev/pv/pvclock.cpvclock needs-flag > + > # VMware Tools > device vmt > attach vmt at pvbus > Index: sys/dev/pv/pvclock.c > === > RCS file: sys/dev/pv/pvclock.c > diff -N sys/dev/pv/pvclock.c > --- /dev/null 1 Jan 1970 00:00:00 - > +++ sys/dev/pv/pvclock.c 19 Nov 2018 11:48:33 - > @@ -0,0 +1,229 @@ > +/* $OpenBSD$ */ > + > +/* > + * Copyright (c) 2018 Reyk Floeter > + * > + * Permission to use, copy, modify, and distribute this software for any > + * purpose with or without fee is hereby granted, provided that the above > + * copyright notice and this permission notice appear in all copies. > + * > + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES > + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF > + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR > + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES > + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN > + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF > + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
Re: pvclock(4)
On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote: > Hi, > > the attached diff is another attempt at implementing a pvclock(4) > guest driver. This improves the clock on KVM and replaces the need > for using the VM-expensive acpihpet(4). > > While pvclock(4) is available on KVM, Xen, Hyper-V (in a modified > form), it currently only attaches under KVM: > > pvbus0 at mainbus0: KVM > pvclock0 at pvbus0 Works fine on my proxmox 5.2 / KVM 2.11 dpb build cluster emulating 6x "QEMU Standard PC (Q35 + ICH9, 2009)". Before i had to restart ntpd every 5mn on each hosts, time would drift like crazy under build load. With pvclock it seems the slave clocks are stable and ntpd stays synced. kern.timecounter.tick=1 kern.timecounter.timestepwarnings=0 kern.timecounter.hardware=pvclock0 kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000) acpitimer0(1000) dummy(-100) Landry OpenBSD 6.4-current (GENERIC.MP) #0: Tue Nov 20 18:52:03 CET 2018 landry@c64.proxmox2:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 4276924416 (4078MB) avail mem = 4138016768 (3946MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5970 (10 entries) bios0: vendor SeaBIOS version "rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org" date 04/01/2014 bios0: QEMU Standard PC (Q35 + ICH9, 2009) acpi0 at bios0: rev 0 acpi0: sleep states S3 S4 S5 acpi0: tables DSDT FACP APIC HPET MCFG acpi0: wakeup devices acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Common KVM processor, 2933.90 MHz, 0f-06-01 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,x2APIC,HV,NXE,LONG,LAHF,MELTDOWN cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 999MHz cpu1 at mainbus0: apid 1 (application processor) cpu1: Common KVM processor, 2933.54 MHz, 0f-06-01 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,x2APIC,HV,NXE,LONG,LAHF,MELTDOWN cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu1: smt 0, core 1, package 0 ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins acpihpet0 at acpi0: 1 Hz acpimcfg0 at acpi0 acpimcfg0: addr 0xb000, bus 0-255 acpiprt0 at acpi0: bus 0 (PCI0) no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method no _STA method acpicpu0 at acpi0: C1(@1 halt!) acpicpu1 at acpi0: C1(@1 halt!) "ACPI0006" at acpi0 not configured acpipci0 at acpi0 PCI0: 0x0
Re: pvclock(4)
Reyk Floeter(r...@openbsd.org) on 2018.11.19 13:12:46 +0100: > Feedback? Tests? OKs? test on my incarnation of kvm: OpenBSD 6.4-current (GENERIC.MP) #0: Mon Nov 19 18:32:29 CET 2018 ben...@test.openbsd.fluchtwagenfahrer.de:/sys/arch/amd64/compile/GENERIC.MP real mem = 2097004544 (1999MB) avail mem = 2024218624 (1930MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7dfffd30 (14 entries) bios0: vendor SeaBIOS version "1.10.2-1" date 04/01/2014 bios0: QEMU Standard PC (i440FX + PIIX, 1996) acpi0 at bios0: rev 0 acpi0: sleep states S3 S4 S5 acpi0: tables DSDT FACP APIC HPET acpi0: wakeup devices acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.93 MHz, 06-2c-01 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 999MHz cpu1 at mainbus0: apid 1 (application processor) cpu1: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.45 MHz, 06-2c-01 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu1: smt 0, core 0, package 1 cpu2 at mainbus0: apid 2 (application processor) cpu2: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.47 MHz, 06-2c-01 cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu2: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu2: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu2: smt 0, core 0, package 2 cpu3 at mainbus0: apid 3 (application processor) cpu3: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.47 MHz, 06-2c-01 cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu3: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu3: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu3: smt 0, core 0, package 3 cpu4 at mainbus0: apid 4 (application processor) cpu4: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.46 MHz, 06-2c-01 cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN cpu4: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu4: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu4: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu4: smt 0, core 0, package 4 cpu5 at mainbus0: apid 5 (application processor) cpu5: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.45 MHz, 06-2c-01 cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN cpu5: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu5: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu5: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped cpu5: smt 0, core 0, package 5 ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins acpihpet0 at acpi0: 1 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpicpu0 at acpi0: C1(@1 halt!) acpicpu1 at acpi0: C1(@1 halt!) acpicpu2 at acpi0: C1(@1 halt!) acpicpu3 at acpi0: C1(@1 halt!) acpicpu4 at acpi0: C1(@1 halt!) acpicpu5 at acpi0: C1(@1 halt!) "ACPI0006" at acpi0 not configured acpipci0 at acpi0 PCI0: _OSC failed acpicmos0 at acpi0 "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "PNP0A06" at acpi0 not configured "QEMU0002" at acpi0 not configured "ACPI0010" at acpi0 not configured pvbus0 at mainbus0: KVM pvclock0 at pvbus0 pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel
pvclock(4)
Hi, the attached diff is another attempt at implementing a pvclock(4) guest driver. This improves the clock on KVM and replaces the need for using the VM-expensive acpihpet(4). While pvclock(4) is available on KVM, Xen, Hyper-V (in a modified form), it currently only attaches under KVM: pvbus0 at mainbus0: KVM pvclock0 at pvbus0 A few notes: - The "invtsc" workaround to get a real TSC is not always possible, as it breaks migration and a few other things. If you're running in a cloud, it is very unlikely that you'll get it enabled. - pvclock suffered from the same problem as early TSC implementations: it wasn't synced across CPUs, so the OS had to do it manually. In OpenBSD, we never supported that and we decided to only support the newer "invariant" TSCs that started to show up around SkyLake. And I think that's why the previous pvlock implementations for OpenBSD didn't take off. - So my diff does the same for pvclock what we did for TSC: only support the "stable" pvclock that guarantees to provide a synchronized clock by either doing the synchronization on the host side or by providing it based on an invariant TSC on the host itself. So this diff might not work on older KVM versions or CPUs; I'd be happy to see some test results. - I did not copy+paste from the Linux and/or FreeBSD (they have suprising similarities but different licenses), and I wasn't convinced that it is worth copying the asm code that they use instead of the following multiplication: ctr = ((delta * mul_frac) >> 32) + system_time; Let me know if I'm wrong and if this code provides significant benefits: https://github.com/freebsd/freebsd/blob/1d6e4247415d264485ee94b59fdbc12e0c566fd0/sys/x86/x86/pvclock.c#L90-L124 - This driver only implements the "system time" clock, and not the static "wall clock". I didn't see a point in supporting the wall clock as its timestamp is only ever updated when you cold-start the VM. The wall clock struct is still defined in the driver for reference but could probably be removed. - I assigned a timecounter priority of 1500 to be in the middle between acpihpet(1000) and tsc(2000). If TSC is available, we should probably use it instead; even if pvclock is theoretically better as it is TSC-based and optimized for VMs. - This is only a guest driver, not a driver for vmm(4)/vmd(8). But it could solve some issues if somebody would implement it for vmm(4)... - I'm running it on a busy build machine on Exoscale.ch for at least a week now. This machine runs stuff that utilizes all CPUs. With HPET, we've sometimes experienced livelocks but the pvclock(4) driver made it much snappier. $ time sleep 2 0m02.00s real 0m00.00s user 0m00.01s system $ cat /var/db/ntpd.drift -4.679 $ sysctl kern.timecounter kern.timecounter.tick=1 kern.timecounter.timestepwarnings=1 kern.timecounter.hardware=pvclock0 kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000) acpitimer0(1000) dummy(-100) $ sysctl hw.model hw.ncpuonline hw.vendor hw.product hw.version hw.model=Intel Xeon Processor (Skylake) hw.ncpuonline=8 hw.vendor=Apache Software Foundation hw.product=CloudStack KVM Hypervisor hw.version=pc-i440fx-2.11 Feedback? Tests? OKs? Reyk Index: share/man/man4/pvclock.4 ======= RCS file: share/man/man4/pvclock.4 diff -N share/man/man4/pvclock.4 --- /dev/null 1 Jan 1970 00:00:00 - +++ share/man/man4/pvclock.419 Nov 2018 11:48:33 - @@ -0,0 +1,45 @@ +.\"$OpenBSD$ +.\" +.\" Copyright (c) 2018 Reyk Floeter +.\" +.\" Permission to use, copy, modify, and distribute this software for any +.\" purpose with or without fee is hereby granted, provided that the above +.\" copyright notice and this permission notice appear in all copies. +.\" +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. +.\" +.Dd $Mdocdate$ +.Dt PVCLOCK 4 +.Os +.Sh NAME +.Nm pvclock +.Nd paravirtual clock driver +.Sh SYNOPSIS +.Cd "pvclock* at pvbus? +.Sh DESCRIPTION +The +.Nm +driver supports the paravirtual clock that is available in KVM and +other hypervisors. +.Nm +uses a shared page between the host and the hypervisor to synchronize +the TSC clock in an efficient way. +.Sh SEE ALSO +.Xr pvbus 4 +.Sh HISTORY +The +.Nm +driver fi