Re: [please test] pvclock(4): fix several bugs

2022-09-09 Thread Scott Cheloha
On Thu, Sep 08, 2022 at 10:17:12AM -0500, Scott Cheloha wrote:
> > On Sep 8, 2022, at 9:05 AM, Mike Larkin  wrote:
> > [...]
> > 
> > You could compile this and then objdump -D it and see for yourself...
> 
> I can't make heads or tails of it.  Please explain what I am looking
> at and why it is, or is not, atomic.

Or I guess I can just wing it.

Okay, so this C code:

volatile uint64_t pvclock_lastcount;

/* [...] */

void
pvclock_get_timecount(struct timecounter *tc)
{
uint64_t ctr, last;

/* [...] */

do {
last = pvclock_lastcount;
if (ctr < last)
return (last);
} while (atomic_cas_64(_lastcount, last, ctr) != last);

return (ctr);
}

... yields this amd64 disassembly (from the linked bsd binary):

81de242b:   75 2b   jne81de2458 

81de242d:   eb 01   jmp81de2430 

81de242f:   cc  int3   
81de2430:   48 8b 0d 91 fc 6c 00mov7142545(%rip),%rcx   
 # 824b20c8 
81de2437:   48 87 d1xchg   %rdx,%rcx
81de243a:   48 39 d1cmp%rdx,%rcx
81de243d:   48 87 d1xchg   %rdx,%rcx
81de2440:   72 13   jb 81de2455 

81de2442:   48 89 c8mov%rcx,%rax
81de2445:   f0 48 0f b1 15 7a fclock cmpxchg %rdx,7142522(%rip) 
   # 824b20c8 
81de244c:   6c 00 
81de244e:   48 39 c8cmp%rcx,%rax
81de2451:   75 dd   jne81de2430 

81de2453:   eb 03   jmp81de2458 

81de2455:   48 8b d1mov%rcx,%rdx
81de2458:   89 d0   mov%edx,%eax
81de245a:   48 83 c4 08 add$0x8,%rsp
81de245e:   41 5e   pop%r14
81de2460:   c9  leaveq 
81de2461:   c3  retq

... and also yields this i386 disassembly (from the pvclock.o object):

 2c7:   75 2e   jne2f7 
 2c9:   eb 05   jmp2d0 
 2cb:   cc  int3   
 2cc:   cc  int3   
 2cd:   cc  int3   
 2ce:   cc  int3   
 2cf:   cc  int3   
 2d0:   8b 15 04 00 00 00   mov0x4,%edx
 2d6:   a1 00 00 00 00  mov0x0,%eax
 2db:   87 d8   xchg   %ebx,%eax
 2dd:   39 d8   cmp%ebx,%eax
 2df:   87 d8   xchg   %ebx,%eax
 2e1:   89 f1   mov%esi,%ecx
 2e3:   19 d1   sbb%edx,%ecx
 2e5:   72 0e   jb 2f5 
 2e7:   89 f1   mov%esi,%ecx
 2e9:   f0 0f c7 0d 00 00 00lock cmpxchg8b 0x0
 2f0:   00 
 2f1:   75 dd   jne2d0 
 2f3:   eb 02   jmp2f7 
 2f5:   8b d8   mov%eax,%ebx
 2f7:   89 d8   mov%ebx,%eax
 2f9:   83 c4 1cadd$0x1c,%esp
 2fc:   5e  pop%esi
 2fd:   5f  pop%edi
 2fe:   5b  pop%ebx
 2ff:   5d  pop%ebp
 300:   c3  ret

If we isolate the pvclock_lastcount loads, on amd64 we have:

81de2430:   48 8b 0d 91 fc 6c 00mov7142545(%rip),%rcx   
 # 824b20c8 

and on i386 we have:

 2d0:   8b 15 04 00 00 00   mov0x4,%edx
 2d6:   a1 00 00 00 00  mov0x0,%eax

so the 8-byte load is atomic on amd64 (one load) and non-atomic on
i386 (two loads).

I don't know what jsg@ meant when he said the ifdefs "seemed
unnecessary", but near as I can tell they are necessary.  I need an
atomic 8-byte load and i386 can't, or won't, do it.

So I guess we go back to my original patch.

This resolves kettenis@'s atomic_cas_64() objections because we no
longer need it.

So, once again, the patch in brief:

- Add rdtsc_lfence() to i386/include/cpufunc.h

- Make pvclock_lastcount volatile uint64_t to fix the
  non-PVCLOCK_FLAG_TSC_STABLE case (see sub.).

- Check for SSE2 support in pvclock_match(), we need it for LFENCE
  in pvclock_get_timecount().

- Do RDTSC as soon as possible in the lockless read loop to get
  a better timestamp.

- Use rdtsc_lfence() instead of rdtsc() to get a better timestamp.

- Check whether our TSC lags ti->ti_tsc_timestamp so we don't
  produce a bogus delta.

- Fix the non-PVCLOCK_FLAG_TSC_STABLE case:

  + On amd64 we can do this with an atomic_cas_ulong(9) loop.  We need
to cast the pointer to (unsigned long *) or the compiler complains.
This is safe because sizeof(long) equals sizeof(uint64_t) on amd64.

  + On i386 we need a mutex, so add pvclock_mtx.  Set 

Re: [please test] pvclock(4): fix several bugs

2022-09-08 Thread Scott Cheloha
> On Sep 8, 2022, at 9:05 AM, Mike Larkin  wrote:
> 
> On Thu, Sep 08, 2022 at 08:32:27AM -0500, Scott Cheloha wrote:
>> On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote:
>>> On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote:
>>>> On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote:
>>>>> On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
>>>>>> On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
>>>>>>>> On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
>>>>>>>> 
>>>>>>>> ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
>>>>>>>>> dv@ suggested coming to the list to request testing for the pvclock(4)
>>>>>>>>> driver.  Attached is a patch that corrects several bugs.  Most of
>>>>>>>>> these changes will only matter in the non-TSC_STABLE case on a
>>>>>>>>> multiprocessor VM.
>>>>>>>>> 
>>>>>>>>> Ideally, nothing should break.
>>>>>>>>> 
>>>>>>>>> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
>>>>>>>>> use the lower 32 bits, but internally we need to track the full
>>>>>>>>> 64-bit value to allow comparisons with the full value in the
>>>>>>>>> non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
>>>>>>>>> loop to get a more accurate timestamp.
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), check that our TSC value doesn't predate
>>>>>>>>> ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), update pvclock_lastcount in the
>>>>>>>>> non-TSC_STABLE case with more care.  On amd64 we can do this with an
>>>>>>>>> atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
>>>>>>>>> to introduce a mutex to protect our comparison and read/write.
>>>>>>>> 
>>>>>>>> i386 has cmpxchg8b, no need to disable interrupts
>>>>>>>> the ifdefs seem excessive
>>>>>>> 
>>>>>>> How do I make use of CMPXCHG8B on i386
>>>>>>> in this context?
>>>>>>> 
>>>>>>> atomic_cas_ulong(9) is a 32-bit CAS on
>>>>>>> i386.
>>>>>> 
>>>>>> static inline uint64_t
>>>>>> atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
>>>>>> {
>>>>>>  return __sync_val_compare_and_swap(p, o, n);
>>>>>> }
>>>>>> 
>>>>>> Or md atomic.h files could have an equivalent.
>>>>>> Not possible on all 32-bit archs.
>>>>>> 
>>>>>>> 
>>>>>>> We can't use FP registers in the kernel, no?
>>>>>> 
>>>>>> What do FP registers have to do with it?
>>>>>> 
>>>>>>> 
>>>>>>> Am I missing some other avenue?
>>>>>> 
>>>>>> There is no rdtsc_lfence() on i386.  Initial diff doesn't build.
>>>>> 
>>>>> LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
>>>>> drop rdtsc_lfence() into cpufunc.h and proceed without causing some
>>>>> kind of fault on an older CPU.
>>>>> 
>>>>> What are my options on a 586-class CPU for forcing RDTSC to complete
>>>>> before later instructions?
>>>> 
>>>> "3.3.2. Serializing Operations
>>>> After executing certain instructions the Pentium processor serializes
>>>> instruction execution. This means that any modifications to flags,
>>>> registers, and memory for previous instructions are completed before
>>>> the next instruction is fetched and executed. The prefetch queue
>>>> is flushed as a result of serializing operations.
>>>> 
>>

Re: [please test] pvclock(4): fix several bugs

2022-09-08 Thread Scott Cheloha
> On Sep 8, 2022, at 9:27 AM, Mark Kettenis  wrote:
> 
>> Date: Thu, 8 Sep 2022 08:32:27 -0500
>> From: Scott Cheloha 
>> 
>> On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote:
>>> On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote:
>>>> On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote:
>>>>> On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
>>>>>> On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
>>>>>>>> On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
>>>>>>>> 
>>>>>>>> ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
>>>>>>>>> dv@ suggested coming to the list to request testing for the pvclock(4)
>>>>>>>>> driver.  Attached is a patch that corrects several bugs.  Most of
>>>>>>>>> these changes will only matter in the non-TSC_STABLE case on a
>>>>>>>>> multiprocessor VM.
>>>>>>>>> 
>>>>>>>>> Ideally, nothing should break.
>>>>>>>>> 
>>>>>>>>> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
>>>>>>>>> use the lower 32 bits, but internally we need to track the full
>>>>>>>>> 64-bit value to allow comparisons with the full value in the
>>>>>>>>> non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
>>>>>>>>> loop to get a more accurate timestamp.
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), check that our TSC value doesn't predate
>>>>>>>>> ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
>>>>>>>>> 
>>>>>>>>> - In pvclock_get_timecount(), update pvclock_lastcount in the
>>>>>>>>> non-TSC_STABLE case with more care.  On amd64 we can do this with an
>>>>>>>>> atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
>>>>>>>>> to introduce a mutex to protect our comparison and read/write.
>>>>>>>> 
>>>>>>>> i386 has cmpxchg8b, no need to disable interrupts
>>>>>>>> the ifdefs seem excessive
>>>>>>> 
>>>>>>> How do I make use of CMPXCHG8B on i386
>>>>>>> in this context?
>>>>>>> 
>>>>>>> atomic_cas_ulong(9) is a 32-bit CAS on
>>>>>>> i386.
>>>>>> 
>>>>>> static inline uint64_t
>>>>>> atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
>>>>>> {
>>>>>>  return __sync_val_compare_and_swap(p, o, n);
>>>>>> }
>>>>>> 
>>>>>> Or md atomic.h files could have an equivalent.
>>>>>> Not possible on all 32-bit archs.
>>>>>> 
>>>>>>> 
>>>>>>> We can't use FP registers in the kernel, no?
>>>>>> 
>>>>>> What do FP registers have to do with it?
>>>>>> 
>>>>>>> 
>>>>>>> Am I missing some other avenue?
>>>>>> 
>>>>>> There is no rdtsc_lfence() on i386.  Initial diff doesn't build.
>>>>> 
>>>>> LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
>>>>> drop rdtsc_lfence() into cpufunc.h and proceed without causing some
>>>>> kind of fault on an older CPU.
>>>>> 
>>>>> What are my options on a 586-class CPU for forcing RDTSC to complete
>>>>> before later instructions?
>>>> 
>>>> "3.3.2. Serializing Operations
>>>> After executing certain instructions the Pentium processor serializes
>>>> instruction execution. This means that any modifications to flags,
>>>> registers, and memory for previous instructions are completed before
>>>> the next instruction is fetched and executed. The prefetch queue
>>>> is flushed as a result of serializing operations.
>>&g

Re: [please test] pvclock(4): fix several bugs

2022-09-08 Thread Mark Kettenis
> Date: Thu, 8 Sep 2022 08:32:27 -0500
> From: Scott Cheloha 
> 
> On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote:
> > On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote:
> > > On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote:
> > > > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
> > > > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
> > > > > > > On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
> > > > > > >
> > > > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> > > > > > >> dv@ suggested coming to the list to request testing for the 
> > > > > > >> pvclock(4)
> > > > > > >> driver.  Attached is a patch that corrects several bugs.  Most of
> > > > > > >> these changes will only matter in the non-TSC_STABLE case on a
> > > > > > >> multiprocessor VM.
> > > > > > >>
> > > > > > >> Ideally, nothing should break.
> > > > > > >>
> > > > > > >> - pvclock yields a 64-bit value.  The BSD timecounter layer can 
> > > > > > >> only
> > > > > > >>  use the lower 32 bits, but internally we need to track the full
> > > > > > >>  64-bit value to allow comparisons with the full value in the
> > > > > > >>  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit 
> > > > > > >> quantity.
> > > > > > >>
> > > > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless 
> > > > > > >> read
> > > > > > >>  loop to get a more accurate timestamp.
> > > > > > >>
> > > > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
> > > > > > >>
> > > > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't 
> > > > > > >> predate
> > > > > > >>  ti->ti_tsc_timestamp, otherwise we will produce an enormous 
> > > > > > >> value.
> > > > > > >>
> > > > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the
> > > > > > >>  non-TSC_STABLE case with more care.  On amd64 we can do this 
> > > > > > >> with an
> > > > > > >>  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we 
> > > > > > >> need
> > > > > > >>  to introduce a mutex to protect our comparison and read/write.
> > > > > > >
> > > > > > > i386 has cmpxchg8b, no need to disable interrupts
> > > > > > > the ifdefs seem excessive
> > > > > >
> > > > > > How do I make use of CMPXCHG8B on i386
> > > > > > in this context?
> > > > > >
> > > > > > atomic_cas_ulong(9) is a 32-bit CAS on
> > > > > > i386.
> > > > >
> > > > > static inline uint64_t
> > > > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
> > > > > {
> > > > >   return __sync_val_compare_and_swap(p, o, n);
> > > > > }
> > > > >
> > > > > Or md atomic.h files could have an equivalent.
> > > > > Not possible on all 32-bit archs.
> > > > >
> > > > > >
> > > > > > We can't use FP registers in the kernel, no?
> > > > >
> > > > > What do FP registers have to do with it?
> > > > >
> > > > > >
> > > > > > Am I missing some other avenue?
> > > > >
> > > > > There is no rdtsc_lfence() on i386.  Initial diff doesn't build.
> > > >
> > > > LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
> > > > drop rdtsc_lfence() into cpufunc.h and proceed without causing some
> > > > kind of fault on an older CPU.
> > > >
> > > > What are my options on a 586-class CPU for forcing RDTSC to complete
> > > > before later instructions?
> > >
> > > "3.3.2. Serializing Operations
> > > After executing certain instructions the Pentium processor serializes
> > > instruction execution. This mean

Re: [please test] pvclock(4): fix several bugs

2022-09-08 Thread Jonathan Gray
On Thu, Sep 08, 2022 at 08:32:27AM -0500, Scott Cheloha wrote:
> On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote:
> > On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote:
> > > On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote:
> > > > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
> > > > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
> > > > > > > On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
> > > > > > >
> > > > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> > > > > > >> dv@ suggested coming to the list to request testing for the 
> > > > > > >> pvclock(4)
> > > > > > >> driver.  Attached is a patch that corrects several bugs.  Most of
> > > > > > >> these changes will only matter in the non-TSC_STABLE case on a
> > > > > > >> multiprocessor VM.
> > > > > > >>
> > > > > > >> Ideally, nothing should break.
> > > > > > >>
> > > > > > >> - pvclock yields a 64-bit value.  The BSD timecounter layer can 
> > > > > > >> only
> > > > > > >>  use the lower 32 bits, but internally we need to track the full
> > > > > > >>  64-bit value to allow comparisons with the full value in the
> > > > > > >>  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit 
> > > > > > >> quantity.
> > > > > > >>
> > > > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless 
> > > > > > >> read
> > > > > > >>  loop to get a more accurate timestamp.
> > > > > > >>
> > > > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
> > > > > > >>
> > > > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't 
> > > > > > >> predate
> > > > > > >>  ti->ti_tsc_timestamp, otherwise we will produce an enormous 
> > > > > > >> value.
> > > > > > >>
> > > > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the
> > > > > > >>  non-TSC_STABLE case with more care.  On amd64 we can do this 
> > > > > > >> with an
> > > > > > >>  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we 
> > > > > > >> need
> > > > > > >>  to introduce a mutex to protect our comparison and read/write.
> > > > > > >
> > > > > > > i386 has cmpxchg8b, no need to disable interrupts
> > > > > > > the ifdefs seem excessive
> > > > > >
> > > > > > How do I make use of CMPXCHG8B on i386
> > > > > > in this context?
> > > > > >
> > > > > > atomic_cas_ulong(9) is a 32-bit CAS on
> > > > > > i386.
> > > > >
> > > > > static inline uint64_t
> > > > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
> > > > > {
> > > > >   return __sync_val_compare_and_swap(p, o, n);
> > > > > }
> > > > >
> > > > > Or md atomic.h files could have an equivalent.
> > > > > Not possible on all 32-bit archs.
> > > > >
> > > > > >
> > > > > > We can't use FP registers in the kernel, no?
> > > > >
> > > > > What do FP registers have to do with it?
> > > > >
> > > > > >
> > > > > > Am I missing some other avenue?
> > > > >
> > > > > There is no rdtsc_lfence() on i386.  Initial diff doesn't build.
> > > >
> > > > LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
> > > > drop rdtsc_lfence() into cpufunc.h and proceed without causing some
> > > > kind of fault on an older CPU.
> > > >
> > > > What are my options on a 586-class CPU for forcing RDTSC to complete
> > > > before later instructions?
> > >
> > > "3.3.2. Serializing Operations
> > > After executing certain instructions the Pentium processor serializes
> > > instruction execution. This means that an

Re: [please test] pvclock(4): fix several bugs

2022-09-08 Thread Mike Larkin
On Thu, Sep 08, 2022 at 08:32:27AM -0500, Scott Cheloha wrote:
> On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote:
> > On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote:
> > > On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote:
> > > > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
> > > > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
> > > > > > > On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
> > > > > > >
> > > > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> > > > > > >> dv@ suggested coming to the list to request testing for the 
> > > > > > >> pvclock(4)
> > > > > > >> driver.  Attached is a patch that corrects several bugs.  Most of
> > > > > > >> these changes will only matter in the non-TSC_STABLE case on a
> > > > > > >> multiprocessor VM.
> > > > > > >>
> > > > > > >> Ideally, nothing should break.
> > > > > > >>
> > > > > > >> - pvclock yields a 64-bit value.  The BSD timecounter layer can 
> > > > > > >> only
> > > > > > >>  use the lower 32 bits, but internally we need to track the full
> > > > > > >>  64-bit value to allow comparisons with the full value in the
> > > > > > >>  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit 
> > > > > > >> quantity.
> > > > > > >>
> > > > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless 
> > > > > > >> read
> > > > > > >>  loop to get a more accurate timestamp.
> > > > > > >>
> > > > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
> > > > > > >>
> > > > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't 
> > > > > > >> predate
> > > > > > >>  ti->ti_tsc_timestamp, otherwise we will produce an enormous 
> > > > > > >> value.
> > > > > > >>
> > > > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the
> > > > > > >>  non-TSC_STABLE case with more care.  On amd64 we can do this 
> > > > > > >> with an
> > > > > > >>  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we 
> > > > > > >> need
> > > > > > >>  to introduce a mutex to protect our comparison and read/write.
> > > > > > >
> > > > > > > i386 has cmpxchg8b, no need to disable interrupts
> > > > > > > the ifdefs seem excessive
> > > > > >
> > > > > > How do I make use of CMPXCHG8B on i386
> > > > > > in this context?
> > > > > >
> > > > > > atomic_cas_ulong(9) is a 32-bit CAS on
> > > > > > i386.
> > > > >
> > > > > static inline uint64_t
> > > > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
> > > > > {
> > > > >   return __sync_val_compare_and_swap(p, o, n);
> > > > > }
> > > > >
> > > > > Or md atomic.h files could have an equivalent.
> > > > > Not possible on all 32-bit archs.
> > > > >
> > > > > >
> > > > > > We can't use FP registers in the kernel, no?
> > > > >
> > > > > What do FP registers have to do with it?
> > > > >
> > > > > >
> > > > > > Am I missing some other avenue?
> > > > >
> > > > > There is no rdtsc_lfence() on i386.  Initial diff doesn't build.
> > > >
> > > > LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
> > > > drop rdtsc_lfence() into cpufunc.h and proceed without causing some
> > > > kind of fault on an older CPU.
> > > >
> > > > What are my options on a 586-class CPU for forcing RDTSC to complete
> > > > before later instructions?
> > >
> > > "3.3.2. Serializing Operations
> > > After executing certain instructions the Pentium processor serializes
> > > instruction execution. This means t

Re: [please test] pvclock(4): fix several bugs

2022-09-08 Thread Scott Cheloha
On Tue, Sep 06, 2022 at 03:30:44AM -0700, Mike Larkin wrote:
> On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote:
> > On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote:
> > > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
> > > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
> > > > > > On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
> > > > > >
> > > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> > > > > >> dv@ suggested coming to the list to request testing for the 
> > > > > >> pvclock(4)
> > > > > >> driver.  Attached is a patch that corrects several bugs.  Most of
> > > > > >> these changes will only matter in the non-TSC_STABLE case on a
> > > > > >> multiprocessor VM.
> > > > > >>
> > > > > >> Ideally, nothing should break.
> > > > > >>
> > > > > >> - pvclock yields a 64-bit value.  The BSD timecounter layer can 
> > > > > >> only
> > > > > >>  use the lower 32 bits, but internally we need to track the full
> > > > > >>  64-bit value to allow comparisons with the full value in the
> > > > > >>  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
> > > > > >>
> > > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless 
> > > > > >> read
> > > > > >>  loop to get a more accurate timestamp.
> > > > > >>
> > > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
> > > > > >>
> > > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't 
> > > > > >> predate
> > > > > >>  ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
> > > > > >>
> > > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the
> > > > > >>  non-TSC_STABLE case with more care.  On amd64 we can do this with 
> > > > > >> an
> > > > > >>  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we 
> > > > > >> need
> > > > > >>  to introduce a mutex to protect our comparison and read/write.
> > > > > >
> > > > > > i386 has cmpxchg8b, no need to disable interrupts
> > > > > > the ifdefs seem excessive
> > > > >
> > > > > How do I make use of CMPXCHG8B on i386
> > > > > in this context?
> > > > >
> > > > > atomic_cas_ulong(9) is a 32-bit CAS on
> > > > > i386.
> > > >
> > > > static inline uint64_t
> > > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
> > > > {
> > > > return __sync_val_compare_and_swap(p, o, n);
> > > > }
> > > >
> > > > Or md atomic.h files could have an equivalent.
> > > > Not possible on all 32-bit archs.
> > > >
> > > > >
> > > > > We can't use FP registers in the kernel, no?
> > > >
> > > > What do FP registers have to do with it?
> > > >
> > > > >
> > > > > Am I missing some other avenue?
> > > >
> > > > There is no rdtsc_lfence() on i386.  Initial diff doesn't build.
> > >
> > > LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
> > > drop rdtsc_lfence() into cpufunc.h and proceed without causing some
> > > kind of fault on an older CPU.
> > >
> > > What are my options on a 586-class CPU for forcing RDTSC to complete
> > > before later instructions?
> >
> > "3.3.2. Serializing Operations
> > After executing certain instructions the Pentium processor serializes
> > instruction execution. This means that any modifications to flags,
> > registers, and memory for previous instructions are completed before
> > the next instruction is fetched and executed. The prefetch queue
> > is flushed as a result of serializing operations.
> >
> > The Pentium processor serializes instruction execution after executing
> > one of the following instructions: Move to Special Register (except
> > CRO), INVD, INVLPG, IRET, IRETD, LGDT, LLDT, LIDT, LTR, WBINVD,
> > CPUID, RSM and WRMSR."
>

Re: [please test] pvclock(4): fix several bugs

2022-09-06 Thread Mike Larkin
On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote:
> On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote:
> > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
> > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
> > > > > On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
> > > > >
> > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> > > > >> dv@ suggested coming to the list to request testing for the 
> > > > >> pvclock(4)
> > > > >> driver.  Attached is a patch that corrects several bugs.  Most of
> > > > >> these changes will only matter in the non-TSC_STABLE case on a
> > > > >> multiprocessor VM.
> > > > >>
> > > > >> Ideally, nothing should break.
> > > > >>
> > > > >> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
> > > > >>  use the lower 32 bits, but internally we need to track the full
> > > > >>  64-bit value to allow comparisons with the full value in the
> > > > >>  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
> > > > >>
> > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
> > > > >>  loop to get a more accurate timestamp.
> > > > >>
> > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
> > > > >>
> > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't 
> > > > >> predate
> > > > >>  ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
> > > > >>
> > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the
> > > > >>  non-TSC_STABLE case with more care.  On amd64 we can do this with an
> > > > >>  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
> > > > >>  to introduce a mutex to protect our comparison and read/write.
> > > > >
> > > > > i386 has cmpxchg8b, no need to disable interrupts
> > > > > the ifdefs seem excessive
> > > >
> > > > How do I make use of CMPXCHG8B on i386
> > > > in this context?
> > > >
> > > > atomic_cas_ulong(9) is a 32-bit CAS on
> > > > i386.
> > >
> > > static inline uint64_t
> > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
> > > {
> > >   return __sync_val_compare_and_swap(p, o, n);
> > > }
> > >
> > > Or md atomic.h files could have an equivalent.
> > > Not possible on all 32-bit archs.
> > >
> > > >
> > > > We can't use FP registers in the kernel, no?
> > >
> > > What do FP registers have to do with it?
> > >
> > > >
> > > > Am I missing some other avenue?
> > >
> > > There is no rdtsc_lfence() on i386.  Initial diff doesn't build.
> >
> > LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
> > drop rdtsc_lfence() into cpufunc.h and proceed without causing some
> > kind of fault on an older CPU.
> >
> > What are my options on a 586-class CPU for forcing RDTSC to complete
> > before later instructions?
>
> "3.3.2. Serializing Operations
> After executing certain instructions the Pentium processor serializes
> instruction execution. This means that any modifications to flags,
> registers, and memory for previous instructions are completed before
> the next instruction is fetched and executed. The prefetch queue
> is flushed as a result of serializing operations.
>
> The Pentium processor serializes instruction execution after executing
> one of the following instructions: Move to Special Register (except
> CRO), INVD, INVLPG, IRET, IRETD, LGDT, LLDT, LIDT, LTR, WBINVD,
> CPUID, RSM and WRMSR."
>
> from:
> Pentium Processor User's Manual
> Volume 1: Pentium Processor Data Book
> Order Number 241428
>
> http://bitsavers.org/components/intel/pentium/1993_Intel_Pentium_Processor_Users_Manual_Volume_1.pdf
>
> So it could be rdtsc ; cpuid.
> lfence; rdtsc should still be preferred.
>
> It could be tested during boot and set a function pointer.
> Or the codepatch bits could be used.
>
> In the specific case of pvclock, can it be assumed that the host
> has hardware virt and would then have lfence?
>

I think this is a fair assumption.

-ml



Re: [please test] pvclock(4): fix several bugs

2022-09-03 Thread Jonathan Gray
On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote:
> On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
> > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
> > > > On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
> > > > 
> > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> > > >> dv@ suggested coming to the list to request testing for the pvclock(4)
> > > >> driver.  Attached is a patch that corrects several bugs.  Most of
> > > >> these changes will only matter in the non-TSC_STABLE case on a
> > > >> multiprocessor VM.
> > > >> 
> > > >> Ideally, nothing should break.
> > > >> 
> > > >> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
> > > >>  use the lower 32 bits, but internally we need to track the full
> > > >>  64-bit value to allow comparisons with the full value in the
> > > >>  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
> > > >> 
> > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
> > > >>  loop to get a more accurate timestamp.
> > > >> 
> > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
> > > >> 
> > > >> - In pvclock_get_timecount(), check that our TSC value doesn't predate
> > > >>  ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
> > > >> 
> > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the
> > > >>  non-TSC_STABLE case with more care.  On amd64 we can do this with an
> > > >>  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
> > > >>  to introduce a mutex to protect our comparison and read/write.
> > > > 
> > > > i386 has cmpxchg8b, no need to disable interrupts
> > > > the ifdefs seem excessive
> > > 
> > > How do I make use of CMPXCHG8B on i386
> > > in this context?
> > > 
> > > atomic_cas_ulong(9) is a 32-bit CAS on
> > > i386.
> > 
> > static inline uint64_t
> > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
> > {
> > return __sync_val_compare_and_swap(p, o, n);
> > }
> > 
> > Or md atomic.h files could have an equivalent.
> > Not possible on all 32-bit archs.
> > 
> > > 
> > > We can't use FP registers in the kernel, no?
> > 
> > What do FP registers have to do with it?
> > 
> > > 
> > > Am I missing some other avenue?
> > 
> > There is no rdtsc_lfence() on i386.  Initial diff doesn't build.
> 
> LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
> drop rdtsc_lfence() into cpufunc.h and proceed without causing some
> kind of fault on an older CPU.
> 
> What are my options on a 586-class CPU for forcing RDTSC to complete
> before later instructions?

"3.3.2. Serializing Operations
After executing certain instructions the Pentium processor serializes
instruction execution. This means that any modifications to flags,
registers, and memory for previous instructions are completed before
the next instruction is fetched and executed. The prefetch queue
is flushed as a result of serializing operations.

The Pentium processor serializes instruction execution after executing
one of the following instructions: Move to Special Register (except
CRO), INVD, INVLPG, IRET, IRETD, LGDT, LLDT, LIDT, LTR, WBINVD,
CPUID, RSM and WRMSR."

from:
Pentium Processor User's Manual
Volume 1: Pentium Processor Data Book
Order Number 241428

http://bitsavers.org/components/intel/pentium/1993_Intel_Pentium_Processor_Users_Manual_Volume_1.pdf

So it could be rdtsc ; cpuid.
lfence; rdtsc should still be preferred.

It could be tested during boot and set a function pointer.
Or the codepatch bits could be used.

In the specific case of pvclock, can it be assumed that the host
has hardware virt and would then have lfence?



Re: [please test] pvclock(4): fix several bugs

2022-09-03 Thread Scott Cheloha
On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote:
> On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
> > > On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
> > > 
> > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> > >> dv@ suggested coming to the list to request testing for the pvclock(4)
> > >> driver.  Attached is a patch that corrects several bugs.  Most of
> > >> these changes will only matter in the non-TSC_STABLE case on a
> > >> multiprocessor VM.
> > >> 
> > >> Ideally, nothing should break.
> > >> 
> > >> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
> > >>  use the lower 32 bits, but internally we need to track the full
> > >>  64-bit value to allow comparisons with the full value in the
> > >>  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
> > >> 
> > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
> > >>  loop to get a more accurate timestamp.
> > >> 
> > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
> > >> 
> > >> - In pvclock_get_timecount(), check that our TSC value doesn't predate
> > >>  ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
> > >> 
> > >> - In pvclock_get_timecount(), update pvclock_lastcount in the
> > >>  non-TSC_STABLE case with more care.  On amd64 we can do this with an
> > >>  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
> > >>  to introduce a mutex to protect our comparison and read/write.
> > > 
> > > i386 has cmpxchg8b, no need to disable interrupts
> > > the ifdefs seem excessive
> > 
> > How do I make use of CMPXCHG8B on i386
> > in this context?
> > 
> > atomic_cas_ulong(9) is a 32-bit CAS on
> > i386.
> 
> static inline uint64_t
> atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
> {
>   return __sync_val_compare_and_swap(p, o, n);
> }
> 
> Or md atomic.h files could have an equivalent.
> Not possible on all 32-bit archs.
> 
> > 
> > We can't use FP registers in the kernel, no?
> 
> What do FP registers have to do with it?
> 
> > 
> > Am I missing some other avenue?
> 
> There is no rdtsc_lfence() on i386.  Initial diff doesn't build.

LFENCE is an SSE2 extension.  As is MFENCE.  I don't think I can just
drop rdtsc_lfence() into cpufunc.h and proceed without causing some
kind of fault on an older CPU.

What are my options on a 586-class CPU for forcing RDTSC to complete
before later instructions?



Re: [please test] pvclock(4): fix several bugs

2022-09-03 Thread Scott Cheloha
> On Sep 3, 2022, at 07:37, Jonathan Gray  wrote:
> 
> On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
>>>> On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
>>> 
>>> On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
>>>> dv@ suggested coming to the list to request testing for the pvclock(4)
>>>> driver.  Attached is a patch that corrects several bugs.  Most of
>>>> these changes will only matter in the non-TSC_STABLE case on a
>>>> multiprocessor VM.
>>>> 
>>>> Ideally, nothing should break.
>>>> 
>>>> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
>>>> use the lower 32 bits, but internally we need to track the full
>>>> 64-bit value to allow comparisons with the full value in the
>>>> non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
>>>> 
>>>> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
>>>> loop to get a more accurate timestamp.
>>>> 
>>>> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
>>>> 
>>>> - In pvclock_get_timecount(), check that our TSC value doesn't predate
>>>> ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
>>>> 
>>>> - In pvclock_get_timecount(), update pvclock_lastcount in the
>>>> non-TSC_STABLE case with more care.  On amd64 we can do this with an
>>>> atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
>>>> to introduce a mutex to protect our comparison and read/write.
>>> 
>>> i386 has cmpxchg8b, no need to disable interrupts
>>> the ifdefs seem excessive
>> 
>> How do I make use of CMPXCHG8B on i386
>> in this context?
>> 
>> atomic_cas_ulong(9) is a 32-bit CAS on
>> i386.
> 
> static inline uint64_t
> atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
> {
>return __sync_val_compare_and_swap(p, o, n);
> }
> 
> Or md atomic.h files could have an equivalent.
> Not possible on all 32-bit archs.

Gotcha.

>> We can't use FP registers in the kernel, no?
> 
> What do FP registers have to do with it?

I read someplace that using FP registers was
a quick and dirty way to take advantage of
the 64-bit-aligned atomic access guarantees
of the Pentium.

>> Am I missing some other avenue?
> 
> There is no rdtsc_lfence() on i386.  Initial diff doesn't build.

I will come back with a fuller patch in a bit.



Re: [please test] pvclock(4): fix several bugs

2022-09-03 Thread Jonathan Gray
On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote:
> > On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
> > 
> > On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> >> dv@ suggested coming to the list to request testing for the pvclock(4)
> >> driver.  Attached is a patch that corrects several bugs.  Most of
> >> these changes will only matter in the non-TSC_STABLE case on a
> >> multiprocessor VM.
> >> 
> >> Ideally, nothing should break.
> >> 
> >> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
> >>  use the lower 32 bits, but internally we need to track the full
> >>  64-bit value to allow comparisons with the full value in the
> >>  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
> >> 
> >> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
> >>  loop to get a more accurate timestamp.
> >> 
> >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
> >> 
> >> - In pvclock_get_timecount(), check that our TSC value doesn't predate
> >>  ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
> >> 
> >> - In pvclock_get_timecount(), update pvclock_lastcount in the
> >>  non-TSC_STABLE case with more care.  On amd64 we can do this with an
> >>  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
> >>  to introduce a mutex to protect our comparison and read/write.
> > 
> > i386 has cmpxchg8b, no need to disable interrupts
> > the ifdefs seem excessive
> 
> How do I make use of CMPXCHG8B on i386
> in this context?
> 
> atomic_cas_ulong(9) is a 32-bit CAS on
> i386.

static inline uint64_t
atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n)
{
return __sync_val_compare_and_swap(p, o, n);
}

Or md atomic.h files could have an equivalent.
Not possible on all 32-bit archs.

> 
> We can't use FP registers in the kernel, no?

What do FP registers have to do with it?

> 
> Am I missing some other avenue?

There is no rdtsc_lfence() on i386.  Initial diff doesn't build.



Re: [please test] pvclock(4): fix several bugs

2022-09-03 Thread Scott Cheloha
> On Sep 3, 2022, at 02:22, Jonathan Gray  wrote:
> 
> On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
>> dv@ suggested coming to the list to request testing for the pvclock(4)
>> driver.  Attached is a patch that corrects several bugs.  Most of
>> these changes will only matter in the non-TSC_STABLE case on a
>> multiprocessor VM.
>> 
>> Ideally, nothing should break.
>> 
>> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
>>  use the lower 32 bits, but internally we need to track the full
>>  64-bit value to allow comparisons with the full value in the
>>  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
>> 
>> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
>>  loop to get a more accurate timestamp.
>> 
>> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
>> 
>> - In pvclock_get_timecount(), check that our TSC value doesn't predate
>>  ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
>> 
>> - In pvclock_get_timecount(), update pvclock_lastcount in the
>>  non-TSC_STABLE case with more care.  On amd64 we can do this with an
>>  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
>>  to introduce a mutex to protect our comparison and read/write.
> 
> i386 has cmpxchg8b, no need to disable interrupts
> the ifdefs seem excessive

How do I make use of CMPXCHG8B on i386
in this context?

atomic_cas_ulong(9) is a 32-bit CAS on
i386.

We can't use FP registers in the kernel, no?

Am I missing some other avenue?



Re: [please test] pvclock(4): fix several bugs

2022-09-03 Thread Matthieu Herrb
On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> dv@ suggested coming to the list to request testing for the pvclock(4)
> driver.  Attached is a patch that corrects several bugs.  Most of
> these changes will only matter in the non-TSC_STABLE case on a
> multiprocessor VM.
> 
> Ideally, nothing should break.
>

I've put this on an OpenBSD VM hosted on a proxmox VE 7 hypervisor. It
uses pvclock as the time source both before and after the patch :

kern.timecounter.hardware=pvclock0
kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000)
acpitimer0(1000)

OpenBSD 7.2-beta (GENERIC.MP) #300: Sat Sep  3 09:34:42 CEST 2022
matth...@zuma.herrb.net:/usr/obj/GENERIC.MP
real mem = 4278005760 (4079MB)
avail mem = 4130959360 (3939MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf59a0 (10 entries)
bios0: vendor SeaBIOS version "rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org" 
date 04/01/2014
bios0: QEMU Standard PC (Q35 + ICH9, 2009)
acpi0 at bios0: ACPI 3.0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC SSDT HPET MCFG WAET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Common KVM processor, 2000.60 MHz, 0f-06-01
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,CX16,x2APIC,HV,NXE,LONG,LAHF,MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 4MB 64b/line 
16-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1000MHz
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins
acpihpet0 at acpi0: 1 Hz
acpimcfg0 at acpi0
acpimcfg0: addr 0xb000, bus 0-255
acpiprt0 at acpi0: bus 0 (PCI0)
"ACPI0006" at acpi0 not configured
acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x
acpicmos0 at acpi0
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"ACPI0010" at acpi0 not configured
"QEMUVGID" at acpi0 not configured
acpicpu0 at acpi0: C1(@1 halt!)
pvbus0 at mainbus0: KVM
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82G33 Host" rev 0x00
vga1 at pci0 dev 1 function 0 "VMware SVGA II" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
uhci0 at pci0 dev 26 function 0 "Intel 82801I USB" rev 0x03: apic 0 int 16
uhci1 at pci0 dev 26 function 1 "Intel 82801I USB" rev 0x03: apic 0 int 17
uhci2 at pci0 dev 26 function 2 "Intel 82801I USB" rev 0x03: apic 0 int 18
ehci0 at pci0 dev 26 function 7 "Intel 82801I USB" rev 0x03: apic 0 int 19
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 
addr 1
azalia0 at pci0 dev 27 function 0 "Intel 82801I HD Audio" rev 0x03: msi
azalia0: no HD-Audio codecs
ppb0 at pci0 dev 28 function 0 vendor "Red Hat", unknown product 0x000c rev 
0x00: apic 0 int 16
pci1 at ppb0 bus 1
ppb1 at pci0 dev 28 function 1 vendor "Red Hat", unknown product 0x000c rev 
0x00: apic 0 int 16
pci2 at ppb1 bus 2
ppb2 at pci0 dev 28 function 2 vendor "Red Hat", unknown product 0x000c rev 
0x00: apic 0 int 16
pci3 at ppb2 bus 3
ppb3 at pci0 dev 28 function 3 vendor "Red Hat", unknown product 0x000c rev 
0x00: apic 0 int 16
pci4 at ppb3 bus 4
uhci3 at pci0 dev 29 function 0 "Intel 82801I USB" rev 0x03: apic 0 int 16
uhci4 at pci0 dev 29 function 1 "Intel 82801I USB" rev 0x03: apic 0 int 17
uhci5 at pci0 dev 29 function 2 "Intel 82801I USB" rev 0x03: apic 0 int 18
ehci1 at pci0 dev 29 function 7 "Intel 82801I USB" rev 0x03: apic 0 int 19
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 
addr 1
ppb4 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0x92
pci5 at ppb4 bus 5
ppb5 at pci5 dev 1 function 0 "Red Hat Qemu PCI-PCI" rev 0x00
pci6 at ppb5 bus 6
virtio0 at pci6 dev 3 function 0 "Qumranet Virtio Memory Balloon" rev 0x00
viomb0 at virtio0
virtio0: apic 0 int 20
virtio1 at pci6 dev 10 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio1
scsibus1 at vioblk0: 1 targets
sd0 at scsibus1 targ 0 lun 0: 
sd0: 40960MB, 512 bytes/sector, 83886080 sectors
virtio1: msix shared
virtio2 at pci6 dev 18 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio2: address 82:da:9e:53:3e:6b
virtio2: msix shared
ppb6 at pci5 dev 2 function 0 "Red Hat Qemu

Re: [please test] pvclock(4): fix several bugs

2022-09-03 Thread Jonathan Gray
On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> dv@ suggested coming to the list to request testing for the pvclock(4)
> driver.  Attached is a patch that corrects several bugs.  Most of
> these changes will only matter in the non-TSC_STABLE case on a
> multiprocessor VM.
> 
> Ideally, nothing should break.
> 
> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
>   use the lower 32 bits, but internally we need to track the full
>   64-bit value to allow comparisons with the full value in the
>   non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
> 
> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
>   loop to get a more accurate timestamp.
> 
> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
> 
> - In pvclock_get_timecount(), check that our TSC value doesn't predate
>   ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
> 
> - In pvclock_get_timecount(), update pvclock_lastcount in the
>   non-TSC_STABLE case with more care.  On amd64 we can do this with an
>   atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
>   to introduce a mutex to protect our comparison and read/write.

i386 has cmpxchg8b, no need to disable interrupts
the ifdefs seem excessive

> 
> Index: pvclock.c
> ===
> RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
> retrieving revision 1.8
> diff -u -p -r1.8 pvclock.c
> --- pvclock.c 5 Nov 2021 11:38:29 -   1.8
> +++ pvclock.c 2 Sep 2022 22:54:08 -
> @@ -27,6 +27,10 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#if defined(__i386__)
> +#include 
> +#endif
>  
>  #include 
>  #include 
> @@ -35,7 +39,12 @@
>  #include 
>  #include 
>  
> -uint pvclock_lastcount;
> +#if defined(__amd64__)
> +volatile u_long pvclock_lastcount;
> +#elif defined(__i386__)
> +struct mutex pvclock_mtx = MUTEX_INITIALIZER(IPL_HIGH);
> +uint64_t pvclock_lastcount;
> +#endif
>  
>  struct pvclock_softc {
>   struct devicesc_dev;
> @@ -212,7 +221,7 @@ pvclock_get_timecount(struct timecounter
>  {
>   struct pvclock_softc*sc = tc->tc_priv;
>   struct pvclock_time_info*ti;
> - uint64_t tsc_timestamp, system_time, delta, ctr;
> + uint64_t system_time, delta, ctr, tsc;
>   uint32_t version, mul_frac;
>   int8_t   shift;
>   uint8_t  flags;
> @@ -220,8 +229,12 @@ pvclock_get_timecount(struct timecounter
>   ti = sc->sc_time;
>   do {
>   version = pvclock_read_begin(ti);
> + tsc = rdtsc_lfence();
> + if (ti->ti_tsc_timestamp < tsc)
> + delta = tsc - ti->ti_tsc_timestamp;
> + else
> + delta = 0;
>   system_time = ti->ti_system_time;
> - tsc_timestamp = ti->ti_tsc_timestamp;
>   mul_frac = ti->ti_tsc_to_system_mul;
>   shift = ti->ti_tsc_shift;
>   flags = ti->ti_flags;
> @@ -231,7 +244,6 @@ pvclock_get_timecount(struct timecounter
>* The algorithm is described in
>* linux/Documentation/virtual/kvm/msr.txt
>*/
> - delta = rdtsc() - tsc_timestamp;
>   if (shift < 0)
>   delta >>= -shift;
>   else
> @@ -241,10 +253,20 @@ pvclock_get_timecount(struct timecounter
>   if ((flags & PVCLOCK_FLAG_TSC_STABLE) != 0)
>   return (ctr);
>  
> - if (ctr < pvclock_lastcount)
> - return (pvclock_lastcount);
> -
> - atomic_swap_uint(_lastcount, ctr);
> -
> +#if defined(__amd64__)
> + u_long last;
> + do {
> + last = pvclock_lastcount;
> + if (ctr < last)
> + return last;
> + } while (atomic_cas_ulong(_lastcount, last, ctr) != last);
> +#elif defined(__i386__)
> + mtx_enter(_mtx);
> + if (pvclock_lastcount < ctr)
> + pvclock_lastcount = ctr;
> + else
> + ctr = pvclock_lastcount;
> + mtx_leave(_mtx);
> +#endif
>   return (ctr);
>  }
> 
> 



Re: [please test] pvclock(4): fix several bugs

2022-09-03 Thread Mike Larkin
On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote:
> dv@ suggested coming to the list to request testing for the pvclock(4)
> driver.  Attached is a patch that corrects several bugs.  Most of
> these changes will only matter in the non-TSC_STABLE case on a
> multiprocessor VM.
>
> Ideally, nothing should break.
>
> - pvclock yields a 64-bit value.  The BSD timecounter layer can only
>   use the lower 32 bits, but internally we need to track the full
>   64-bit value to allow comparisons with the full value in the
>   non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.
>
> - In pvclock_get_timecount(), move rdtsc() up into the lockless read
>   loop to get a more accurate timestamp.
>
> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().
>
> - In pvclock_get_timecount(), check that our TSC value doesn't predate
>   ti->ti_tsc_timestamp, otherwise we will produce an enormous value.
>
> - In pvclock_get_timecount(), update pvclock_lastcount in the
>   non-TSC_STABLE case with more care.  On amd64 we can do this with an
>   atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
>   to introduce a mutex to protect our comparison and read/write.
>

I tested on an 8 core ESXi VM, nothing broke. But it doesn't even
have pvclock as a timesource, so I'm not sure the test is meaningful or
useful.

-ml

> Index: pvclock.c
> ===
> RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
> retrieving revision 1.8
> diff -u -p -r1.8 pvclock.c
> --- pvclock.c 5 Nov 2021 11:38:29 -   1.8
> +++ pvclock.c 2 Sep 2022 22:54:08 -
> @@ -27,6 +27,10 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#if defined(__i386__)
> +#include 
> +#endif
>
>  #include 
>  #include 
> @@ -35,7 +39,12 @@
>  #include 
>  #include 
>
> -uint pvclock_lastcount;
> +#if defined(__amd64__)
> +volatile u_long pvclock_lastcount;
> +#elif defined(__i386__)
> +struct mutex pvclock_mtx = MUTEX_INITIALIZER(IPL_HIGH);
> +uint64_t pvclock_lastcount;
> +#endif
>
>  struct pvclock_softc {
>   struct devicesc_dev;
> @@ -212,7 +221,7 @@ pvclock_get_timecount(struct timecounter
>  {
>   struct pvclock_softc*sc = tc->tc_priv;
>   struct pvclock_time_info*ti;
> - uint64_t tsc_timestamp, system_time, delta, ctr;
> + uint64_t system_time, delta, ctr, tsc;
>   uint32_t version, mul_frac;
>   int8_t   shift;
>   uint8_t  flags;
> @@ -220,8 +229,12 @@ pvclock_get_timecount(struct timecounter
>   ti = sc->sc_time;
>   do {
>   version = pvclock_read_begin(ti);
> + tsc = rdtsc_lfence();
> + if (ti->ti_tsc_timestamp < tsc)
> + delta = tsc - ti->ti_tsc_timestamp;
> + else
> + delta = 0;
>   system_time = ti->ti_system_time;
> - tsc_timestamp = ti->ti_tsc_timestamp;
>   mul_frac = ti->ti_tsc_to_system_mul;
>   shift = ti->ti_tsc_shift;
>   flags = ti->ti_flags;
> @@ -231,7 +244,6 @@ pvclock_get_timecount(struct timecounter
>* The algorithm is described in
>* linux/Documentation/virtual/kvm/msr.txt
>*/
> - delta = rdtsc() - tsc_timestamp;
>   if (shift < 0)
>   delta >>= -shift;
>   else
> @@ -241,10 +253,20 @@ pvclock_get_timecount(struct timecounter
>   if ((flags & PVCLOCK_FLAG_TSC_STABLE) != 0)
>   return (ctr);
>
> - if (ctr < pvclock_lastcount)
> - return (pvclock_lastcount);
> -
> - atomic_swap_uint(_lastcount, ctr);
> -
> +#if defined(__amd64__)
> + u_long last;
> + do {
> + last = pvclock_lastcount;
> + if (ctr < last)
> + return last;
> + } while (atomic_cas_ulong(_lastcount, last, ctr) != last);
> +#elif defined(__i386__)
> + mtx_enter(_mtx);
> + if (pvclock_lastcount < ctr)
> + pvclock_lastcount = ctr;
> + else
> + ctr = pvclock_lastcount;
> + mtx_leave(_mtx);
> +#endif
>   return (ctr);
>  }



[please test] pvclock(4): fix several bugs

2022-09-02 Thread Scott Cheloha
dv@ suggested coming to the list to request testing for the pvclock(4)
driver.  Attached is a patch that corrects several bugs.  Most of
these changes will only matter in the non-TSC_STABLE case on a
multiprocessor VM.

Ideally, nothing should break.

- pvclock yields a 64-bit value.  The BSD timecounter layer can only
  use the lower 32 bits, but internally we need to track the full
  64-bit value to allow comparisons with the full value in the
  non-TSC_STABLE case.  So make pvclock_lastcount a 64-bit quantity.

- In pvclock_get_timecount(), move rdtsc() up into the lockless read
  loop to get a more accurate timestamp.

- In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc().

- In pvclock_get_timecount(), check that our TSC value doesn't predate
  ti->ti_tsc_timestamp, otherwise we will produce an enormous value.

- In pvclock_get_timecount(), update pvclock_lastcount in the
  non-TSC_STABLE case with more care.  On amd64 we can do this with an
  atomic_cas_ulong(9) loop because u_long is 64 bits.  On i386 we need
  to introduce a mutex to protect our comparison and read/write.

Index: pvclock.c
===
RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
retrieving revision 1.8
diff -u -p -r1.8 pvclock.c
--- pvclock.c   5 Nov 2021 11:38:29 -   1.8
+++ pvclock.c   2 Sep 2022 22:54:08 -
@@ -27,6 +27,10 @@
 #include 
 #include 
 #include 
+#include 
+#if defined(__i386__)
+#include 
+#endif
 
 #include 
 #include 
@@ -35,7 +39,12 @@
 #include 
 #include 
 
-uint pvclock_lastcount;
+#if defined(__amd64__)
+volatile u_long pvclock_lastcount;
+#elif defined(__i386__)
+struct mutex pvclock_mtx = MUTEX_INITIALIZER(IPL_HIGH);
+uint64_t pvclock_lastcount;
+#endif
 
 struct pvclock_softc {
struct devicesc_dev;
@@ -212,7 +221,7 @@ pvclock_get_timecount(struct timecounter
 {
struct pvclock_softc*sc = tc->tc_priv;
struct pvclock_time_info*ti;
-   uint64_t tsc_timestamp, system_time, delta, ctr;
+   uint64_t system_time, delta, ctr, tsc;
uint32_t version, mul_frac;
int8_t   shift;
uint8_t  flags;
@@ -220,8 +229,12 @@ pvclock_get_timecount(struct timecounter
ti = sc->sc_time;
do {
version = pvclock_read_begin(ti);
+   tsc = rdtsc_lfence();
+   if (ti->ti_tsc_timestamp < tsc)
+   delta = tsc - ti->ti_tsc_timestamp;
+   else
+   delta = 0;
system_time = ti->ti_system_time;
-   tsc_timestamp = ti->ti_tsc_timestamp;
mul_frac = ti->ti_tsc_to_system_mul;
shift = ti->ti_tsc_shift;
flags = ti->ti_flags;
@@ -231,7 +244,6 @@ pvclock_get_timecount(struct timecounter
 * The algorithm is described in
 * linux/Documentation/virtual/kvm/msr.txt
 */
-   delta = rdtsc() - tsc_timestamp;
if (shift < 0)
delta >>= -shift;
else
@@ -241,10 +253,20 @@ pvclock_get_timecount(struct timecounter
if ((flags & PVCLOCK_FLAG_TSC_STABLE) != 0)
return (ctr);
 
-   if (ctr < pvclock_lastcount)
-   return (pvclock_lastcount);
-
-   atomic_swap_uint(_lastcount, ctr);
-
+#if defined(__amd64__)
+   u_long last;
+   do {
+   last = pvclock_lastcount;
+   if (ctr < last)
+   return last;
+   } while (atomic_cas_ulong(_lastcount, last, ctr) != last);
+#elif defined(__i386__)
+   mtx_enter(_mtx);
+   if (pvclock_lastcount < ctr)
+   pvclock_lastcount = ctr;
+   else
+   ctr = pvclock_lastcount;
+   mtx_leave(_mtx);
+#endif
return (ctr);
 }



Re: pvclock(4)

2018-12-05 Thread Peter Hessler
On 2018 Dec 04 (Tue) at 15:14:51 +0100 (+0100), Reyk Floeter wrote:
:On Tue, Dec 04, 2018 at 05:43:48AM -0800, Chris Cappuccio wrote:
:> Of course printf instead of panic for testers 
:> 
:
:Oh, right, thanks!
:
:@john:  Does this "slightly less simple" diff work for you?
:
:@phessler, Chris: Maybe we should get this fix tested and in, wait for
:reports, and I can use the time to think about my other option.  What
:do you think?
:

Yea, I think this diff would help avoid panics for systems where that
bit is never set, and should go in.

OK


:Reyk
:
:Index: sys/dev/pv/pvclock.c
:===
:RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
:retrieving revision 1.2
:diff -u -p -u -p -r1.2 pvclock.c
:--- sys/dev/pv/pvclock.c   24 Nov 2018 13:12:29 -  1.2
:+++ sys/dev/pv/pvclock.c   4 Dec 2018 14:03:57 -
:@@ -70,6 +70,11 @@ uint pvclock_get_timecount(struct timec
: void   pvclock_read_time_info(struct pvclock_softc *,
:   struct pvclock_time_info *);
: 
:+static inline uint32_t
:+   pvclock_read_begin(const struct pvclock_time_info *);
:+static inline int
:+   pvclock_read_done(const struct pvclock_time_info *, uint32_t);
:+
: struct cfattach pvclock_ca = {
:   sizeof(struct pvclock_softc),
:   pvclock_match,
:@@ -127,8 +132,11 @@ pvclock_match(struct device *parent, voi
: void
: pvclock_attach(struct device *parent, struct device *self, void *aux)
: {
:-  struct pvclock_softc*sc = (struct pvclock_softc *)self;
:-  paddr_t  pa;
:+  struct pvclock_softc*sc = (struct pvclock_softc *)self;
:+  struct pvclock_time_info*ti;
:+  paddr_t  pa;
:+  uint32_t version;
:+  uint8_t  flags;
: 
:   if ((sc->sc_time = km_alloc(PAGE_SIZE,
:   _any, _zero, _nowait)) == NULL) {
:@@ -143,6 +151,19 @@ pvclock_attach(struct device *parent, st
: 
:   wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE);
:   sc->sc_paddr = pa;
:+
:+  ti = sc->sc_time;
:+  do {
:+  version = pvclock_read_begin(ti);
:+  flags = ti->ti_flags;
:+  } while (!pvclock_read_done(ti, version));
:+
:+  if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) {
:+  wrmsr(KVM_MSR_SYSTEM_TIME, pa & ~PVCLOCK_SYSTEM_TIME_ENABLE);
:+  km_free(sc->sc_time, PAGE_SIZE, _any, _zero);
:+  printf(": unstable clock\n");
:+  return;
:+  }
: 
:   sc->sc_tc = _timecounter;
:   sc->sc_tc->tc_name = DEVNAME(sc);
:

-- 
Reality is an obstacle to hallucination.



Re: pvclock(4)

2018-12-04 Thread johnw

Reyk Floeter 於 2018-12-04 22:14 寫到:

On Tue, Dec 04, 2018 at 05:43:48AM -0800, Chris Cappuccio wrote:

Of course printf instead of panic for testers



Oh, right, thanks!

@john:  Does this "slightly less simple" diff work for you?


Hi Reyk,

Yes, your patch also work for me, I can boot without manually disable 
pvclock.

Attached dmesg, thanks.



--
Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC
OpenBSD 6.4-current (GENERIC.MP) #490: Thu Nov 29 16:00:20 MST 2018
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 788389888 (751MB)
avail mem = 755261440 (720MB)
User Kernel Config
UKC> disable 59
 59 pvclock0 disabled
UKC> quit
Continuing...
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5b80 (10 entries)
bios0: vendor SeaBIOS version "1.11.1-1" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 118.79 MHz, 06-17-0a
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1039MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 322.28 MHz, 06-17-0a
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: smt 0, core 0, package 1
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 20, 24 pins
acpihpet0 at acpi0: 1 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
acpipci0 at acpi0 PCI0: _OSC failed
acpicmos0 at acpi0
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"ACPI0010" at acpi0 not configured
pvbus0 at mainbus0: KVM
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 
wired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
atapiscsi0 at pciide0 channel 1 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0:  ATAPI 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, DMA mode 2
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9
iic0 at piixpm0
vga1 at pci0 dev 2 function 0 "Bochs VGA" rev 0x02
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
virtio0 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio0: address aa:aa:aa:aa:aa:01
virtio0: msix shared
virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio1
scsibus2 at vioblk0: 2 targets
sd0 at scsibus2 targ 0 lun 0:  SCSI3 0/direct fixed
sd0: 20480MB, 512 bytes/sector, 41943040 sectors
virtio1: msix shared
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
sd1 at scsibus4 targ 1 lun 0:  SCSI2 0/direct fixed
sd1: 20473MB, 512 bytes/sector, 41929058 sectors
root on sd1a (fa1d7de7776af250.a) swap on sd1b dump on sd1b
fd0 at fdc0 drive 1: density unknown
syncing disks... done
rebooting...
OpenBSD 6.4-current (GENERIC.MP) #490: Thu Nov 29 16:00:20 MST 2018
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 788389888 (751MB)
avail mem = 755261440 (720MB)
User Kernel Config
UKC> disable 59
 59 pvclock0 disabled
UKC> quit
Continuing...
mpath0 at root
scsibus0 at mpath0: 256 

Re: pvclock(4)

2018-12-04 Thread Reyk Floeter
On Tue, Dec 04, 2018 at 05:43:48AM -0800, Chris Cappuccio wrote:
> Of course printf instead of panic for testers 
> 

Oh, right, thanks!

@john:  Does this "slightly less simple" diff work for you?

@phessler, Chris: Maybe we should get this fix tested and in, wait for
reports, and I can use the time to think about my other option.  What
do you think?

Reyk

Index: sys/dev/pv/pvclock.c
===
RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
retrieving revision 1.2
diff -u -p -u -p -r1.2 pvclock.c
--- sys/dev/pv/pvclock.c24 Nov 2018 13:12:29 -  1.2
+++ sys/dev/pv/pvclock.c4 Dec 2018 14:03:57 -
@@ -70,6 +70,11 @@ uint  pvclock_get_timecount(struct timec
 voidpvclock_read_time_info(struct pvclock_softc *,
struct pvclock_time_info *);
 
+static inline uint32_t
+pvclock_read_begin(const struct pvclock_time_info *);
+static inline int
+pvclock_read_done(const struct pvclock_time_info *, uint32_t);
+
 struct cfattach pvclock_ca = {
sizeof(struct pvclock_softc),
pvclock_match,
@@ -127,8 +132,11 @@ pvclock_match(struct device *parent, voi
 void
 pvclock_attach(struct device *parent, struct device *self, void *aux)
 {
-   struct pvclock_softc*sc = (struct pvclock_softc *)self;
-   paddr_t  pa;
+   struct pvclock_softc*sc = (struct pvclock_softc *)self;
+   struct pvclock_time_info*ti;
+   paddr_t  pa;
+   uint32_t version;
+   uint8_t  flags;
 
if ((sc->sc_time = km_alloc(PAGE_SIZE,
_any, _zero, _nowait)) == NULL) {
@@ -143,6 +151,19 @@ pvclock_attach(struct device *parent, st
 
wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE);
sc->sc_paddr = pa;
+
+   ti = sc->sc_time;
+   do {
+   version = pvclock_read_begin(ti);
+   flags = ti->ti_flags;
+   } while (!pvclock_read_done(ti, version));
+
+   if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) {
+   wrmsr(KVM_MSR_SYSTEM_TIME, pa & ~PVCLOCK_SYSTEM_TIME_ENABLE);
+   km_free(sc->sc_time, PAGE_SIZE, _any, _zero);
+   printf(": unstable clock\n");
+   return;
+   }
 
sc->sc_tc = _timecounter;
sc->sc_tc->tc_name = DEVNAME(sc);



Re: pvclock(4)

2018-12-04 Thread Reyk Floeter
On Tue, Dec 04, 2018 at 12:46:06PM +0100, Peter Hessler wrote:
> On 2018 Dec 03 (Mon) at 16:56:10 -0800 (-0800), Chris Cappuccio wrote:
> :Reyk Floeter [r...@openbsd.org] wrote:
> :>
> :> Yes, KVM???s stable bit is not a reliable indication as it is seems to 
> depend on the capabilities of the KVM version and not the actual availability 
> of the feature on the particular hardware. How annoying.
> :>
> :> As mentioned before: I???d like to disable pvclock for now and I can do 
> that in the morning CET if nobody beats me to it.
> :>
> :> I have an idea how to deal with old platforms afterwards but this needs 
> some more tests and thoughts.
> :>
> :
> :Perhaps the solution is as "simple" as checking the status of the bit
> :after the presence of the bit is established ?
> :
> 
> This makes sense, OK
> 

See my other mail, this diff is not OK.

Reyk

> 
> 
> 
> :Index: pvclock.c
> :===
> :RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
> :retrieving revision 1.2
> :diff -u -p -u -r1.2 pvclock.c
> :--- pvclock.c24 Nov 2018 13:12:29 -  1.2
> :+++ pvclock.c4 Dec 2018 00:53:56 -
> :@@ -127,8 +127,10 @@ pvclock_match(struct device *parent, voi
> : void
> : pvclock_attach(struct device *parent, struct device *self, void *aux)
> : {
> :-struct pvclock_softc*sc = (struct pvclock_softc *)self;
> :-paddr_t  pa;
> :+struct pvclock_softc*sc = (struct pvclock_softc *)self;
> :+struct pvclock_time_info*ti;
> :+paddr_t  pa;
> :+uint8_t  flags;
> : 
> : if ((sc->sc_time = km_alloc(PAGE_SIZE,
> : _any, _zero, _nowait)) == NULL) {
> :@@ -151,6 +153,13 @@ pvclock_attach(struct device *parent, st
> : 
> : /* Better than HPET but below TSC */
> : sc->sc_tc->tc_quality = 1500;
> :+
> :+ti = sc->sc_time;
> :+flags = ti->ti_flags;
> :+if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) {
> :+printf(": unstable timestamp counter\n");
> :+return;
> :+}
> : 
> : tc_init(sc->sc_tc);
> : 
> :
> 
> -- 
> fortune -as



Re: pvclock(4)

2018-12-04 Thread Reyk Floeter
On Mon, Dec 03, 2018 at 04:56:10PM -0800, Chris Cappuccio wrote:
> Reyk Floeter [r...@openbsd.org] wrote:
> >
> > Yes, KVM???s stable bit is not a reliable indication as it is seems to 
> > depend on the capabilities of the KVM version and not the actual 
> > availability of the feature on the particular hardware. How annoying.
> >
> > As mentioned before: I???d like to disable pvclock for now and I can do 
> > that in the morning CET if nobody beats me to it.
> >
> > I have an idea how to deal with old platforms afterwards but this needs 
> > some more tests and thoughts.
> >
> 
> Perhaps the solution is as "simple" as checking the status of the bit
> after the presence of the bit is established ?
> 

The approach makes sense but it is not the right way to read the
value.  It only works if you don't hit a refresh cycle of the host.
Better diff below.

But I'm not sure if this is enough and if the "stable" flag mit appear
occasionally.  So I'd like to disable pvclock for now until we did
some more testing and tried different options.

Reyk

Index: sys/dev/pv/pvclock.c
===
RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
retrieving revision 1.2
diff -u -p -u -p -r1.2 pvclock.c
--- sys/dev/pv/pvclock.c24 Nov 2018 13:12:29 -  1.2
+++ sys/dev/pv/pvclock.c4 Dec 2018 12:09:06 -
@@ -70,6 +70,11 @@ uint  pvclock_get_timecount(struct timec
 voidpvclock_read_time_info(struct pvclock_softc *,
struct pvclock_time_info *);
 
+static inline uint32_t
+pvclock_read_begin(const struct pvclock_time_info *);
+static inline int
+pvclock_read_done(const struct pvclock_time_info *, uint32_t);
+
 struct cfattach pvclock_ca = {
sizeof(struct pvclock_softc),
pvclock_match,
@@ -127,8 +132,11 @@ pvclock_match(struct device *parent, voi
 void
 pvclock_attach(struct device *parent, struct device *self, void *aux)
 {
-   struct pvclock_softc*sc = (struct pvclock_softc *)self;
-   paddr_t  pa;
+   struct pvclock_softc*sc = (struct pvclock_softc *)self;
+   struct pvclock_time_info*ti;
+   paddr_t  pa;
+   uint32_t version;
+   uint8_t  flags;
 
if ((sc->sc_time = km_alloc(PAGE_SIZE,
_any, _zero, _nowait)) == NULL) {
@@ -143,6 +151,19 @@ pvclock_attach(struct device *parent, st
 
wrmsr(KVM_MSR_SYSTEM_TIME, pa | PVCLOCK_SYSTEM_TIME_ENABLE);
sc->sc_paddr = pa;
+
+   ti = sc->sc_time;
+   do {
+   version = pvclock_read_begin(ti);
+   flags = ti->ti_flags;
+   } while (!pvclock_read_done(ti, version));
+
+   if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) {
+   wrmsr(KVM_MSR_SYSTEM_TIME, pa & ~PVCLOCK_SYSTEM_TIME_ENABLE);
+   km_free(sc->sc_time, PAGE_SIZE, _any, _zero);
+   panic(": unstable clock\n");
+   return;
+   }
 
sc->sc_tc = _timecounter;
sc->sc_tc->tc_name = DEVNAME(sc);



Re: pvclock(4)

2018-12-04 Thread Peter Hessler
On 2018 Dec 03 (Mon) at 16:56:10 -0800 (-0800), Chris Cappuccio wrote:
:Reyk Floeter [r...@openbsd.org] wrote:
:>
:> Yes, KVM???s stable bit is not a reliable indication as it is seems to 
depend on the capabilities of the KVM version and not the actual availability 
of the feature on the particular hardware. How annoying.
:>
:> As mentioned before: I???d like to disable pvclock for now and I can do that 
in the morning CET if nobody beats me to it.
:>
:> I have an idea how to deal with old platforms afterwards but this needs some 
more tests and thoughts.
:>
:
:Perhaps the solution is as "simple" as checking the status of the bit
:after the presence of the bit is established ?
:

This makes sense, OK




:Index: pvclock.c
:===
:RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
:retrieving revision 1.2
:diff -u -p -u -r1.2 pvclock.c
:--- pvclock.c  24 Nov 2018 13:12:29 -  1.2
:+++ pvclock.c  4 Dec 2018 00:53:56 -
:@@ -127,8 +127,10 @@ pvclock_match(struct device *parent, voi
: void
: pvclock_attach(struct device *parent, struct device *self, void *aux)
: {
:-  struct pvclock_softc*sc = (struct pvclock_softc *)self;
:-  paddr_t  pa;
:+  struct pvclock_softc*sc = (struct pvclock_softc *)self;
:+  struct pvclock_time_info*ti;
:+  paddr_t  pa;
:+  uint8_t  flags;
: 
:   if ((sc->sc_time = km_alloc(PAGE_SIZE,
:   _any, _zero, _nowait)) == NULL) {
:@@ -151,6 +153,13 @@ pvclock_attach(struct device *parent, st
: 
:   /* Better than HPET but below TSC */
:   sc->sc_tc->tc_quality = 1500;
:+
:+  ti = sc->sc_time;
:+  flags = ti->ti_flags;
:+  if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) {
:+  printf(": unstable timestamp counter\n");
:+  return;
:+  }
: 
:   tc_init(sc->sc_tc);
: 
:

-- 
fortune -as



Re: pvclock(4)

2018-12-03 Thread johnw

Chris Cappuccio 於 2018-12-04 08:56 寫到:

Reyk Floeter [r...@openbsd.org] wrote:


Yes, KVM???s stable bit is not a reliable indication as it is seems to 
depend on the capabilities of the KVM version and not the actual 
availability of the feature on the particular hardware. How annoying.


As mentioned before: I???d like to disable pvclock for now and I can 
do that in the morning CET if nobody beats me to it.


I have an idea how to deal with old platforms afterwards but this 
needs some more tests and thoughts.




Perhaps the solution is as "simple" as checking the status of the bit
after the presence of the bit is established ?

Index: pvclock.c
===
RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
retrieving revision 1.2
diff -u -p -u -r1.2 pvclock.c
--- pvclock.c   24 Nov 2018 13:12:29 -  1.2
+++ pvclock.c   4 Dec 2018 00:53:56 -
@@ -127,8 +127,10 @@ pvclock_match(struct device *parent, voi
 void
 pvclock_attach(struct device *parent, struct device *self, void *aux)
 {
-   struct pvclock_softc*sc = (struct pvclock_softc *)self;
-   paddr_t  pa;
+   struct pvclock_softc*sc = (struct pvclock_softc *)self;
+   struct pvclock_time_info*ti;
+   paddr_t  pa;
+   uint8_t  flags;

if ((sc->sc_time = km_alloc(PAGE_SIZE,
_any, _zero, _nowait)) == NULL) {
@@ -151,6 +153,13 @@ pvclock_attach(struct device *parent, st

/* Better than HPET but below TSC */
sc->sc_tc->tc_quality = 1500;
+
+   ti = sc->sc_time;
+   flags = ti->ti_flags;
+   if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) {
+   printf(": unstable timestamp counter\n");
+   return;
+   }

tc_init(sc->sc_tc);


Hi, your patch work for me, the system boot without manually disable 
pvclock, thanks.


Attached dmesg.

--
Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC
OpenBSD 6.4-current (GENERIC.MP) #490: Thu Nov 29 16:00:20 MST 2018
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 788389888 (751MB)
avail mem = 755261440 (720MB)
User Kernel Config
UKC> disable 59
 59 pvclock0 disabled
UKC> quit
Continuing...
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5b80 (10 entries)
bios0: vendor SeaBIOS version "1.11.1-1" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 118.79 MHz, 06-17-0a
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1039MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 322.28 MHz, 06-17-0a
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: smt 0, core 0, package 1
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 20, 24 pins
acpihpet0 at acpi0: 1 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
acpipci0 at acpi0 PCI0: _OSC failed
acpicmos0 at acpi0
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"ACPI0010" at acpi0 not configured
pvbus0 at mainbus0: KVM
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 
wired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
atapiscsi0 at pciide0 channel 1 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0:  ATAPI 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, DMA mode 2
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB 

Re: pvclock(4)

2018-12-03 Thread Chris Cappuccio
Reyk Floeter [r...@openbsd.org] wrote:
>
> Yes, KVM???s stable bit is not a reliable indication as it is seems to depend 
> on the capabilities of the KVM version and not the actual availability of the 
> feature on the particular hardware. How annoying.
>
> As mentioned before: I???d like to disable pvclock for now and I can do that 
> in the morning CET if nobody beats me to it.
>
> I have an idea how to deal with old platforms afterwards but this needs some 
> more tests and thoughts.
>

Perhaps the solution is as "simple" as checking the status of the bit
after the presence of the bit is established ?

Index: pvclock.c
===
RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
retrieving revision 1.2
diff -u -p -u -r1.2 pvclock.c
--- pvclock.c   24 Nov 2018 13:12:29 -  1.2
+++ pvclock.c   4 Dec 2018 00:53:56 -
@@ -127,8 +127,10 @@ pvclock_match(struct device *parent, voi
 void
 pvclock_attach(struct device *parent, struct device *self, void *aux)
 {
-   struct pvclock_softc*sc = (struct pvclock_softc *)self;
-   paddr_t  pa;
+   struct pvclock_softc*sc = (struct pvclock_softc *)self;
+   struct pvclock_time_info*ti;
+   paddr_t  pa;
+   uint8_t  flags;
 
if ((sc->sc_time = km_alloc(PAGE_SIZE,
_any, _zero, _nowait)) == NULL) {
@@ -151,6 +153,13 @@ pvclock_attach(struct device *parent, st
 
/* Better than HPET but below TSC */
sc->sc_tc->tc_quality = 1500;
+
+   ti = sc->sc_time;
+   flags = ti->ti_flags;
+   if ((flags & PVCLOCK_FLAG_TSC_STABLE) == 0) {
+   printf(": unstable timestamp counter\n");
+   return;
+   }
 
tc_init(sc->sc_tc);
 



Re: pvclock(4)

2018-12-03 Thread Reyk Floeter


> Am 04.12.2018 um 00:52 schrieb Chris Cappuccio :
> 
> johnw [johnw.m...@gmail.com] wrote:
>> 
>> Hi, after disable pvclock, it can boot with new kernel again, thanks.
> ...
>> cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 105.29 MHz, 06-17-0a
>> cpu0: 
>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN
> ...
> 
> This CPU clearly doesn't have the invariant TSC (it is required according
> to Reyk Floeter's tech@ posting from Nov 19th.) So, pvclock does not handle
> this situation properly, apparently checking for KVM_FEATURE_CLOCKSOURCE2
> and KVM_FEATURE_CLOCKSOURCE_STABLE_BIT is not enough. 

Yes, KVM’s stable bit is not a reliable indication as it is seems to depend on 
the capabilities of the KVM version and not the actual availability of the 
feature on the particular hardware. How annoying.

As mentioned before: I‘d like to disable pvclock for now and I can do that in 
the morning CET if nobody beats me to it.

I have an idea how to deal with old platforms afterwards but this needs some 
more tests and thoughts.

Reyk



Re: pvclock(4)

2018-12-03 Thread Chris Cappuccio
johnw [johnw.m...@gmail.com] wrote:
> 
> Hi, after disable pvclock, it can boot with new kernel again, thanks.
...
> cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 105.29 MHz, 06-17-0a
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN
...

This CPU clearly doesn't have the invariant TSC (it is required according
to Reyk Floeter's tech@ posting from Nov 19th.) So, pvclock does not handle
this situation properly, apparently checking for KVM_FEATURE_CLOCKSOURCE2
and KVM_FEATURE_CLOCKSOURCE_STABLE_BIT is not enough. 

Chris



Re: pvclock(4)

2018-11-30 Thread johnw

johnw 於 2018-11-29 17:38 寫到:

Reyk Floeter 於 2018-11-29 14:09 寫到:


Do you have a full dmesg for me?


Yes, attached dmesg.log, thanks.


Hi, after disable pvclock, it can boot with new kernel again, thanks.



--
Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC
OpenBSD 6.4-current (GENERIC.MP) #447: Sun Nov 18 17:25:58 MST 2018
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 519954432 (495MB)
avail mem = 494972928 (472MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5b80 (10 entries)
bios0: vendor SeaBIOS version "1.11.1-1" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 105.29 MHz, 06-17-0a
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1009MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 165.45 MHz, 06-17-0a
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: smt 0, core 0, package 1
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 20, 24 pins
acpihpet0 at acpi0: 1 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
acpipci0 at acpi0 PCI0: _OSC failed
acpicmos0 at acpi0
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"ACPI0010" at acpi0 not configured
pvbus0 at mainbus0: KVM
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 
wired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
atapiscsi0 at pciide0 channel 1 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0:  ATAPI 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, DMA mode 2
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9
iic0 at piixpm0
vga1 at pci0 dev 2 function 0 "Bochs VGA" rev 0x02
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
virtio0 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio0: address aa:aa:aa:aa:aa:01
virtio0: msix shared
virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio1
scsibus2 at vioblk0: 2 targets
sd0 at scsibus2 targ 0 lun 0:  SCSI3 0/direct fixed
sd0: 20480MB, 512 bytes/sector, 41943040 sectors
virtio1: msix shared
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
sd1 at scsibus4 targ 1 lun 0:  SCSI2 0/direct fixed
sd1: 20473MB, 512 bytes/sector, 41929058 sectors
root on sd1a (fa1d7de7776af250.a) swap on sd1b dump on sd1b
fd0 at fdc0 drive 1: density unknown
syncing disks... done
rebooting...
OpenBSD 6.4-current (GENERIC.MP) #447: Sun Nov 18 17:25:58 MST 2018
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 519954432 (495MB)
avail mem = 494972928 (472MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5b80 (10 entries)
bios0: vendor SeaBIOS version "1.11.1-1" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET
acpi0: wakeup devices
acpitimer0 at 

Re: pvclock(4)

2018-11-29 Thread johnw

Reyk Floeter 於 2018-11-29 14:09 寫到:


Do you have a full dmesg for me?


Yes, attached dmesg.log, thanks.


--
Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC
OpenBSD 6.4-current (GENERIC.MP) #447: Sun Nov 18 17:25:58 MST 2018
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 519954432 (495MB)
avail mem = 494972928 (472MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5b80 (10 entries)
bios0: vendor SeaBIOS version "1.11.1-1" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 105.29 MHz, 06-17-0a
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1009MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 165.45 MHz, 06-17-0a
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,SSSE3,CX16,SSE4.1,x2APIC,DEADLINE,XSAVE,HV,NXE,LONG,LAHF,PERF,ARAT,MELTDOWN
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: smt 0, core 0, package 1
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 20, 24 pins
acpihpet0 at acpi0: 1 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
acpipci0 at acpi0 PCI0: _OSC failed
acpicmos0 at acpi0
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"ACPI0010" at acpi0 not configured
pvbus0 at mainbus0: KVM
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 
wired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
atapiscsi0 at pciide0 channel 1 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0:  ATAPI 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, DMA mode 2
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0 int 9
iic0 at piixpm0
vga1 at pci0 dev 2 function 0 "Bochs VGA" rev 0x02
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
virtio0 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio0: address aa:aa:aa:aa:aa:01
virtio0: msix shared
virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio1
scsibus2 at vioblk0: 2 targets
sd0 at scsibus2 targ 0 lun 0:  SCSI3 0/direct fixed
sd0: 20480MB, 512 bytes/sector, 41943040 sectors
virtio1: msix shared
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
sd1 at scsibus4 targ 1 lun 0:  SCSI2 0/direct fixed
sd1: 20473MB, 512 bytes/sector, 41929058 sectors
root on sd1a (fa1d7de7776af250.a) swap on sd1b dump on sd1b
fd0 at fdc0 drive 1: density unknown


Re: pvclock(4)

2018-11-28 Thread Reyk Floeter
Hi,

> Am 29.11.2018 um 05:27 schrieb johnw :
> 
> 
>> So far I only got positive reports.  Where are the problems? ;)
> 
>> Otherwise: OK?
> 
>> Reyk
> 
> Hi, my kvm/quest/openbsd-amd64 can not boot, after upgrade to today current 
> (28-nov-2018).
> 

thanks for reporting.

Do you have a full dmesg for me?

Reyk

> It work before upgrade (OpenBSD 6.4-current (GENERIC.MP) #447: Sun Nov 18 
> 17:25:58 MST 2018)
> 
> Host: 4.18.0-3-amd64 #1 SMP Debian 4.18.20-2 (2018-11-23) x86_64 GNU/Linux
> qemu: QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3+b1)
> panic photo: https://ibb.co/gjWBhL4
> panic photo: https://ibb.co/HDrJ7Q0
> panic photo: https://ibb.co/g96J5s7
> 
> Thanks.
> 
> -- 
> Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC



Re: pvclock(4)

2018-11-28 Thread johnw




So far I only got positive reports.  Where are the problems? ;)



Otherwise: OK?



Reyk


Hi, my kvm/quest/openbsd-amd64 can not boot, after upgrade to today 
current (28-nov-2018).


It work before upgrade (OpenBSD 6.4-current (GENERIC.MP) #447: Sun Nov 
18 17:25:58 MST 2018)


Host: 4.18.0-3-amd64 #1 SMP Debian 4.18.20-2 (2018-11-23) x86_64 
GNU/Linux

qemu: QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3+b1)
panic photo: https://ibb.co/gjWBhL4
panic photo: https://ibb.co/HDrJ7Q0
panic photo: https://ibb.co/g96J5s7

Thanks.

--
Key fingerprint: CDB3 6C62 254B C088 1E5D DD32 182C 97DB CF2C 80AC



Re: pvclock(4)

2018-11-25 Thread Reyk Floeter


> Am 25.11.2018 um 05:02 schrieb Greg Steuck :
> 
> I realize this report is practically useless, but better out than in 
> (according to Shrek).
> I found this in the logs of my GCE VM running syzkaller bot. No further 
> details were preserved...
> 
> 2018/11/24 09:53:48 ci-openbsd-main: poll: 
> 94bf4886dbb69e9fbf0f92f975fc23f16fc5c80f
> 2018/11/24 09:53:48 ci-openbsd-main: building kernel...
> 2018/11/24 09:54:03 ci-openbsd-main: testing image...
> 2018/11/24 10:04:07 ci-openbsd-main: VM boot failed with: panic: pvclock0: 
> unstable result on stable clock
> 
> The host is running
> OpenBSD 6.4-current (GENERIC.MP) #456: Tue Nov 20 08:46:59 MST 2018
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> The VM kernel at the time was built at "zap 10 tab leading whitespace before 
> 'struct evp_pkey_ctx_st {'", so maybe "only attach pvclock(4) inside a KVM 
> guest" would've fixed it?

Yes, correct. Sorry for that glitch.

Reyk



Re: pvclock(4)

2018-11-24 Thread Greg Steuck
I realize this report is practically useless, but better out than in
(according to Shrek).
I found this in the logs of my GCE VM running syzkaller bot. No further
details were preserved...

2018/11/24 09:53:48 ci-openbsd-main: poll:
94bf4886dbb69e9fbf0f92f975fc23f16fc5c80f
2018/11/24 09:53:48 ci-openbsd-main: building kernel...
2018/11/24 09:54:03 ci-openbsd-main: testing image...
2018/11/24 10:04:07 ci-openbsd-main: VM boot failed with: panic: pvclock0:
unstable result on stable clock

The host is running
OpenBSD 6.4-current (GENERIC.MP) #456: Tue Nov 20 08:46:59 MST 2018
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

The VM kernel at the time was built at "zap 10 tab leading whitespace
before 'struct evp_pkey_ctx_st {'", so maybe "only attach pvclock(4) inside
a KVM guest" would've fixed it?

Thanks
Greg


Re: pvclock(4)

2018-11-22 Thread Landry Breuil
On Thu, Nov 22, 2018 at 05:24:10PM +0100, Landry Breuil wrote:
> On Thu, Nov 22, 2018 at 07:44:01AM -0800, Mike Larkin wrote:
> > On Thu, Nov 22, 2018 at 04:37:49PM +0100, Reyk Floeter wrote:
> > > On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote:
> > > > the attached diff is another attempt at implementing a pvclock(4)
> > > > guest driver.  This improves the clock on KVM and replaces the need
> > > > for using the VM-expensive acpihpet(4).
> > > > 
> > > 
> > > So far I only got positive reports.  Where are the problems? ;)
> > > 
> > > Otherwise: OK?
> > > 
> > > Reyk
> > > 
> > 
> > Reads ok. One question - you mention in pvclock.c that this is supported
> > on i386 and amd64 but I only see GENERIC changes for amd64?
> > 
> > ok mlarkin in any case, but I'd either add GENERIC changes for i386 or
> > make this for sure amd64 only.
> 
> I'm giving it a shot on my i386 vm.

Also seems to work fine there, i'm stressing the vm (well,
unpacking tarball/building firefox) and ntpd still manages to reduce the drift:


[17:49] c32:~/ $while sleep 10 ; do ntpctl -sa|grep clock ; date ; done 
  
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 
404.382ms 
Thu Nov 22 17:49:49 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 
354.382ms 
Thu Nov 22 17:49:59 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 
304.382ms 
Thu Nov 22 17:50:09 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 
254.382ms 
Thu Nov 22 17:50:19 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 
204.382ms 
Thu Nov 22 17:50:29 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 
154.382ms 
Thu Nov 22 17:50:39 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 
104.382ms 
Thu Nov 22 17:50:49 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced, clock offset is 
54.382ms  
Thu Nov 22 17:50:59 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced
Thu Nov 22 17:51:09 CET 2018
4/4 peers valid, constraint offset -1s, clock unsynced
Thu Nov 22 17:51:19 CET 2018
4/4 peers valid, constraint offset -1s, clock synced, stratum 3
Thu Nov 22 17:51:29 CET 2018

dunno how valid a testcase this is

dmesg below, this time emulating a QEMU Standard PC (i440FX + PIIX, 1996)

Landry
OpenBSD 6.4-current (GENERIC.MP) #4: Thu Nov 22 17:40:05 CET 2018
landry@c32.proxmox2:/usr/src/sys/arch/i386/compile/GENERIC.MP
real mem  = 3220561920 (3071MB)
avail mem = 3146719232 (3000MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: date 06/23/99, BIOS32 rev. 0 @ 0xfd2b1, SMBIOS rev. 2.8 @ 
0xf5980 (10 entries)
bios0: vendor SeaBIOS version 
"rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Common KVM processor ("GenuineIntel" 686-class) 2.94 GHz, 0f-06-01
cpu0: 
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,x2APIC,HV,NXE,LONG,LAHF,MELTDOWN
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 999MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Common KVM processor ("GenuineIntel" 686-class) 2.94 GHz, 0f-06-01
cpu1: 
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,x2APIC,HV,NXE,LONG,LAHF,MELTDOWN
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins
acpihpet0 at acpi0: 1 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
"PNP0A03" at acpi0 not configured
acpicmos0 at acpi0
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"ACPI0010" at acpi0 not configured
bios0: ROM list: 0xc

Re: pvclock(4)

2018-11-22 Thread Landry Breuil
On Thu, Nov 22, 2018 at 07:44:01AM -0800, Mike Larkin wrote:
> On Thu, Nov 22, 2018 at 04:37:49PM +0100, Reyk Floeter wrote:
> > On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote:
> > > the attached diff is another attempt at implementing a pvclock(4)
> > > guest driver.  This improves the clock on KVM and replaces the need
> > > for using the VM-expensive acpihpet(4).
> > > 
> > 
> > So far I only got positive reports.  Where are the problems? ;)
> > 
> > Otherwise: OK?
> > 
> > Reyk
> > 
> 
> Reads ok. One question - you mention in pvclock.c that this is supported
> on i386 and amd64 but I only see GENERIC changes for amd64?
> 
> ok mlarkin in any case, but I'd either add GENERIC changes for i386 or
> make this for sure amd64 only.

I'm giving it a shot on my i386 vm.



Re: pvclock(4)

2018-11-22 Thread Mike Larkin
On Thu, Nov 22, 2018 at 04:37:49PM +0100, Reyk Floeter wrote:
> On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote:
> > the attached diff is another attempt at implementing a pvclock(4)
> > guest driver.  This improves the clock on KVM and replaces the need
> > for using the VM-expensive acpihpet(4).
> > 
> 
> So far I only got positive reports.  Where are the problems? ;)
> 
> Otherwise: OK?
> 
> Reyk
> 

Reads ok. One question - you mention in pvclock.c that this is supported
on i386 and amd64 but I only see GENERIC changes for amd64?

ok mlarkin in any case, but I'd either add GENERIC changes for i386 or
make this for sure amd64 only.

-ml

> > Index: share/man/man4/pvclock.4
> > =======
> > RCS file: share/man/man4/pvclock.4
> > diff -N share/man/man4/pvclock.4
> > --- /dev/null   1 Jan 1970 00:00:00 -
> > +++ share/man/man4/pvclock.419 Nov 2018 11:48:33 -
> > @@ -0,0 +1,45 @@
> > +.\"$OpenBSD$
> > +.\"
> > +.\" Copyright (c) 2018 Reyk Floeter 
> > +.\"
> > +.\" Permission to use, copy, modify, and distribute this software for any
> > +.\" purpose with or without fee is hereby granted, provided that the above
> > +.\" copyright notice and this permission notice appear in all copies.
> > +.\"
> > +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL 
> > WARRANTIES
> > +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> > +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> > +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> > +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> > +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> > +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> > +.\"
> > +.Dd $Mdocdate$
> > +.Dt PVCLOCK 4
> > +.Os
> > +.Sh NAME
> > +.Nm pvclock
> > +.Nd paravirtual clock driver
> > +.Sh SYNOPSIS
> > +.Cd "pvclock* at pvbus?
> > +.Sh DESCRIPTION
> > +The
> > +.Nm
> > +driver supports the paravirtual clock that is available in KVM and
> > +other hypervisors.
> > +.Nm
> > +uses a shared page between the host and the hypervisor to synchronize
> > +the TSC clock in an efficient way.
> > +.Sh SEE ALSO
> > +.Xr pvbus 4
> > +.Sh HISTORY
> > +The
> > +.Nm
> > +driver first appeared in
> > +.Ox 6.5 .
> > +.Sh AUTHORS
> > +.An -nosplit
> > +The
> > +.Nm
> > +driver was written by
> > +.An Reyk Floeter Aq Mt r...@openbsd.org .
> > Index: sys/arch/amd64/conf/GENERIC
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/conf/GENERIC,v
> > retrieving revision 1.464
> > diff -u -p -u -p -r1.464 GENERIC
> > --- sys/arch/amd64/conf/GENERIC 26 Oct 2018 20:26:19 -  1.464
> > +++ sys/arch/amd64/conf/GENERIC 19 Nov 2018 11:48:33 -
> > @@ -79,6 +79,8 @@ ipmi0 at mainbus? disable # IPMI
> >  
> >  vmt0   at pvbus?   # VMware Tools
> >  
> > +pvclock0 at pvbus? # KVM pvclock
> > +
> >  xen0   at pvbus?   # Xen HVM domU
> >  xnf*   at xen? # Xen Netfront
> >  xbf*   at xen? # Xen Blkfront
> > Index: sys/dev/pv/files.pv
> > ===
> > RCS file: /cvs/src/sys/dev/pv/files.pv,v
> > retrieving revision 1.14
> > diff -u -p -u -p -r1.14 files.pv
> > --- sys/dev/pv/files.pv 24 Aug 2018 16:07:01 -  1.14
> > +++ sys/dev/pv/files.pv 19 Nov 2018 11:48:33 -
> > @@ -8,6 +8,11 @@ device pvbus
> >  attach pvbus at mainbus
> >  file   dev/pv/pvbus.c  pvbus   needs-flag
> >  
> > +# KVM clock
> > +device pvclock
> > +attach pvclock at pvbus
> > +file   dev/pv/pvclock.cpvclock needs-flag
> > +
> >  # VMware Tools
> >  device vmt
> >  attach vmt at pvbus
> > Index: sys/dev/pv/pvclock.c
> > ===
> > RCS file: sys/dev/pv/pvclock.c
> > diff -N sys/dev/pv/pvclock.c
> > --- /dev/null   1 Jan 1970 00:00:00 -
> > +++ sys/dev/pv/pvclock.c19 Nov 2018 11:48:33 -
> > @@ -0,0 +1,229 @@

Re: pvclock(4)

2018-11-22 Thread Reyk Floeter
On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote:
> the attached diff is another attempt at implementing a pvclock(4)
> guest driver.  This improves the clock on KVM and replaces the need
> for using the VM-expensive acpihpet(4).
> 

So far I only got positive reports.  Where are the problems? ;)

Otherwise: OK?

Reyk

> Index: share/man/man4/pvclock.4
> ===
> RCS file: share/man/man4/pvclock.4
> diff -N share/man/man4/pvclock.4
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ share/man/man4/pvclock.4  19 Nov 2018 11:48:33 -
> @@ -0,0 +1,45 @@
> +.\"  $OpenBSD$
> +.\"
> +.\" Copyright (c) 2018 Reyk Floeter 
> +.\"
> +.\" Permission to use, copy, modify, and distribute this software for any
> +.\" purpose with or without fee is hereby granted, provided that the above
> +.\" copyright notice and this permission notice appear in all copies.
> +.\"
> +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> +.\"
> +.Dd $Mdocdate$
> +.Dt PVCLOCK 4
> +.Os
> +.Sh NAME
> +.Nm pvclock
> +.Nd paravirtual clock driver
> +.Sh SYNOPSIS
> +.Cd "pvclock* at pvbus?
> +.Sh DESCRIPTION
> +The
> +.Nm
> +driver supports the paravirtual clock that is available in KVM and
> +other hypervisors.
> +.Nm
> +uses a shared page between the host and the hypervisor to synchronize
> +the TSC clock in an efficient way.
> +.Sh SEE ALSO
> +.Xr pvbus 4
> +.Sh HISTORY
> +The
> +.Nm
> +driver first appeared in
> +.Ox 6.5 .
> +.Sh AUTHORS
> +.An -nosplit
> +The
> +.Nm
> +driver was written by
> +.An Reyk Floeter Aq Mt r...@openbsd.org .
> Index: sys/arch/amd64/conf/GENERIC
> ===
> RCS file: /cvs/src/sys/arch/amd64/conf/GENERIC,v
> retrieving revision 1.464
> diff -u -p -u -p -r1.464 GENERIC
> --- sys/arch/amd64/conf/GENERIC   26 Oct 2018 20:26:19 -  1.464
> +++ sys/arch/amd64/conf/GENERIC   19 Nov 2018 11:48:33 -
> @@ -79,6 +79,8 @@ ipmi0   at mainbus? disable # IPMI
>  
>  vmt0 at pvbus?   # VMware Tools
>  
> +pvclock0 at pvbus?   # KVM pvclock
> +
>  xen0 at pvbus?   # Xen HVM domU
>  xnf* at xen? # Xen Netfront
>  xbf* at xen? # Xen Blkfront
> Index: sys/dev/pv/files.pv
> ===
> RCS file: /cvs/src/sys/dev/pv/files.pv,v
> retrieving revision 1.14
> diff -u -p -u -p -r1.14 files.pv
> --- sys/dev/pv/files.pv   24 Aug 2018 16:07:01 -  1.14
> +++ sys/dev/pv/files.pv   19 Nov 2018 11:48:33 -
> @@ -8,6 +8,11 @@ device   pvbus
>  attach   pvbus at mainbus
>  file dev/pv/pvbus.c  pvbus   needs-flag
>  
> +# KVM clock
> +device   pvclock
> +attach   pvclock at pvbus
> +file dev/pv/pvclock.cpvclock needs-flag
> +
>  # VMware Tools
>  device   vmt
>  attach   vmt at pvbus
> Index: sys/dev/pv/pvclock.c
> ===
> RCS file: sys/dev/pv/pvclock.c
> diff -N sys/dev/pv/pvclock.c
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ sys/dev/pv/pvclock.c  19 Nov 2018 11:48:33 -
> @@ -0,0 +1,229 @@
> +/*   $OpenBSD$   */
> +
> +/*
> + * Copyright (c) 2018 Reyk Floeter 
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS

Re: pvclock(4)

2018-11-20 Thread Landry Breuil
On Mon, Nov 19, 2018 at 01:12:46PM +0100, Reyk Floeter wrote:
> Hi,
> 
> the attached diff is another attempt at implementing a pvclock(4)
> guest driver.  This improves the clock on KVM and replaces the need
> for using the VM-expensive acpihpet(4).
> 
> While pvclock(4) is available on KVM, Xen, Hyper-V (in a modified
> form), it currently only attaches under KVM:
> 
> pvbus0 at mainbus0: KVM
> pvclock0 at pvbus0

Works fine on my proxmox 5.2 / KVM 2.11 dpb build cluster emulating 6x
"QEMU Standard PC (Q35 + ICH9, 2009)". Before i had to restart ntpd
every 5mn on each hosts, time would drift like crazy under build load.
With pvclock it seems the slave clocks are stable and ntpd stays synced.

kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=pvclock0
kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000) 
acpitimer0(1000) dummy(-100)

Landry
OpenBSD 6.4-current (GENERIC.MP) #0: Tue Nov 20 18:52:03 CET 2018
landry@c64.proxmox2:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4276924416 (4078MB)
avail mem = 4138016768 (3946MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5970 (10 entries)
bios0: vendor SeaBIOS version 
"rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org" date 04/01/2014
bios0: QEMU Standard PC (Q35 + ICH9, 2009)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET MCFG
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Common KVM processor, 2933.90 MHz, 0f-06-01
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,x2APIC,HV,NXE,LONG,LAHF,MELTDOWN
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 999MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Common KVM processor, 2933.54 MHz, 0f-06-01
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,x2APIC,HV,NXE,LONG,LAHF,MELTDOWN
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins
acpihpet0 at acpi0: 1 Hz
acpimcfg0 at acpi0
acpimcfg0: addr 0xb000, bus 0-255
acpiprt0 at acpi0: bus 0 (PCI0)
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
acpipci0 at acpi0 PCI0: 0x0

Re: pvclock(4)

2018-11-19 Thread Sebastian Benoit
Reyk Floeter(r...@openbsd.org) on 2018.11.19 13:12:46 +0100:
> Feedback? Tests? OKs?

test on my incarnation of kvm:

OpenBSD 6.4-current (GENERIC.MP) #0: Mon Nov 19 18:32:29 CET 2018
ben...@test.openbsd.fluchtwagenfahrer.de:/sys/arch/amd64/compile/GENERIC.MP
real mem = 2097004544 (1999MB)
avail mem = 2024218624 (1930MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7dfffd30 (14 entries)
bios0: vendor SeaBIOS version "1.10.2-1" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.93 MHz, 06-2c-01
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 999MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.45 MHz, 06-2c-01
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: smt 0, core 0, package 1
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.47 MHz, 06-2c-01
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu2: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu2: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu2: smt 0, core 0, package 2
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.47 MHz, 06-2c-01
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN
cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu3: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu3: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu3: smt 0, core 0, package 3
cpu4 at mainbus0: apid 4 (application processor)
cpu4: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.46 MHz, 06-2c-01
cpu4: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN
cpu4: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu4: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu4: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu4: smt 0, core 0, package 4
cpu5 at mainbus0: apid 5 (application processor)
cpu5: Westmere E56xx/L56xx/X56xx (Nehalem-C), 2533.45 MHz, 06-2c-01
cpu5: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,HV,NXE,LONG,LAHF,ARAT,MELTDOWN
cpu5: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu5: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu5: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu5: smt 0, core 0, package 5
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins
acpihpet0 at acpi0: 1 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpicpu2 at acpi0: C1(@1 halt!)
acpicpu3 at acpi0: C1(@1 halt!)
acpicpu4 at acpi0: C1(@1 halt!)
acpicpu5 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
acpipci0 at acpi0 PCI0: _OSC failed
acpicmos0 at acpi0
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"ACPI0010" at acpi0 not configured
pvbus0 at mainbus0: KVM
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 

pvclock(4)

2018-11-19 Thread Reyk Floeter
Hi,

the attached diff is another attempt at implementing a pvclock(4)
guest driver.  This improves the clock on KVM and replaces the need
for using the VM-expensive acpihpet(4).

While pvclock(4) is available on KVM, Xen, Hyper-V (in a modified
form), it currently only attaches under KVM:

pvbus0 at mainbus0: KVM
pvclock0 at pvbus0

A few notes:

- The "invtsc" workaround to get a real TSC is not always possible, as
it breaks migration and a few other things.  If you're running in a
cloud, it is very unlikely that you'll get it enabled.

- pvclock suffered from the same problem as early TSC implementations:
it wasn't synced across CPUs, so the OS had to do it manually.  In
OpenBSD, we never supported that and we decided to only support the
newer "invariant" TSCs that started to show up around SkyLake.  And I
think that's why the previous pvlock implementations for OpenBSD
didn't take off.

- So my diff does the same for pvclock what we did for TSC: only
support the "stable" pvclock that guarantees to provide a synchronized
clock by either doing the synchronization on the host side or by
providing it based on an invariant TSC on the host itself.  So this
diff might not work on older KVM versions or CPUs; I'd be happy to see
some test results.

- I did not copy+paste from the Linux and/or FreeBSD (they have
suprising similarities but different licenses), and I wasn't convinced
that it is worth copying the asm code that they use instead of the
following multiplication:
ctr = ((delta * mul_frac) >> 32) + system_time;
Let me know if I'm wrong and if this code provides significant benefits:
https://github.com/freebsd/freebsd/blob/1d6e4247415d264485ee94b59fdbc12e0c566fd0/sys/x86/x86/pvclock.c#L90-L124

- This driver only implements the "system time" clock, and not the
static "wall clock".  I didn't see a point in supporting the wall
clock as its timestamp is only ever updated when you cold-start the
VM.  The wall clock struct is still defined in the driver for
reference but could probably be removed.

- I assigned a timecounter priority of 1500 to be in the middle
between acpihpet(1000) and tsc(2000).  If TSC is available, we should
probably use it instead; even if pvclock is theoretically better as it
is TSC-based and optimized for VMs.

- This is only a guest driver, not a driver for vmm(4)/vmd(8).  But it
could solve some issues if somebody would implement it for vmm(4)...

-  I'm running it on a busy build machine on Exoscale.ch for at least
a week now.  This machine runs stuff that utilizes all CPUs.  With
HPET, we've sometimes experienced livelocks but the pvclock(4) driver
made it much snappier.

$ time sleep 2
0m02.00s real 0m00.00s user 0m00.01s system
$ cat /var/db/ntpd.drift  
-4.679
$ sysctl kern.timecounter 
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=1
kern.timecounter.hardware=pvclock0
kern.timecounter.choice=i8254(0) pvclock0(1500) acpihpet0(1000) 
acpitimer0(1000) dummy(-100)
$ sysctl hw.model hw.ncpuonline hw.vendor hw.product hw.version
hw.model=Intel Xeon Processor (Skylake)
hw.ncpuonline=8
hw.vendor=Apache Software Foundation
hw.product=CloudStack KVM Hypervisor
hw.version=pc-i440fx-2.11

Feedback? Tests? OKs?

Reyk

Index: share/man/man4/pvclock.4
=======
RCS file: share/man/man4/pvclock.4
diff -N share/man/man4/pvclock.4
--- /dev/null   1 Jan 1970 00:00:00 -
+++ share/man/man4/pvclock.419 Nov 2018 11:48:33 -
@@ -0,0 +1,45 @@
+.\"$OpenBSD$
+.\"
+.\" Copyright (c) 2018 Reyk Floeter 
+.\"
+.\" Permission to use, copy, modify, and distribute this software for any
+.\" purpose with or without fee is hereby granted, provided that the above
+.\" copyright notice and this permission notice appear in all copies.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+.\"
+.Dd $Mdocdate$
+.Dt PVCLOCK 4
+.Os
+.Sh NAME
+.Nm pvclock
+.Nd paravirtual clock driver
+.Sh SYNOPSIS
+.Cd "pvclock* at pvbus?
+.Sh DESCRIPTION
+The
+.Nm
+driver supports the paravirtual clock that is available in KVM and
+other hypervisors.
+.Nm
+uses a shared page between the host and the hypervisor to synchronize
+the TSC clock in an efficient way.
+.Sh SEE ALSO
+.Xr pvbus 4
+.Sh HISTORY
+The
+.Nm
+driver fi