Re: svn commit: r219646 - head/sys/x86/isa

2011-03-16 Thread Bruce Evans

On Tue, 15 Mar 2011, Jung-uk Kim wrote:


On Tuesday 15 March 2011 03:51 pm, John Baldwin wrote:

On Tuesday, March 15, 2011 3:26:11 pm Jung-uk Kim wrote:

Now don't you think we should really kill delay by TSC? ;-)


Delay by TSC fixed known deadlocks with the i8254 based DELAY() due
to the use of locks.  Be careful that you don't re-introduce old
bugs.


Yeah, that's perfectly understood.


I don't understand it.  I don't know of any deadlocks except the ones
in ddb that I fixed, and there were deadlocks, they can't have been
fixed by using the TSC, since the TSC is not always available.  It
is not even always available on systems that have a P-state invariant
TSC synchronized across all CPUs, since it is not initialized early,
so the old deadlock avoidance is still needed.  In fact, I found the
deadlock using ddb to debug TSC initialization which used the i8254
(not via DELAY()) for calibrating the TSC frequency.


Also, you can use a TSC for DELAY() in cases when it is not safe to
use it for the timecounter (if it is not in sync across cores, but
is used in a machine with invariant TSCs or where the user knows
they won't ever throttle it). Modern Intel CPUs all have invariant
TSCs that are more or less in sync across cores, and we should
certainly still use the TSC for DELAY() in that case. Even if they
aren't in sync (so we can't use it for the timecounter) we should
still use the TSC if they are invariant as it is far cheaper than
anything else.


That, too, is well understood.  You know I added tsc_is_invariant
myself. :-)


1: [pause;] decl %eax; jne 1b in a loop works just as well as rdtsc
in a loop if the TSC is _not_ P-state invariant.  Both break similarly
if the TSC frequency changes.  You have to change the loop count if
the frequency changes, but it is hard to tell if this is necessary.
Telling if it is necessary requires reading a real hardware timer to
recalibrate occasionally from within the loop and without, and the
timer read from within the loop gives the same locking complications
that you are trying to avoid.


However, why do we need cheaper DELAY() when we trying to delay
something with it?


kib mentioned bus activity.  1: decl %eax; jne 1b doesn't have that.
I wonder if rdtsc has it.  Phenom has much slower rdtsc than Athlon
to help make it P-state invariant.  Does this involve bus activity?
What does pause do in a loop that is only accessing registers?

Bruce
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org


Re: svn commit: r219646 - head/sys/x86/isa

2011-03-16 Thread Bruce Evans

On Tue, 15 Mar 2011, Jung-uk Kim wrote:


On Tuesday 15 March 2011 02:13 pm, Bruce Evans wrote:

On Tue, 15 Mar 2011, Jung-uk Kim wrote:

...
I disagree.  I think we should keep away from i8254 as much as
possible.


It is adequate for DELAY(), and is the only timer that is available
on all x86.  You need large complications in DELAY() to make it
work as well as the old code using the i8254, for no gain until
there is an x86 without an i8254.


Intel started killing off i8254 (and other legacy ISA devices),
starting from Mobile Internet Device platforms.  Even for other
so-called legacy-free platforms, it isn't adequate any more because
of unpredictable SMI# interference.


I thought of getting a SandyBridge system and decided not to since it
it kills too much of my hardware.

Lengthening the delay due to SMI# interference doesn't matter in DELAY(),
since normal interrupts also lengthen it.  SMI# would have to corrupt
the i8254 state to cause larger problems.  (This is too easy to do
since the i8254 cannot be used without clobbering its current state
by more than advancing its countdown.)


Even in this patch, it isn't clear that the low level
timecounters are initialized before they are used.


The timecounter is always initialized first, then set.


It is still unclear whether they are initialized enough before they
are used since their setting is neither locked nor atomic.
...
accessing it.  In fact, I can now see a reproducible bug in
DELAY(): - find an x86 that can run at 4GHz
- run it in 386 mode
- start it so that tsc_freq is 0x1000ULL
- use the machdep.tsc_freq sysctl to tune tsc_freq to 0x


machdep.tsc_freq should never have modified tsc_freq in the first
place.  I am going to remove that soon.


No, changing tsc_freq is what machdep.tsc_freq is for.  Breaking this
would be backed out :-).  You can fix it using a generation count or
something.


[More examples of races.]


Now don't you think we should really kill delay by TSC? ;-)


I always said that it should never have existed.  This leaves similar
bugs in timecounter intiialization, but there is some protection from
generation counts and time-domain locking there.

The sysctl is hackish in writing directly to the timecounter freqquency
variable instead of calling a tc function like tc_setclock().  I forget
why it does that.  (I said tc_init() before.  tc_init() is only for
initial initialization and tc_setclock() is intended only for adjustments
to the time.)

Bruce
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org


Re: svn commit: r219646 - head/sys/x86/isa

2011-03-15 Thread Jung-uk Kim
On Monday 14 March 2011 10:31 pm, Bruce Evans wrote:
 On Mon, 14 Mar 2011, Jung-uk Kim wrote:
  Log:
   When TSC is unavailable, broken or disabled and the current
  timecounter has better quality than i8254 timer, use it for
  DELAY(9).

 You cannot use a random timecounter for DELAY().  DELAY() is
 carefully written to not use any locks, so that it doesn't deadlock
 when called in the any context from a bad console driver like
 syscons when syscons is called from ddb.  (Bad console drivers may
 have their own locks which deadlock in ddb, but that is another
 problem.)  Even the i8254 timer, which is the only timer that
 should be used by DELAY(), normally use locks that used to cause
 deadlock, but DELAY() was unbroken to bypass the locks when it is
 called from ddb.

I disagree.  I think we should keep away from i8254 as much as 
possible.  Actually, i8254 is the only timer that requires locking in 
the first place because of its braindead design.  All other x86 
timers do not require any locking at all.  We even sacrificed upper 
32 bits of HPET in favor of i386. :-(

 Cpufreq and other calibration code should use a normal timecounter
 via nanouptime() in all cases and never go near low level
 timecounters or DELAY().  There are (hopefully minor) problems
 getting the nanotime() initialized before it used.  The dummy
 timecounter is just what you don't want to use.

That's exactly the problem.

 Even in this patch, it isn't clear that the low level timecounters
 are initialized before they are used.

The timecounter is always initialized first, then set.

  Modified: head/sys/x86/isa/clock.c
  =
 = --- head/sys/x86/isa/clock.c   Mon Mar 14 19:31:43
  2011(r219645) +++ head/sys/x86/isa/clock.c  Mon Mar 14 22:05:59
  2011(r219646) @@ -245,6 +245,42 @@ getit(void)
  return ((high  8) | low);
  }
 
  +static __inline void
  +delay_tsc(int n)
  +{
  +   uint64_t start, end, now;
  +
  +   sched_pin();
  +   start = rdtsc();
  +   end = start + (tsc_freq * n) / 100;
  +   do {
  +   cpu_spinwait();
  +   now = rdtsc();
  +   } while (now  end || (now  start  end  start));
  +   sched_unpin();
  +}

 You cannot call random scheduling code from DELAY(), since the
 scheduling code is not designed to be called in the any context. 
 As it happens, sched_pin() and sched_unpin() are safe, and were
 already used in the TSC case.

Actually, I really like to get rid of this code.  It isn't safe for 
many reasons, e.g., its frequency may change while in the loop and 
end result is unpredictable if TSC is not invariant.  We may limit 
its use for invariant case, though.

  +
  +static __inline void
  +delay_timecounter(struct timecounter *tc, int n)
  +{
  +   uint64_t end, now;
  +   u_int last, mask, u;
  +
  +   mask = tc-tc_counter_mask;
  +   last = tc-tc_get_timecount(tc)  mask;
  +   end = tc-tc_frequency * n / 100;

 This depends on the delicate timecounter locking to be safe.  I
 think it it is safe except possibly in tc_get_timecount(), for the
 same reasons that nanouptime() can run safely with no visible
 locking.  tc must be the current timecounter/timehands, which is
 guaranteed to not be in an in-between state or become so while we
 are using it, despite there being no visible locking.  Actually
 there are some minor races:
 - in nanouptime(), the timecounter generation is checked to avoid
 using a new generation if necessary, but updates of the generation
 count and the fileds that it protects are not properly atomic.
 - here, the check of the generation is missing.  So this works
 without races when called from ddb (since ddb stops other CPUs),
 but has minor races in general.

Window of the races is very narrow here and it only happens on i386 
where reading uint64_t is not atomic if my understanding is correct.  
It cannot be any worse than tsc_freq anyway. ;-)

  +   now = 0;
  +   do {
  +   cpu_spinwait();
  +   u = tc-tc_get_timecount(tc)  mask;
  +   if (u  last)
  +   now += mask - last + u + 1;
  +   else
  +   now += u - last;
  +   last = u;
  +   } while (now  end);
  +}
  +
  /*
   * Wait n microseconds.
   * Relies on timer 1 counting down from (i8254_freq / hz)
  @@ -253,6 +289,7 @@ getit(void)
  void
  DELAY(int n)
  {
  +   struct timecounter *tc;
  int delta, prev_tick, tick, ticks_left;
 
  #ifdef DELAYDEBUG
  @@ -262,16 +299,12 @@ DELAY(int n)
  #endif
 
  if (tsc_freq != 0) {
  -   uint64_t start, end, now;
  -
  -   sched_pin();
  -   start = rdtsc();
  -   end = start + (tsc_freq * n) / 100;
  -   do {
  -   cpu_spinwait();
  -   now = rdtsc();
  -   } while (now  end || (now  start  end  start));
  -   sched_unpin();
  +   delay_tsc(n);
  +   return;
  +   }
  +   tc = 

Re: svn commit: r219646 - head/sys/x86/isa

2011-03-15 Thread Bruce Evans

On Tue, 15 Mar 2011, Jung-uk Kim wrote:


On Monday 14 March 2011 10:31 pm, Bruce Evans wrote:

On Mon, 14 Mar 2011, Jung-uk Kim wrote:

Log:
 When TSC is unavailable, broken or disabled and the current
timecounter has better quality than i8254 timer, use it for
DELAY(9).


You cannot use a random timecounter for DELAY().  DELAY() is
carefully written to not use any locks, so that it doesn't deadlock
when called in the any context from a bad console driver like
syscons when syscons is called from ddb.  (Bad console drivers may
have their own locks which deadlock in ddb, but that is another
problem.)  Even the i8254 timer, which is the only timer that
should be used by DELAY(), normally use locks that used to cause
deadlock, but DELAY() was unbroken to bypass the locks when it is
called from ddb.


I disagree.  I think we should keep away from i8254 as much as
possible.



It is adequate for DELAY(), and is the only timer that is available on
all x86.  You need large complications in DELAY() to make it work as
well as the old code using the i8254, for no gain until there is an
x86 without an i8254.


Actually, i8254 is the only timer that requires locking in
the first place because of its braindead design.  All other x86
timers do not require any locking at all.  We even sacrificed upper
32 bits of HPET in favor of i386. :-(


Did we?  Timecounters use only 32 bits as an optimization.  Scaling of
64-bit values is slow even on 64-bit machines since it needs 128-bit
intermediate values for the multiplication/shift method to work.


Cpufreq and other calibration code should use a normal timecounter
via nanouptime() in all cases and never go near low level
timecounters or DELAY().  There are (hopefully minor) problems
getting the nanotime() initialized before it used.  The dummy
timecounter is just what you don't want to use.


That's exactly the problem.


Your fix won't make much difference then.  DELAY() will start with a
dummy timecounter and probably also a zero tsc_freq, so it will use
the i8254 for various things, probably including initial calibration
of the TSC/cpufreq.  Later it may see a better timecounter and use
that to get a better calibration of the TSC/cpufreq.  But its caller
can just as easily check for a better timecounter and not use DELAY()
if it can use a timecounter, without messing with DELAY()'s internals
and having to guard against reentrancy from ddb.  delay_timecounter()
already has all the code for this, except the check for a better
timecounter is external.


Even in this patch, it isn't clear that the low level timecounters
are initialized before they are used.


The timecounter is always initialized first, then set.


It is still unclear whether they are initialized enough before they
are used since their setting is neither locked nor atomic.  Consider
the TSC initialization.  This is init_TSC() from startrtclock() and
init_TSC_tc() from cpu_initclocks().  The first should make the low-
level timecounter useable, but this is unclear.  The second attaches
the low-level timecounter to the high-level timecounter data structures.
There is no locking for this except possibly Giant.  tsc_freq is
initialized early, so the tests of it will succeed long before the TSC
can become the high-level timecounter.  It is probably usable at a low
level, but this is unclear.  tsc_freq is 64 bits, and the accesses to
it are non-atomic, so especially on 32-bit machines there are races
accessing it.  In fact, I can now see a reproducible bug in DELAY():
- find an x86 that can run at 4GHz
- run it in 386 mode
- start it so that tsc_freq is 0x1000ULL
- use the machdep.tsc_freq sysctl to tune tsc_freq to 0x
- trace through this sysctl using gdb, and using a low quality console
  driver like syscons which calls DELAY().
- the 2 words in tsc_freq will normally be written from the low word
  up.  It will be seen to have value 0x1ULL after only the first
  word is written.  Not too bad -- just of by a factor of 2.
- I can't quite see how to make it have a really broken value.  At first
  I tried and example with it starting with a value if 0x and
  changing it to 0x1ULL.  Then after changing only its low word,
  it is 0 so DELAY() will not use the TSC.
Related non-reproducible bugs are also easy to see:
- on one CPU, spin calling the sysctl to toggly tsc_freq between 4G-1 and
  4G
- on another CPU, spin doing something that calls DELAY().  Since there
  is no locking or atomic accesses for the writes to tsc_freq in the sysctl
  or for the reads of it in DELAY(), not to mention elsewhwere, tsc_freq
  can read as zero when it is being changed between 2 nonzero values.  Not
  too bad if it reads as zero -- then it won't be used.  But uses of it
  following it being read as nonzero may find it with changed values that
  are very bad, for example zero.
These bugs are in the old code in DELAY() that uses the TSC, and perhaps
in all code of the form if (tsc_freq != 0) use(tsc). 

Re: svn commit: r219646 - head/sys/x86/isa

2011-03-15 Thread Jung-uk Kim
On Tuesday 15 March 2011 02:13 pm, Bruce Evans wrote:
 On Tue, 15 Mar 2011, Jung-uk Kim wrote:
  On Monday 14 March 2011 10:31 pm, Bruce Evans wrote:
  On Mon, 14 Mar 2011, Jung-uk Kim wrote:
  Log:
   When TSC is unavailable, broken or disabled and the current
  timecounter has better quality than i8254 timer, use it for
  DELAY(9).
 
  You cannot use a random timecounter for DELAY().  DELAY() is
  carefully written to not use any locks, so that it doesn't
  deadlock when called in the any context from a bad console
  driver like syscons when syscons is called from ddb.  (Bad
  console drivers may have their own locks which deadlock in ddb,
  but that is another problem.)  Even the i8254 timer, which is
  the only timer that should be used by DELAY(), normally use
  locks that used to cause deadlock, but DELAY() was unbroken to
  bypass the locks when it is called from ddb.
 
  I disagree.  I think we should keep away from i8254 as much as
  possible.

 It is adequate for DELAY(), and is the only timer that is available
 on all x86.  You need large complications in DELAY() to make it
 work as well as the old code using the i8254, for no gain until
 there is an x86 without an i8254.

Intel started killing off i8254 (and other legacy ISA devices), 
starting from Mobile Internet Device platforms.  Even for other 
so-called legacy-free platforms, it isn't adequate any more because 
of unpredictable SMI# interference.

  Actually, i8254 is the only timer that requires locking in
  the first place because of its braindead design.  All other x86
  timers do not require any locking at all.  We even sacrificed
  upper 32 bits of HPET in favor of i386. :-(

 Did we?  Timecounters use only 32 bits as an optimization.  Scaling
 of 64-bit values is slow even on 64-bit machines since it needs
 128-bit intermediate values for the multiplication/shift method to
 work.

No, I am talking about reading raw value of HPET.  For amd64, it can 
be used as a good alternative to TSC if we detect it earlier for 
legacy-free platforms.  Unfortunately, it is only useful for 
timecounter ATM.

  Cpufreq and other calibration code should use a normal
  timecounter via nanouptime() in all cases and never go near low
  level timecounters or DELAY().  There are (hopefully minor)
  problems getting the nanotime() initialized before it used.  The
  dummy timecounter is just what you don't want to use.
 
  That's exactly the problem.

 Your fix won't make much difference then.  DELAY() will start with
 a dummy timecounter and probably also a zero tsc_freq, so it will
 use the i8254 for various things, probably including initial
 calibration of the TSC/cpufreq.  Later it may see a better
 timecounter and use that to get a better calibration of the
 TSC/cpufreq.  But its caller can just as easily check for a better
 timecounter and not use DELAY() if it can use a timecounter,
 without messing with DELAY()'s internals and having to guard
 against reentrancy from ddb.  delay_timecounter() already has all
 the code for this, except the check for a better timecounter is
 external.

Yes, it doesn't make much difference for that ATM.  However, it DOES 
make differences when DELAY() is used for device drivers.  What I 
really want is DELAY() does the right thing, e.g., holding no spin 
lock, not pinned on a CPU, no tsc_freq hackery, and no side-effect of 
any kind.

  Even in this patch, it isn't clear that the low level
  timecounters are initialized before they are used.
 
  The timecounter is always initialized first, then set.

 It is still unclear whether they are initialized enough before they
 are used since their setting is neither locked nor atomic. 
 Consider the TSC initialization.  This is init_TSC() from
 startrtclock() and init_TSC_tc() from cpu_initclocks().  The first
 should make the low- level timecounter useable, but this is
 unclear.  The second attaches the low-level timecounter to the
 high-level timecounter data structures. There is no locking for
 this except possibly Giant.  tsc_freq is initialized early, so the
 tests of it will succeed long before the TSC can become the
 high-level timecounter.  It is probably usable at a low level, but
 this is unclear.  tsc_freq is 64 bits, and the accesses to it are
 non-atomic, so especially on 32-bit machines there are races
 accessing it.  In fact, I can now see a reproducible bug in
 DELAY(): - find an x86 that can run at 4GHz
 - run it in 386 mode
 - start it so that tsc_freq is 0x1000ULL
 - use the machdep.tsc_freq sysctl to tune tsc_freq to 0x

machdep.tsc_freq should never have modified tsc_freq in the first 
place.  I am going to remove that soon.

 - trace through this sysctl using gdb, and using a low quality
 console driver like syscons which calls DELAY().
 - the 2 words in tsc_freq will normally be written from the low
 word up.  It will be seen to have value 0x1ULL after only
 the first word is written.  Not too bad -- just of by a factor of
 2. - I 

Re: svn commit: r219646 - head/sys/x86/isa

2011-03-15 Thread John Baldwin
On Tuesday, March 15, 2011 3:26:11 pm Jung-uk Kim wrote:
 Now don't you think we should really kill delay by TSC? ;-)

Delay by TSC fixed known deadlocks with the i8254 based DELAY() due to the use 
of locks.  Be careful that you don't re-introduce old bugs.

Also, you can use a TSC for DELAY() in cases when it is not safe to use it for 
the timecounter (if it is not in sync across cores, but is used in a machine 
with invariant TSCs or where the user knows they won't ever throttle it).  
Modern Intel CPUs all have invariant TSCs that are more or less in sync across 
cores, and we should certainly still use the TSC for DELAY() in that case.  
Even if they aren't in sync (so we can't use it for the timecounter) we should 
still use the TSC if they are invariant as it is far cheaper than anything 
else.

-- 
John Baldwin
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org


Re: svn commit: r219646 - head/sys/x86/isa

2011-03-15 Thread Jung-uk Kim
On Tuesday 15 March 2011 03:51 pm, John Baldwin wrote:
 On Tuesday, March 15, 2011 3:26:11 pm Jung-uk Kim wrote:
  Now don't you think we should really kill delay by TSC? ;-)

 Delay by TSC fixed known deadlocks with the i8254 based DELAY() due
 to the use of locks.  Be careful that you don't re-introduce old
 bugs.

Yeah, that's perfectly understood.

 Also, you can use a TSC for DELAY() in cases when it is not safe to
 use it for the timecounter (if it is not in sync across cores, but
 is used in a machine with invariant TSCs or where the user knows
 they won't ever throttle it). Modern Intel CPUs all have invariant
 TSCs that are more or less in sync across cores, and we should
 certainly still use the TSC for DELAY() in that case. Even if they
 aren't in sync (so we can't use it for the timecounter) we should
 still use the TSC if they are invariant as it is far cheaper than
 anything else.

That, too, is well understood.  You know I added tsc_is_invariant 
myself. :-)

However, why do we need cheaper DELAY() when we trying to delay 
something with it?

Jung-uk Kim
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org


Re: svn commit: r219646 - head/sys/x86/isa

2011-03-15 Thread Kostik Belousov
On Tue, Mar 15, 2011 at 05:21:34PM -0400, Jung-uk Kim wrote:
 However, why do we need cheaper DELAY() when we trying to delay 
 something with it?
Busy-loop performing repeated access to the south bridge eats the
bus capacity, that could be useful for other bus agent and other cores.
I think this is the case with HPET, while TSC is core-local.


pgp874pQWyLYv.pgp
Description: PGP signature


svn commit: r219646 - head/sys/x86/isa

2011-03-14 Thread Jung-uk Kim
Author: jkim
Date: Mon Mar 14 22:05:59 2011
New Revision: 219646
URL: http://svn.freebsd.org/changeset/base/219646

Log:
  When TSC is unavailable, broken or disabled and the current timecounter has
  better quality than i8254 timer, use it for DELAY(9).

Modified:
  head/sys/x86/isa/clock.c

Modified: head/sys/x86/isa/clock.c
==
--- head/sys/x86/isa/clock.cMon Mar 14 19:31:43 2011(r219645)
+++ head/sys/x86/isa/clock.cMon Mar 14 22:05:59 2011(r219646)
@@ -245,6 +245,42 @@ getit(void)
return ((high  8) | low);
 }
 
+static __inline void
+delay_tsc(int n)
+{
+   uint64_t start, end, now;
+
+   sched_pin();
+   start = rdtsc();
+   end = start + (tsc_freq * n) / 100;
+   do {
+   cpu_spinwait();
+   now = rdtsc();
+   } while (now  end || (now  start  end  start));
+   sched_unpin();
+}
+
+static __inline void
+delay_timecounter(struct timecounter *tc, int n)
+{
+   uint64_t end, now;
+   u_int last, mask, u;
+
+   mask = tc-tc_counter_mask;
+   last = tc-tc_get_timecount(tc)  mask;
+   end = tc-tc_frequency * n / 100;
+   now = 0;
+   do {
+   cpu_spinwait();
+   u = tc-tc_get_timecount(tc)  mask;
+   if (u  last)
+   now += mask - last + u + 1;
+   else
+   now += u - last;
+   last = u;
+   } while (now  end);
+}
+
 /*
  * Wait n microseconds.
  * Relies on timer 1 counting down from (i8254_freq / hz)
@@ -253,6 +289,7 @@ getit(void)
 void
 DELAY(int n)
 {
+   struct timecounter *tc;
int delta, prev_tick, tick, ticks_left;
 
 #ifdef DELAYDEBUG
@@ -262,16 +299,12 @@ DELAY(int n)
 #endif
 
if (tsc_freq != 0) {
-   uint64_t start, end, now;
-
-   sched_pin();
-   start = rdtsc();
-   end = start + (tsc_freq * n) / 100;
-   do {
-   cpu_spinwait();
-   now = rdtsc();
-   } while (now  end || (now  start  end  start));
-   sched_unpin();
+   delay_tsc(n);
+   return;
+   }
+   tc = timecounter;
+   if (tc-tc_quality  0) {
+   delay_timecounter(tc, n);
return;
}
 #ifdef DELAYDEBUG
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org


Re: svn commit: r219646 - head/sys/x86/isa

2011-03-14 Thread Bruce Evans

On Mon, 14 Mar 2011, Jung-uk Kim wrote:


Log:
 When TSC is unavailable, broken or disabled and the current timecounter has
 better quality than i8254 timer, use it for DELAY(9).


You cannot use a random timecounter for DELAY().  DELAY() is carefully
written to not use any locks, so that it doesn't deadlock when called
in the any context from a bad console driver like syscons when syscons
is called from ddb.  (Bad console drivers may have their own locks which
deadlock in ddb, but that is another problem.)  Even the i8254 timer,
which is the only timer that should be used by DELAY(), normally use locks
that used to cause deadlock, but DELAY() was unbroken to bypass the locks
when it is called from ddb.

Cpufreq and other calibration code should use a normal timecounter via
nanouptime() in all cases and never go near low level timecounters or
DELAY().  There are (hopefully minor) problems getting the nanotime()
initialized before it used.  The dummy timecounter is just what you
don't want to use.  Even in this patch, it isn't clear that the low
level timecounters are initialized before they are used.


Modified: head/sys/x86/isa/clock.c
==
--- head/sys/x86/isa/clock.cMon Mar 14 19:31:43 2011(r219645)
+++ head/sys/x86/isa/clock.cMon Mar 14 22:05:59 2011(r219646)
@@ -245,6 +245,42 @@ getit(void)
return ((high  8) | low);
}

+static __inline void
+delay_tsc(int n)
+{
+   uint64_t start, end, now;
+
+   sched_pin();
+   start = rdtsc();
+   end = start + (tsc_freq * n) / 100;
+   do {
+   cpu_spinwait();
+   now = rdtsc();
+   } while (now  end || (now  start  end  start));
+   sched_unpin();
+}


You cannot call random scheduling code from DELAY(), since the scheduling
code is not designed to be called in the any context.  As it happens,
sched_pin() and sched_unpin() are safe, and were already used in the TSC
case.


+
+static __inline void
+delay_timecounter(struct timecounter *tc, int n)
+{
+   uint64_t end, now;
+   u_int last, mask, u;
+
+   mask = tc-tc_counter_mask;
+   last = tc-tc_get_timecount(tc)  mask;
+   end = tc-tc_frequency * n / 100;


This depends on the delicate timecounter locking to be safe.  I think it
it is safe except possibly in tc_get_timecount(), for the same reasons
that nanouptime() can run safely with no visible locking.  tc must be
the current timecounter/timehands, which is guaranteed to not be in an
in-between state or become so while we are using it, despite there being
no visible locking.  Actually there are some minor races:
- in nanouptime(), the timecounter generation is checked to avoid using
  a new generation if necessary, but updates of the generation count and
  the fileds that it protects are not properly atomic.
- here, the check of the generation is missing.  So this works without
  races when called from ddb (since ddb stops other CPUs), but has minor
  races in general.


+   now = 0;
+   do {
+   cpu_spinwait();
+   u = tc-tc_get_timecount(tc)  mask;
+   if (u  last)
+   now += mask - last + u + 1;
+   else
+   now += u - last;
+   last = u;
+   } while (now  end);
+}
+
/*
 * Wait n microseconds.
 * Relies on timer 1 counting down from (i8254_freq / hz)
@@ -253,6 +289,7 @@ getit(void)
void
DELAY(int n)
{
+   struct timecounter *tc;
int delta, prev_tick, tick, ticks_left;

#ifdef DELAYDEBUG
@@ -262,16 +299,12 @@ DELAY(int n)
#endif

if (tsc_freq != 0) {
-   uint64_t start, end, now;
-
-   sched_pin();
-   start = rdtsc();
-   end = start + (tsc_freq * n) / 100;
-   do {
-   cpu_spinwait();
-   now = rdtsc();
-   } while (now  end || (now  start  end  start));
-   sched_unpin();
+   delay_tsc(n);
+   return;
+   }
+   tc = timecounter;
+   if (tc-tc_quality  0) {
+   delay_timecounter(tc, n);
return;
}


Optimizing DELAY() was bogus.  Now the loop for this is duplicatied.


#ifdef DELAYDEBUG



Stuff keeps accumulating or mutating above the code for testing it.

Bruce
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org