Success! Sorta! (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Matthew Dillon

:Bruce's patch amounts to a retry if the current timecounter was updated
:while we were calculating time.  It is a bit more defensive than it
:needs to be and generally pessimizes the timecounters elegant lockless
:design a fair bit, but it is still much better than slamming a mutex
:around the entire clock code.
:
:If this patch cures the PIIX problem, something I'm not at all convinced
:about, it should go in, if not only the comment should go in.
:
:Poul-Henning
:
:-- 
:Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20

Ok, I've tested Bruce's patch and it appeaars to mostly solve the problem.
I no longer get 'microuptime ... backwards' errors on a -current SMP
box.

However, I think to be complete we need to make it even less elegant.
The TC module is only flip-flopping between two time counters, which
means that it can flip-flop twice and the test will not work.  We need
a generation count on the timecounter as well:

do {
tc = timecounter;
gen = tc-tc_generation;
*bt = tc-tc_offset;
bintime_addx(bt, tc-tc_scale * tco_delta(tc));
} while (tc != timecounter || tc-tc_generation != gen);

There is also an issue on non-i386 machines.  The timecounter swapping
code requires a memory synchronization point after updating the 
contents of the new timecounter but before setting the 'timecounter' global
to the new timecounter.  If this is not done, non-i386 machines running
SMP may see the new timecounter structure but access pre-updated values
from it.

(i386 boxes do not have this problem because writes are ordered so the
inherent synchronication point at the end of the timer interrupt is
enough).

Is there a memory synchronization point macro available?

-Matt


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Success! Sorta! (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Matthew Dillon

Whoop!  I take it back.  I'm still getting the errors:

microuptime() went backwards (458.168990 - 458.168882)
microuptime() went backwards (578.609995 - 577.929801)
microuptime() went backwards (748.912755 - 748.237402)
microuptime() went backwards (775.159625 - 775.159612)

I also think this retry loop has to be done everywhere where the
timecounter structure is accessed directly.

-Matt

:Ok, I've tested Bruce's patch and it appeaars to mostly solve the problem.
:I no longer get 'microuptime ... backwards' errors on a -current SMP
:box.
:...

Index: kern/kern_tc.c
===
RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
retrieving revision 1.113
diff -u -r1.113 kern_tc.c
--- kern/kern_tc.c  7 Feb 2002 21:21:55 -   1.113
+++ kern/kern_tc.c  17 Feb 2002 20:41:47 -
@@ -107,9 +107,11 @@
 {
struct timecounter *tc;
 
-   tc = timecounter;
-   *bt = tc-tc_offset;
-   bintime_addx(bt, tc-tc_scale * tco_delta(tc));
+   do {
+   tc = timecounter;
+   *bt = tc-tc_offset;
+   bintime_addx(bt, tc-tc_scale * tco_delta(tc));
+   } while (tc != timecounter);
 }
 
 void
@@ -126,8 +128,10 @@
struct timecounter *tc;
 
ngetmicrotime++;
-   tc = timecounter;
-   *tvp = tc-tc_microtime;
+   do {
+   tc = timecounter;
+   *tvp = tc-tc_microtime;
+   } while (tc != timecounter);
 }
 
 void
@@ -136,8 +140,10 @@
struct timecounter *tc;
 
ngetnanotime++;
-   tc = timecounter;
-   *tsp = tc-tc_nanotime;
+   do {
+   tc = timecounter;
+   *tsp = tc-tc_nanotime;
+   } while (tc != timecounter);
 }
 
 void
@@ -166,8 +172,10 @@
struct timecounter *tc;
 
ngetmicrouptime++;
-   tc = timecounter;
-   bintime2timeval(tc-tc_offset, tvp);
+   do {
+   tc = timecounter;
+   bintime2timeval(tc-tc_offset, tvp);
+   } while (tc != timecounter);
 }
 
 void
@@ -176,8 +184,10 @@
struct timecounter *tc;
 
ngetnanouptime++;
-   tc = timecounter;
-   bintime2timespec(tc-tc_offset, tsp);
+   do {
+   tc = timecounter;
+   bintime2timespec(tc-tc_offset, tsp);
+   } while (tc != timecounter);
 }
 
 void

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Success! Sorta! (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Matthew Dillon wri
tes:

However, I think to be complete we need to make it even less elegant.
The TC module is only flip-flopping between two time counters, which
means that it can flip-flop twice and the test will not work.  We need
a generation count on the timecounter as well:

do {
tc = timecounter;
gen = tc-tc_generation;
*bt = tc-tc_offset;
bintime_addx(bt, tc-tc_scale * tco_delta(tc));
} while (tc != timecounter || tc-tc_generation != gen);

No, more like:

do {
tc = timecounter;
gen = tc-gen;
...
} while (gen != tc-gen);

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Success! Sorta! (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Matthew Dillon wri
tes:
Whoop!  I take it back.  I'm still getting the errors:

microuptime() went backwards (458.168990 - 458.168882)
microuptime() went backwards (578.609995 - 577.929801)
microuptime() went backwards (748.912755 - 748.237402)
microuptime() went backwards (775.159625 - 775.159612)

I also think this retry loop has to be done everywhere where the
timecounter structure is accessed directly.

No, only where the timecounter hardware is read and the math
done.  As far as I know, this is the only place.

As I said, I am far from convinced this is the solution to the problem
we are looking at with the PIIX timecounter.

Msmith has some magic code in sys/dev/acpi/acpi_timer.c which tries
to identify if the PIIX counter is broken or OK and I notice that
the mask seems to always be set to 24 bits even if the hardware
is 32 bits.

I am not sure I read his code right, but he seems to default to the
unsafe method, can you try this copypasted patch ?

Index: acpi_timer.c
===
RCS file: /home/ncvs/src/sys/dev/acpica/acpi_timer.c,v
retrieving revision 1.11
diff -u -r1.11 acpi_timer.c
--- acpi_timer.c8 Jan 2002 06:45:56 -   1.11
+++ acpi_timer.c17 Feb 2002 20:48:23 -
@@ -138,7 +138,7 @@
 if (getenv(debug.acpi.timer_test) != NULL)
acpi_timer_test();
 
-acpi_timer_timecounter.tc_get_timecount = acpi_timer_get_timecount;
+acpi_timer_timecounter.tc_get_timecount = acpi_timer_get_timecount_safe;
 acpi_timer_timecounter.tc_frequency = acpi_timer_frequency;
 tc_init(acpi_timer_timecounter);
 



-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Success! Sorta! (was Re: 'microuptime() went backwards ...'using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Bruce Evans

On Sun, 17 Feb 2002, Matthew Dillon wrote:

 Whoop!  I take it back.  I'm still getting the errors:

 microuptime() went backwards (458.168990 - 458.168882)
 microuptime() went backwards (578.609995 - 577.929801)
 microuptime() went backwards (748.912755 - 748.237402)
 microuptime() went backwards (775.159625 - 775.159612)

 I also think this retry loop has to be done everywhere where the
 timecounter structure is accessed directly.

Yes, since the reads of all the relevant timecounter variables are
non-atomic.

 Index: kern/kern_tc.c
 ===
 RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
 retrieving revision 1.113
 diff -u -r1.113 kern_tc.c
 --- kern/kern_tc.c7 Feb 2002 21:21:55 -   1.113
 +++ kern/kern_tc.c17 Feb 2002 20:41:47 -
 @@ -126,8 +128,10 @@
   struct timecounter *tc;

   ngetmicrotime++;
 - tc = timecounter;
 - *tvp = tc-tc_microtime;
 + do {
 + tc = timecounter;
 + *tvp = tc-tc_microtime;
 + } while (tc != timecounter);
  }

  void

E.g., tc_mictrotime here is a timeval.  It doesn't matter getting a stale
value (although getting a stale value increases the possible incoherency
of the get*() functions from 1/HZ to NTIMECOUNTER/HZ), but getting a
stale value that changed underneath the read would be bad.

Bruce


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message