Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible?

2002-02-17 Thread Glenn Gombert

I also see these messages on my Dell 410 Workstation at work and a Dual
PIII Box I use at home to do builds with...they just seem to 'come and go'
with no particular pattern to them...I have just been ignoring them for the
most part...they don't really seem to cause any problems..


At 06:54 PM 2/17/2002 +1100, Bruce Evans wrote:
On Sat, 16 Feb 2002, Matthew Dillon wrote:

 Testing with a 'make -j 10 buildworld' on a -current box I am getting
 regular:

 microuptime() went backwards (146.826785 - 146.156715)
 microuptime() went backwards (146.826782 - 146.228636)
 ...
 microuptime() went backwards (8945.938288 - 8945.251603)
 microuptime() went backwards (8945.938306 - 8945.347173)
 microuptime() went backwards (9142.847550 - 9142.847546)

 This occurs both with and without the gettimeofday Giant-removal patch, so
 I am fairly sure it has nothing to do with any of my current work.  This is
 running -current on a DELL2550 (2xCPUs), compiled with the SMP option.

The fact that the timecounter usually goes backwards by about 0.68 seconds
is probably significant, but I can't quite explain it.

 Timecounter i8254  frequency 1193182 Hz
 ...
 Timecounter ACPI  frequency 3579545 Hz
 acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
 acpi_cpu0: CPU on acpi0
 acpi_cpu1: CPU on acpi0
 acpi_pcib0: Host-PCI bridge on acpi0
 ...

 Question:  How can this be occuring at all?  Isn't the ACPI counter a
 32 bit counter that does not have the rollover problems that the 8254 timer
 has?

Timecounters go backwards when the timecounter update or reference code is
called insufficiently often to prevent overflow.  The rollover problems of
the i8254 timecounter actually reduce this problem.  If an i8254 rollover
is missed, then it causes the the i8254 timecounter to go forward less
than it should.

I just wrote the following fix for some of the overflow problems.

%%%
Index: kern_tc.c
===
RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
retrieving revision 1.113
diff -c -2 -r1.113 kern_tc.c
*** kern_tc.c  7 Feb 2002 21:21:55 -   1.113
--- kern_tc.c  17 Feb 2002 06:25:14 -
***
*** 108,114 
   struct timecounter *tc;

!  tc = timecounter;
!  *bt = tc-tc_offset;
!  bintime_addx(bt, tc-tc_scale * tco_delta(tc));
  }

--- 95,129 
   struct timecounter *tc;

!  /*
!   * The loop is to handle changes of timecounter underneath us.
!   * Such changes may even become normal for preemptive kernels.
!   * It is quite reasonable for idle priority processes to not
!   * run for many seconds, and if they are not running after
!   * being preempted here, the timecounter may cycle many times
!   * underneath them.  An NTIMECOUNTER of  2 is neither necessary
!   * or sufficient for fixing this problem, unless NTIMECOUNTER is
!   * preposterously large.  NTIMECOUNTER == 2 suffices for most
!   * cases, and something more is required to fix the general case.
!   *
!   * I hope this also fixes problems with overflow of the
!   * multiplication.  We depend on tc not becoming stale by more
!   * than 1 second.  We will now normally see such staleness
!   * because it will cause the timecounter to change many times
!   * underneath us.  There will only be problems if hardclock()
!   * doesn't run for many seconds, but hardclock() is a very
!   * high priority interrupt, so such problems can't happen.
!   *
!   * XXX should use a generation count.
!   *
!   * XXX problems with hardclock() can happen, e.g., at boot time
!   * if you have fixed hardclock() to not be a broken fast interrupt
!   * handler, or if you sit at the ddb prompt for several seconds.
!   * Should do something to make them harmless.
!   */
!  do {
!  tc = timecounter;
!  *bt = tc-tc_offset;
!  bintime_addx(bt, tc-tc_scale * tco_delta(tc));
!  } while (tc != timecounter);
  }

%%%

Bruce


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

Glenn Gombert
[EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn'tthat be impossible?

2002-02-17 Thread Matthew Dillon

:I just wrote the following fix for some of the overflow problems.

I don't understand how this code is supposed to handle overflows.
You seem only to be checking to see if the master timecounter has
changed to a different type.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

:%%%
:Index: kern_tc.c
:===
:RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
:retrieving revision 1.113
:diff -c -2 -r1.113 kern_tc.c
:*** kern_tc.c  7 Feb 2002 21:21:55 -   1.113
:--- kern_tc.c  17 Feb 2002 06:25:14 -
:***
:*** 108,114 
:   struct timecounter *tc;
:
:!  tc = timecounter;
:!  *bt = tc-tc_offset;
:!  bintime_addx(bt, tc-tc_scale * tco_delta(tc));
:  }
:
:--- 95,129 
:   struct timecounter *tc;
:
:!  /*
:!   * The loop is to handle changes of timecounter underneath us.
:!   * Such changes may even become normal for preemptive kernels.
:!   * It is quite reasonable for idle priority processes to not
:!   * run for many seconds, and if they are not running after
:!   * being preempted here, the timecounter may cycle many times
:!   * underneath them.  An NTIMECOUNTER of  2 is neither necessary
:!   * or sufficient for fixing this problem, unless NTIMECOUNTER is
:!   * preposterously large.  NTIMECOUNTER == 2 suffices for most
:!   * cases, and something more is required to fix the general case.
:!   *
:!   * I hope this also fixes problems with overflow of the
:!   * multiplication.  We depend on tc not becoming stale by more
:!   * than 1 second.  We will now normally see such staleness
:!   * because it will cause the timecounter to change many times
:!   * underneath us.  There will only be problems if hardclock()
:!   * doesn't run for many seconds, but hardclock() is a very
:!   * high priority interrupt, so such problems can't happen.
:!   *
:!   * XXX should use a generation count.
:!   *
:!   * XXX problems with hardclock() can happen, e.g., at boot time
:!   * if you have fixed hardclock() to not be a broken fast interrupt
:!   * handler, or if you sit at the ddb prompt for several seconds.
:!   * Should do something to make them harmless.
:!   */
:!  do {
:!  tc = timecounter;
:!  *bt = tc-tc_offset;
:!  bintime_addx(bt, tc-tc_scale * tco_delta(tc));
:!  } while (tc != timecounter);
:  }
:
:%%%
:
:Bruce

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible?

2002-02-17 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Bruce Evans writes:

 This occurs both with and without the gettimeofday Giant-removal patch, so
 I am fairly sure it has nothing to do with any of my current work.  This is
 running -current on a DELL2550 (2xCPUs), compiled with the SMP option.

The Gian removal doesn't come anywhere near this.

The fact that the timecounter usually goes backwards by about 0.68 seconds
is probably significant, but I can't quite explain it.

 acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0

Well:
2^32 / 3579545 = 1199.86 seconds.
2^24 / 3579545 =4.68 seconds.

So even assuming that ACPI reported a wrong width, four seconds should
still be enough to prevent a wraparound even though we cycle through
the timecounter ring in one second.

I just wrote the following fix for some of the overflow problems.

It is true that if a process is not running for arbitrary long time
the timecounter may be modified underneat it, and bruce's patch is
actually a pretty elegant solution for that case.  By all means
try it.

I have had one other report of a similar problem, with the added
information that changing to the TSC or i8254 instead of PIXX
made it go away so I am not entirely convinced this is really what
we are looking for in this case.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible?

2002-02-17 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Matthew Dillon wri
tes:
:I just wrote the following fix for some of the overflow problems.

I don't understand how this code is supposed to handle overflows.
You seem only to be checking to see if the master timecounter has
changed to a different type.

Bruce's patch amounts to a retry if the current timecounter was updated
while we were calculating time.  It is a bit more defensive than it
needs to be and generally pessimizes the timecounters elegant lockless
design a fair bit, but it is still much better than slamming a mutex
around the entire clock code.

If this patch cures the PIIX problem, something I'm not at all convinced
about, it should go in, if not only the comment should go in.

Poul-Henning

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn'tthat be impossible?

2002-02-17 Thread Matthew Dillon

Ok, I've looked at the code more carefully and I understand how this
works now.  However, it is not enough in an SMP environment.  You
need a generation count in the timecounter structure and you also need
a synchronization point when you switch time counters or a process
running on a different cpu may wind up using a time counter that is being
actively updated.

I'm experimenting with your patch now.  I'll send email when I have 
some test results.

-Matt

:
:I just wrote the following fix for some of the overflow problems.
:
:%%%
:Index: kern_tc.c
:===
:RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
:retrieving revision 1.113
:diff -c -2 -r1.113 kern_tc.c
:...

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Success! Sorta! (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Matthew Dillon

:Bruce's patch amounts to a retry if the current timecounter was updated
:while we were calculating time.  It is a bit more defensive than it
:needs to be and generally pessimizes the timecounters elegant lockless
:design a fair bit, but it is still much better than slamming a mutex
:around the entire clock code.
:
:If this patch cures the PIIX problem, something I'm not at all convinced
:about, it should go in, if not only the comment should go in.
:
:Poul-Henning
:
:-- 
:Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20

Ok, I've tested Bruce's patch and it appeaars to mostly solve the problem.
I no longer get 'microuptime ... backwards' errors on a -current SMP
box.

However, I think to be complete we need to make it even less elegant.
The TC module is only flip-flopping between two time counters, which
means that it can flip-flop twice and the test will not work.  We need
a generation count on the timecounter as well:

do {
tc = timecounter;
gen = tc-tc_generation;
*bt = tc-tc_offset;
bintime_addx(bt, tc-tc_scale * tco_delta(tc));
} while (tc != timecounter || tc-tc_generation != gen);

There is also an issue on non-i386 machines.  The timecounter swapping
code requires a memory synchronization point after updating the 
contents of the new timecounter but before setting the 'timecounter' global
to the new timecounter.  If this is not done, non-i386 machines running
SMP may see the new timecounter structure but access pre-updated values
from it.

(i386 boxes do not have this problem because writes are ordered so the
inherent synchronication point at the end of the timer interrupt is
enough).

Is there a memory synchronization point macro available?

-Matt


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Success! Sorta! (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Matthew Dillon

Whoop!  I take it back.  I'm still getting the errors:

microuptime() went backwards (458.168990 - 458.168882)
microuptime() went backwards (578.609995 - 577.929801)
microuptime() went backwards (748.912755 - 748.237402)
microuptime() went backwards (775.159625 - 775.159612)

I also think this retry loop has to be done everywhere where the
timecounter structure is accessed directly.

-Matt

:Ok, I've tested Bruce's patch and it appeaars to mostly solve the problem.
:I no longer get 'microuptime ... backwards' errors on a -current SMP
:box.
:...

Index: kern/kern_tc.c
===
RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
retrieving revision 1.113
diff -u -r1.113 kern_tc.c
--- kern/kern_tc.c  7 Feb 2002 21:21:55 -   1.113
+++ kern/kern_tc.c  17 Feb 2002 20:41:47 -
@@ -107,9 +107,11 @@
 {
struct timecounter *tc;
 
-   tc = timecounter;
-   *bt = tc-tc_offset;
-   bintime_addx(bt, tc-tc_scale * tco_delta(tc));
+   do {
+   tc = timecounter;
+   *bt = tc-tc_offset;
+   bintime_addx(bt, tc-tc_scale * tco_delta(tc));
+   } while (tc != timecounter);
 }
 
 void
@@ -126,8 +128,10 @@
struct timecounter *tc;
 
ngetmicrotime++;
-   tc = timecounter;
-   *tvp = tc-tc_microtime;
+   do {
+   tc = timecounter;
+   *tvp = tc-tc_microtime;
+   } while (tc != timecounter);
 }
 
 void
@@ -136,8 +140,10 @@
struct timecounter *tc;
 
ngetnanotime++;
-   tc = timecounter;
-   *tsp = tc-tc_nanotime;
+   do {
+   tc = timecounter;
+   *tsp = tc-tc_nanotime;
+   } while (tc != timecounter);
 }
 
 void
@@ -166,8 +172,10 @@
struct timecounter *tc;
 
ngetmicrouptime++;
-   tc = timecounter;
-   bintime2timeval(tc-tc_offset, tvp);
+   do {
+   tc = timecounter;
+   bintime2timeval(tc-tc_offset, tvp);
+   } while (tc != timecounter);
 }
 
 void
@@ -176,8 +184,10 @@
struct timecounter *tc;
 
ngetnanouptime++;
-   tc = timecounter;
-   bintime2timespec(tc-tc_offset, tsp);
+   do {
+   tc = timecounter;
+   bintime2timespec(tc-tc_offset, tsp);
+   } while (tc != timecounter);
 }
 
 void

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Success! Sorta! (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Matthew Dillon wri
tes:

However, I think to be complete we need to make it even less elegant.
The TC module is only flip-flopping between two time counters, which
means that it can flip-flop twice and the test will not work.  We need
a generation count on the timecounter as well:

do {
tc = timecounter;
gen = tc-tc_generation;
*bt = tc-tc_offset;
bintime_addx(bt, tc-tc_scale * tco_delta(tc));
} while (tc != timecounter || tc-tc_generation != gen);

No, more like:

do {
tc = timecounter;
gen = tc-gen;
...
} while (gen != tc-gen);

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Success! Sorta! (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Matthew Dillon wri
tes:
Whoop!  I take it back.  I'm still getting the errors:

microuptime() went backwards (458.168990 - 458.168882)
microuptime() went backwards (578.609995 - 577.929801)
microuptime() went backwards (748.912755 - 748.237402)
microuptime() went backwards (775.159625 - 775.159612)

I also think this retry loop has to be done everywhere where the
timecounter structure is accessed directly.

No, only where the timecounter hardware is read and the math
done.  As far as I know, this is the only place.

As I said, I am far from convinced this is the solution to the problem
we are looking at with the PIIX timecounter.

Msmith has some magic code in sys/dev/acpi/acpi_timer.c which tries
to identify if the PIIX counter is broken or OK and I notice that
the mask seems to always be set to 24 bits even if the hardware
is 32 bits.

I am not sure I read his code right, but he seems to default to the
unsafe method, can you try this copypasted patch ?

Index: acpi_timer.c
===
RCS file: /home/ncvs/src/sys/dev/acpica/acpi_timer.c,v
retrieving revision 1.11
diff -u -r1.11 acpi_timer.c
--- acpi_timer.c8 Jan 2002 06:45:56 -   1.11
+++ acpi_timer.c17 Feb 2002 20:48:23 -
@@ -138,7 +138,7 @@
 if (getenv(debug.acpi.timer_test) != NULL)
acpi_timer_test();
 
-acpi_timer_timecounter.tc_get_timecount = acpi_timer_get_timecount;
+acpi_timer_timecounter.tc_get_timecount = acpi_timer_get_timecount_safe;
 acpi_timer_timecounter.tc_frequency = acpi_timer_frequency;
 tc_init(acpi_timer_timecounter);
 



-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible?

2002-02-17 Thread Poul-Henning Kamp


Matt,

Easy now, there is more depth to it than that...  I have promised myself
to get the timecounter paper written and I'll probably present it at
BSDcon-euro-2002 in Amsterdam if they want to listen to me.

For now, lets concentrate on the PIIX hardware because that's where
the problem seems to be...

Poul-Henning

In message [EMAIL PROTECTED], Matthew Dillon wri
tes:
Ok, I've looked at the code more carefully and I understand how this
works now.  However, it is not enough in an SMP environment.  You
need a generation count in the timecounter structure and you also need
a synchronization point when you switch time counters or a process
running on a different cpu may wind up using a time counter that is being
actively updated.

I'm experimenting with your patch now.  I'll send email when I have 
some test results.

   -Matt

:
:I just wrote the following fix for some of the overflow problems.
:
:%%%
:Index: kern_tc.c
:===
:RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
:retrieving revision 1.113
:diff -c -2 -r1.113 kern_tc.c
:...

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message


-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible?

2002-02-17 Thread Michael Smith

 If this patch cures the PIIX problem, something I'm not at all convinced
 about, it should go in, if not only the comment should go in.

I would like to see the PIIX problem caught on camera, personally.  
We're aware of one errata for it already, and we work around it.  If 
there's another problem, or ideally if someone has some relatively quick 
code to test it, that would be much better.

-- 
To announce that there must be no criticism of the president,
or that we are to stand by the president, right or wrong, is not
only unpatriotic and servile, but is morally treasonable to 
the American public.  - Theodore Roosevelt



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: ACPI timer is screwed... (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Michael Smith

 
 :I would like to see the PIIX problem caught on camera, personally.  
 :We're aware of one errata for it already, and we work around it.  If 
 :there's another problem, or ideally if someone has some relatively quick 
 :code to test it, that would be much better.
 
 Holy shit.  We are screwed.  It's a free-running counter with NO
 synchronization whatsoever.  None.  Zip.  Zero.

Sounds like we need to smack whoever made your chipset as well.  Intel 
learned their lesson (finally) with later revisions of the PIIX4.  I'm 
guessing you're running this against a ServerWorks system.

 It looks like the hardware is using an adder with fast carry
 (basically a forward looking carry calculation), which can result in a
 later bit being set before the earlier bits are updated.  However, if
 this is the case then it is almost certain that you can catch the ripple
 as well, or even catch the counter while the bit states are changing.

Lame.  Incredibly lame.

 The only way to get a guarenteed accurate sample under these circumstances
 is something like this, where you calculate a mask that results in
 reasonable accurancy without causing the cpu to go into an infinite
 loop and then read the timer until you get two samples that are the same:
 
 u1 = TIMER_READ;
 u2 = TIMER_READ;
 while ((u1 ^ u2)  0xFFF0) {   mask must be chosen
   u1 = u2;
   u2 = TIMER_READ;
 }
 return(u2  0xFFF0)same mask here
 
 Here are some more from my debug output:

Interesting.  This would be reasonably robust in the ripple-counter case 
we have to deal with on the old PIIX4.  Have you tried implementing the 
above yet, or measuring how much it costs?

At any rate, please let me know for sure whether you're running on a 
ServerWorks board, and I'll see if I can't find a Big Stick to hit them 
with.

Thanks,
Mike

-- 
To announce that there must be no criticism of the president,
or that we are to stand by the president, right or wrong, is not
only unpatriotic and servile, but is morally treasonable to 
the American public.  - Theodore Roosevelt



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



ACPI patch (was Re: 'microuptime() went backwards ...' using ACPI...)

2002-02-17 Thread Matthew Dillon

Ok, here is a patch that executes a brute-force solution to the
asynchronous counter problem.

Basically it figures out a mask and then the timer code loops until two
masked reads yield the same value, guarenteeing that we haven't caught
the timer during a carry.

On my system, the mask I got was: 0xFFFC which means I lost only 2
bits of accuracy in order to be able to retrieve accurate counter values.
This gives my particular box an approximately 1uS accuracy.

I think this may be the only safe way to do it.  It looks like it is
possible to catch the ripple and/or fast-carry on *any* bit, with the
statistical chance of it occuring on higher bits dropping by 2x per bit.

My proposed (tested) patch is below.  Mike?

acpi_timer0: 32-bit timer at 3.579545MHz mask fffc port 0x808-0x80b on acpi0
-Matt

Index: acpi_timer.c
===
RCS file: /home/ncvs/src/sys/dev/acpica/acpi_timer.c,v
retrieving revision 1.11
diff -u -r1.11 acpi_timer.c
--- acpi_timer.c8 Jan 2002 06:45:56 -   1.11
+++ acpi_timer.c17 Feb 2002 23:26:29 -
@@ -56,6 +56,7 @@
 MODULE_NAME(TIMER)
 
 static device_tacpi_timer_dev;
+static u_int32_t acpi_timer_mask;
 struct resource*acpi_timer_reg;
 #define TIMER_READ bus_space_read_4(rman_get_bustag(acpi_timer_reg),   \
 rman_get_bushandle(acpi_timer_reg),\
@@ -113,7 +114,7 @@
 acpi_timer_identify(driver_t *driver, device_t parent)
 {
 device_t   dev;
-char   desc[40];
+char   desc[48];
 intrid;
 
 FUNCTION_TRACE(__func__);
@@ -138,14 +139,32 @@
 if (getenv(debug.acpi.timer_test) != NULL)
acpi_timer_test();
 
-acpi_timer_timecounter.tc_get_timecount = acpi_timer_get_timecount;
+acpi_timer_timecounter.tc_get_timecount = acpi_timer_get_timecount_safe;
 acpi_timer_timecounter.tc_frequency = acpi_timer_frequency;
 tc_init(acpi_timer_timecounter);
 
-sprintf(desc, %d-bit timer at 3.579545MHz, AcpiGbl_FADT-TmrValExt ? 32 : 24);
+for (acpi_timer_mask = 0x; acpi_timer_mask; acpi_timer_mask = 1) {
+   u_int32_t u1;
+   u_int32_t u2;
+   int count = 10;
+
+   u1 = TIMER_READ;
+   u2 = TIMER_READ;
+   while (count  ((u1 ^ u2)  acpi_timer_mask)) {
+   u1 = u2;
+   u2 = TIMER_READ;
+   --count;
+   }
+   if (count)
+   break;
+}
+acpi_timer_mask = 1;
+
+sprintf(desc, %d-bit timer at 3.579545MHz mask %08x, 
+   (AcpiGbl_FADT-TmrValExt ? 32 : 24), acpi_timer_mask);
 device_set_desc_copy(dev, desc);
 
-#if 0
+#if 0
 {
u_int64_t   first;
 
@@ -192,16 +211,22 @@
 static unsigned
 acpi_timer_get_timecount_safe(struct timecounter *tc)
 {
-unsigned u1, u2, u3;
+u_int32_t u1;
+u_int32_t u2;
 
+u1 = TIMER_READ;
 u2 = TIMER_READ;
-u3 = TIMER_READ;
-do {
+while ((u1 ^ u2)  acpi_timer_mask) {
u1 = u2;
-   u2 = u3;
-   u3 = TIMER_READ;
-} while (u1  u2 || u2  u3);
-return (u2);
+   u2 = TIMER_READ;
+}
+#if 0  /* DEBUGGING */
+if (u2  u1) {
+   u_int32_t u3 = TIMER_READ;
+   printf(ACPI TIMER LSB MISREAD %08x %08x %08x\n, u1, u2, u3);
+}
+#endif
+return(u2  acpi_timer_mask);
 }
 
 /*

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: ACPI timer is screwed... (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Matthew Dillon


:Sounds like we need to smack whoever made your chipset as well.  Intel 
:learned their lesson (finally) with later revisions of the PIIX4.  I'm 
:guessing you're running this against a ServerWorks system.

atapci0: ServerWorks ROSB4 ATA33 controller port 0x8b0-0x8bf at device 15.1 on
 pci0

Uh huh.

It might be possible to detect the situation during init-time by 
explicitly looking for a reverse indexed time in a tight loop of 
maybe a thousand reads, but that would still leave us with a statistical
chance of not guessing right.

:Interesting.  This would be reasonably robust in the ripple-counter case 
:we have to deal with on the old PIIX4.  Have you tried implementing the 
:above yet, or measuring how much it costs?
:
:At any rate, please let me know for sure whether you're running on a 
:ServerWorks board, and I'll see if I can't find a Big Stick to hit them 
:with.
:
:Thanks,
:Mike

I haven't measured the cost (extra loops) but I expect it would stabilize
in no more then one additional loop, which would be three counter reads
total or roughly the same as your originaln _safe code in the worst case.

I think we could default to the _safe version and then explicitly change
it to use the fast version if we see specific chipsets which we know
to be good.

-Matt



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: ACPI timer is screwed... (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Matthew Dillon wri
tes:

:I would like to see the PIIX problem caught on camera, personally.  
:We're aware of one errata for it already, and we work around it.  If 
:there's another problem, or ideally if someone has some relatively quick 
:code to test it, that would be much better.

Holy shit.  We are screwed.  It's a free-running counter with NO
synchronization whatsoever.  None.  Zip.  Zero.

Yes, there is an errata for just that on early chipsets.

Does the ..._slow patch I sent work for you ?

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: ACPI patch (was Re: 'microuptime() went backwards ...' using ACPI...)

2002-02-17 Thread Michael Smith

 Ok, here is a patch that executes a brute-force solution to the
 asynchronous counter problem.
 
 Basically it figures out a mask and then the timer code loops until two
 masked reads yield the same value, guarenteeing that we haven't caught
 the timer during a carry.
 
 On my system, the mask I got was: 0xFFFC which means I lost only 2
 bits of accuracy in order to be able to retrieve accurate counter values.
 This gives my particular box an approximately 1uS accuracy.
 
 I think this may be the only safe way to do it.  It looks like it is
 possible to catch the ripple and/or fast-carry on *any* bit, with the
 statistical chance of it occuring on higher bits dropping by 2x per bit.
 
 My proposed (tested) patch is below.  Mike?

I have some reservations about this, because I'm not sure that 10 
successive reads will catch the ripple-counter problem that the old PIIX4s
have.

I think that if this code detects a need for the mask technique, it 
should set the handler to a function that will deal with the mask.  If it 
doesn't, it should assume that we need the 'safe' function as it 
currently exists (I'm not sure why it's not the default, unless there's a 
log message to explain it, it must have been a mistake on my part).

I'd also like to see the description include the number of bits, rather 
than the mask, if we are using a mask.

Thanks for taking the time to track this one down.

-- 
To announce that there must be no criticism of the president,
or that we are to stand by the president, right or wrong, is not
only unpatriotic and servile, but is morally treasonable to 
the American public.  - Theodore Roosevelt



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: ACPI timer is screwed... (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Michael Smith

 :I would like to see the PIIX problem caught on camera, personally.  
 :We're aware of one errata for it already, and we work around it.  If 
 :there's another problem, or ideally if someone has some relatively quick 
 :code to test it, that would be much better.
 
 Holy shit.  We are screwed.  It's a free-running counter with NO
 synchronization whatsoever.  None.  Zip.  Zero.
 
 Yes, there is an errata for just that on early chipsets.
 
 Does the ..._slow patch I sent work for you ?

Matt's problem (look-ahead carry) will break the three-read algorithm 
because it can generate a sequence of three reads that appear to be in 
succession, but which are all wrong.

We need three different algorithms; works, ripple and look-ahead.  
Of those, works should be based exclusively off a list of known-good 
chipsets, look-ahead seems to be easily enough detected (but we should 
probably have a blacklist anyway) and ripple is hard to detect and 
should be the default case.

I really, really hate hardware.

-- 
To announce that there must be no criticism of the president,
or that we are to stand by the president, right or wrong, is not
only unpatriotic and servile, but is morally treasonable to 
the American public.  - Theodore Roosevelt



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: ACPI patch (was Re: 'microuptime() went backwards ...' using ACPI...)

2002-02-17 Thread Matthew Dillon

:
:I have some reservations about this, because I'm not sure that 10 
:successive reads will catch the ripple-counter problem that the old PIIX4s
:have.

Just goes to show that I need to document my code :-)

Those reads are not detecting the ripple-counter problem, they are
figuring out the mask required to guarentee that two successive reads
will return the same counter value so the counter read in the later
code doesn't end up in an infinite loop.

i.e. on a slow cpu the mask might have to remove more bits because the
counter increments a number of times between reads.  On a fast cpu,
fewer bits would have to be chopped off.  That's all.

:I'd also like to see the description include the number of bits, rather 
:than the mask, if we are using a mask.
:
:Thanks for taking the time to track this one down.

Well, I don't want to spend too much time on it because this was
incidental to other work I'm doing on -current.  If you think it's worth
comitting I'll commit it, otherwise you can use it as a template for your
own fix/commit.  It would be nice if some sort of solution were comitted
to the tree relatively soon.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn'tthat be impossible?

2002-02-17 Thread Bruce Evans

On Sun, 17 Feb 2002, Poul-Henning Kamp wrote:

 In message [EMAIL PROTECTED], Bruce Evans writes:

  This occurs both with and without the gettimeofday Giant-removal patch, so
  I am fairly sure it has nothing to do with any of my current work.  This is
  running -current on a DELL2550 (2xCPUs), compiled with the SMP option.

 The Gian removal doesn't come anywhere near this.

Yes it did.  There were Giants in the path for all syscalls.

 The fact that the timecounter usually goes backwards by about 0.68 seconds
 is probably significant, but I can't quite explain it.

  acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0

 Well:
   2^32 / 3579545 = 1199.86 seconds.
   2^24 / 3579545 =4.68 seconds.

 So even assuming that ACPI reported a wrong width, four seconds should
 still be enough to prevent a wraparound even though we cycle through
 the timecounter ring in one second.

No, there is only one seconds worth of of wraparaond, because recent
changes annulled the slightly less recent fixes for loss of nanoseconds
above the first 10^9 of them.  Previously, the multiplication corresponding
to the one in binuptime() overflowed at 2^32 nsec.  Now it overflows
at 2^64 in units of 2^-64 seconds.

 I just wrote the following fix for some of the overflow problems.

 It is true that if a process is not running for arbitrary long time
 the timecounter may be modified underneat it, and bruce's patch is
 actually a pretty elegant solution for that case.  By all means
 try it.

Thanks :-).  In other mail you said it pessimize the timecounters
elegant lockless design a fair bit.  Here the fair bit only applies
to the elegance.  The pessimization at runtime is tiny.

Bruce


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Success! Sorta! (was Re: 'microuptime() went backwards ...'using ACPI timer. Shouldn't that be impossible? )

2002-02-17 Thread Bruce Evans

On Sun, 17 Feb 2002, Matthew Dillon wrote:

 Whoop!  I take it back.  I'm still getting the errors:

 microuptime() went backwards (458.168990 - 458.168882)
 microuptime() went backwards (578.609995 - 577.929801)
 microuptime() went backwards (748.912755 - 748.237402)
 microuptime() went backwards (775.159625 - 775.159612)

 I also think this retry loop has to be done everywhere where the
 timecounter structure is accessed directly.

Yes, since the reads of all the relevant timecounter variables are
non-atomic.

 Index: kern/kern_tc.c
 ===
 RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
 retrieving revision 1.113
 diff -u -r1.113 kern_tc.c
 --- kern/kern_tc.c7 Feb 2002 21:21:55 -   1.113
 +++ kern/kern_tc.c17 Feb 2002 20:41:47 -
 @@ -126,8 +128,10 @@
   struct timecounter *tc;

   ngetmicrotime++;
 - tc = timecounter;
 - *tvp = tc-tc_microtime;
 + do {
 + tc = timecounter;
 + *tvp = tc-tc_microtime;
 + } while (tc != timecounter);
  }

  void

E.g., tc_mictrotime here is a timeval.  It doesn't matter getting a stale
value (although getting a stale value increases the possible incoherency
of the get*() functions from 1/HZ to NTIMECOUNTER/HZ), but getting a
stale value that changed underneath the read would be bad.

Bruce


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible?

2002-02-16 Thread Matthew Dillon

Testing with a 'make -j 10 buildworld' on a -current box I am getting
regular:

microuptime() went backwards (146.826785 - 146.156715)
microuptime() went backwards (146.826782 - 146.228636)
...
microuptime() went backwards (8945.938288 - 8945.251603)
microuptime() went backwards (8945.938306 - 8945.347173)
microuptime() went backwards (9142.847550 - 9142.847546)

This occurs both with and without the gettimeofday Giant-removal patch, so
I am fairly sure it has nothing to do with any of my current work.  This is
running -current on a DELL2550 (2xCPUs), compiled with the SMP option.

Timecounter i8254  frequency 1193182 Hz
...
Timecounter ACPI  frequency 3579545 Hz
acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
acpi_cpu0: CPU on acpi0
acpi_cpu1: CPU on acpi0
acpi_pcib0: Host-PCI bridge on acpi0
...

Question:  How can this be occuring at all?  Isn't the ACPI counter a
32 bit counter that does not have the rollover problems that the 8254 timer
has?

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn'tthat be impossible?

2002-02-16 Thread Bruce Evans

On Sat, 16 Feb 2002, Matthew Dillon wrote:

 Testing with a 'make -j 10 buildworld' on a -current box I am getting
 regular:

 microuptime() went backwards (146.826785 - 146.156715)
 microuptime() went backwards (146.826782 - 146.228636)
 ...
 microuptime() went backwards (8945.938288 - 8945.251603)
 microuptime() went backwards (8945.938306 - 8945.347173)
 microuptime() went backwards (9142.847550 - 9142.847546)

 This occurs both with and without the gettimeofday Giant-removal patch, so
 I am fairly sure it has nothing to do with any of my current work.  This is
 running -current on a DELL2550 (2xCPUs), compiled with the SMP option.

The fact that the timecounter usually goes backwards by about 0.68 seconds
is probably significant, but I can't quite explain it.

 Timecounter i8254  frequency 1193182 Hz
 ...
 Timecounter ACPI  frequency 3579545 Hz
 acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
 acpi_cpu0: CPU on acpi0
 acpi_cpu1: CPU on acpi0
 acpi_pcib0: Host-PCI bridge on acpi0
 ...

 Question:  How can this be occuring at all?  Isn't the ACPI counter a
 32 bit counter that does not have the rollover problems that the 8254 timer
 has?

Timecounters go backwards when the timecounter update or reference code is
called insufficiently often to prevent overflow.  The rollover problems of
the i8254 timecounter actually reduce this problem.  If an i8254 rollover
is missed, then it causes the the i8254 timecounter to go forward less
than it should.

I just wrote the following fix for some of the overflow problems.

%%%
Index: kern_tc.c
===
RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
retrieving revision 1.113
diff -c -2 -r1.113 kern_tc.c
*** kern_tc.c   7 Feb 2002 21:21:55 -   1.113
--- kern_tc.c   17 Feb 2002 06:25:14 -
***
*** 108,114 
struct timecounter *tc;

!   tc = timecounter;
!   *bt = tc-tc_offset;
!   bintime_addx(bt, tc-tc_scale * tco_delta(tc));
  }

--- 95,129 
struct timecounter *tc;

!   /*
!* The loop is to handle changes of timecounter underneath us.
!* Such changes may even become normal for preemptive kernels.
!* It is quite reasonable for idle priority processes to not
!* run for many seconds, and if they are not running after
!* being preempted here, the timecounter may cycle many times
!* underneath them.  An NTIMECOUNTER of  2 is neither necessary
!* or sufficient for fixing this problem, unless NTIMECOUNTER is
!* preposterously large.  NTIMECOUNTER == 2 suffices for most
!* cases, and something more is required to fix the general case.
!*
!* I hope this also fixes problems with overflow of the
!* multiplication.  We depend on tc not becoming stale by more
!* than 1 second.  We will now normally see such staleness
!* because it will cause the timecounter to change many times
!* underneath us.  There will only be problems if hardclock()
!* doesn't run for many seconds, but hardclock() is a very
!* high priority interrupt, so such problems can't happen.
!*
!* XXX should use a generation count.
!*
!* XXX problems with hardclock() can happen, e.g., at boot time
!* if you have fixed hardclock() to not be a broken fast interrupt
!* handler, or if you sit at the ddb prompt for several seconds.
!* Should do something to make them harmless.
!*/
!   do {
!   tc = timecounter;
!   *bt = tc-tc_offset;
!   bintime_addx(bt, tc-tc_scale * tco_delta(tc));
!   } while (tc != timecounter);
  }

%%%

Bruce


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: microuptime went backwards

2002-01-27 Thread Greg Lehey

On Saturday, 26 January 2002 at 16:28:04 +0100, Gabriel Ambuehl wrote:
 -BEGIN PGP SIGNED MESSAGE-

 Hello,
 I just installed 5 CURRENT (20.1. snapshot) on a K6-2 500 whic was
 previously running 4.4 STABLE without any problems but since CURRENT
 is running, it keeps printing error messages about microuptime went
 backwards which doesn't seem to hurt the system itself but is very
 annoying to work with as it jams the whole screen in a matter of
 seconds with error messages, thus making it impossible to really work
 with the system. Now in some mailinglist archive it was suggested to
 turn of the APM of the board (some Elitegroup Super Socket 7 board
 based on a SiS chipset, can dig out the specs if that helps) but
 that didn't help at all...

Funny, it always does for me.  Note that you have to build a kernel
without APM support; just disabling it doesn't work.  If that doesn't
help, you're probably on your own.

 Oh and please reply to me privately as well as I'm not reading this
 list (too much traffic)!

If you're running -CURRENT, you should be reading this list.  If you
don't have time, don't run -CURRENT.

Greg
--
See complete headers for address and phone numbers

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



microuptime went backwards

2002-01-26 Thread Gabriel Ambuehl

-BEGIN PGP SIGNED MESSAGE-

Hello,
I just installed 5 CURRENT (20.1. snapshot) on a K6-2 500 whic was
previously running 4.4 STABLE without any problems but since CURRENT
is running, it keeps printing error messages about microuptime went
backwards which doesn't seem to hurt the system itself but is very
annoying to work with as it jams the whole screen in a matter of
seconds with error messages, thus making it impossible to really work
with the system. Now in some mailinglist archive it was suggested to
turn of the APM of the board (some Elitegroup Super Socket 7 board
based on a SiS chipset, can dig out the specs if that helps) but
that didn't help at all...

Oh and please reply to me privately as well as I'm not reading this
list (too much traffic)!


Best regards,
 Gabriel

-BEGIN PGP SIGNATURE-
Version: PGP 6.5i

iQEVAwUBPFK9G8Za2WpymlDxAQFhlQgAtHfA5JbWxM9fZ/UFAq2zoLC6QLCBufXX
PmC4eisjnjGCyWmx2ukmRAZEjDa38b05tRON13Tax8U1kz1dUaFC4HLAhppUlkkO
ZsantX/MMJRAjYa6zS2Ez4lDsKr9GVmuLZxIdBGJ4Mfg+LmI+/owzQHn+m4EEeLe
TC31lNxf4Q8IJDkE4AAiuU/V3iQOPwmeia49jk6J5nqsw4ZZ0PKXEcUs53JkVDMy
jSSUBtm2lJCZnkMi27IdyzXkfwmDNJJDNj10U1qbwt/v1j6lcT8T2NW1rNJLPsQ9
3EKcTXx37RnYk8l7SJa6N69reMjgWlKi1q8f9q7ZGf6ALsqgyhmYGQ==
=bVJy
-END PGP SIGNATURE-


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Seeing a lot of 'microuptime() went backwards' messages during heavy disk I/O

2001-12-19 Thread Matthew Dillon

I'm seeing a lot of this during heavy disk I/O (5 postmark benchmarks
running in parallel):

microuptime() went backwards (44525.3954411 - 44524.563978)
microuptime() went backwards (57425.4282241 - 57424.766121)
microuptime() went backwards (57425.4282241 - 57424.845844)
microuptime() went backwards (60724.4427887 - 60724.686232)
microuptime() went backwards (60724.4427887 - 60724.768808)
microuptime() went backwards (61685.4418845 - 61685.102724)
microuptime() went backwards (61924.3906516 - 61924.246151)
microuptime() went backwards (62344.3800035 - 62344.415407)

Anyone know what's up?

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Seeing a lot of 'microuptime() went backwards' messages duringheavy disk I/O

2001-12-19 Thread Doug White

On Wed, 19 Dec 2001, Matthew Dillon wrote:

 I'm seeing a lot of this during heavy disk I/O (5 postmark benchmarks
 running in parallel):

 microuptime() went backwards (44525.3954411 - 44524.563978)
 microuptime() went backwards (57425.4282241 - 57424.766121)
 microuptime() went backwards (57425.4282241 - 57424.845844)
 microuptime() went backwards (60724.4427887 - 60724.686232)
 microuptime() went backwards (60724.4427887 - 60724.768808)
 microuptime() went backwards (61685.4418845 - 61685.102724)
 microuptime() went backwards (61924.3906516 - 61924.246151)
 microuptime() went backwards (62344.3800035 - 62344.415407)

 Anyone know what's up?

This has been asked on just about every FreeBSD list since the printf was
added.  Use the archives, man! :)

This is when you have a device generating too much interrupt latency --
enough to stall the RTC.  Usually the offender is video cards, but in this
case it could be your IDE controller.

The usual fix is to try changing timecounters; sysctl
kern.timecounter.hardware tells you what you're currently using.  If
you're on CURRENT, it's probably using ACPI.  If you want to override it
back to TSC, put 'hint.acpi_timer.0.disabled=1' in your
/boot/device.hints. Also try compiling with or without apm since this
influences the timecounter as well (although if you're on SMP you might be
stuck with the i8254).

phk can clarify. :)

Doug White|  FreeBSD: The Power to Serve
[EMAIL PROTECTED] |  www.FreeBSD.org


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Seeing a lot of 'microuptime() went backwards' messages duringheavy disk I/O

2001-12-19 Thread Matthew Dillon


:This has been asked on just about every FreeBSD list since the printf was
:added.  Use the archives, man! :)
:
:This is when you have a device generating too much interrupt latency --
:enough to stall the RTC.  Usually the offender is video cards, but in this
:case it could be your IDE controller.

This is on a SCSI machine, but I get your drift.

:The usual fix is to try changing timecounters; sysctl
:kern.timecounter.hardware tells you what you're currently using.  If
:you're on CURRENT, it's probably using ACPI.  If you want to override it
:back to TSC, put 'hint.acpi_timer.0.disabled=1' in your
:/boot/device.hints. Also try compiling with or without apm since this
:influences the timecounter as well (although if you're on SMP you might be
:stuck with the i8254).
:
:phk can clarify. :)
:
:Doug White|  FreeBSD: The Power to Serve

I would prefer that it figure out the correct time counter itself
rather then plague us with fraggin microuptime messages for the last N
years :-(

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Seeing a lot of 'microuptime() went backwards' messages duringheavy disk I/O

2001-12-19 Thread Julian Elischer

I've been seeing them a lot too on a recent (NOV) -current


On Wed, 19 Dec 2001, Matthew Dillon wrote:

 I'm seeing a lot of this during heavy disk I/O (5 postmark benchmarks
 running in parallel):
 
 microuptime() went backwards (44525.3954411 - 44524.563978)
 microuptime() went backwards (57425.4282241 - 57424.766121)
 microuptime() went backwards (57425.4282241 - 57424.845844)
 microuptime() went backwards (60724.4427887 - 60724.686232)
 microuptime() went backwards (60724.4427887 - 60724.768808)
 microuptime() went backwards (61685.4418845 - 61685.102724)
 microuptime() went backwards (61924.3906516 - 61924.246151)
 microuptime() went backwards (62344.3800035 - 62344.415407)
 
 Anyone know what's up?
 
   -Matt
   Matthew Dillon 
   [EMAIL PROTECTED]
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with unsubscribe freebsd-current in the body of the message
 


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



microuptime() went backwards

2001-08-18 Thread Mikhail Teterin

Should I worry about this:

[...]
Aug 18 11:11:51 aldan /boot/mi/kernel: calcru: negative time of -1781452130 usec for 
pid 352 (setiathome)
Aug 18 11:17:10 aldan /boot/mi/kernel: microuptime() went backwards (442732.3850800 - 
442731.165931)
Aug 18 11:17:10 aldan /boot/mi/kernel: microuptime() went backwards (442732.3850800 - 
442731.166366)
Aug 18 11:17:11 aldan /boot/mi/kernel: microuptime() went backwards (442732.3850800 - 
442732.324086)
Aug 18 11:17:11 aldan /boot/mi/kernel: microuptime() went backwards (442732.3850800 - 
442732.324293)
Aug 18 11:17:12 aldan /boot/mi/kernel: microuptime() went backwards (442732.3850800 - 
442732.814924)
Aug 18 11:17:13 aldan /boot/mi/kernel: calcru: negative time of -1498577479 usec for 
pid 352 (setiathome)

? Thanks,

-mi


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: microuptime() went backwards

2000-10-07 Thread Bruce Evans

On Sat, 7 Oct 2000, Valentin Nechayev wrote:

 At Fri, 6 Oct 2000 22:00:23 + (UTC), jhb wrote:

 JB tc_windup() wasn't called soon enough to update the timecounter.  Making
 
 On my system _each_ boot causes hundreds of these messages.
 And as described, long offsets without updating are caused by some
 code in drivers, i.e. DELAY(100) in isa/fd.c. Is it nesessary to call
 tc_windup() between iterations in isa_configure? ;|

Not really.  The timecounters overflow to bizarre values if the boot takes
a long time (more than about 4 seconds), but there is no problem unless
something looks at the time before tc_windup() is working and expects the
time to be monotonic.  Previously mi_switch() looked at the time before
tc_windup() was called, and this was impossible to avoid because getting
to tc_windup() involved a context switch.  Now the misbehaviour should
only be visible using a debugger or debugging code that prints the time.

Bruce



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



microuptime() went backwards: possible diagnosis

2000-10-06 Thread Valentin Nechayev

Following is partial dmesg output from 5.0(13)-CURRENT-20001002 kernel
with advanced debugging prints:

==={{{
mi_startup(): call subsystem 0x880(142606336), microuptime is 1.4027506
mi_startup(): call subsystem 0x880(142606336), microuptime is 1.4027897
mi_startup(): call subsystem 0xa00(167772160), microuptime is 1.4028290
mi_startup(): call subsystem 0xa80(176160768), microuptime is 1.4028701
mi_switch(): microuptime() went backwards (1.4029032 - 0.734106)
mi_switch(): microuptime() went backwards (1.4029032 - 0.734590)
mi_switch(): microuptime() went backwards (1.4029032 - 0.735077)
mi_switch(): microuptime() went backwards (1.4029032 - 0.73)
[...]
mi_switch(): microuptime() went backwards (1.4029032 - 1.053263)
mi_switch(): microuptime() went backwards (1.4029032 - 1.053756)
mi_startup(): call subsystem 0xd00(218103808), microuptime is 1.054550
proc0_post(): switchtime initialized to 1.54864
mi_startup(): call subsystem 0xe00(234881024), microuptime is 1.055501
mi_startup(): call subsystem 0xe40(239075328), microuptime is 1.056012
===}}}

mi_switch() uses static variable 'switchtime' which keeps microuptime
data when previous switching occured. For non-MP systems, proc0_post()
sets it up. But first mi_switch() call occures when SI_SUB_INT_CONFIG_HOOKS
(or SI_SUB_KICK_SCHEDULER?) subsystem called - why? IMHO it is too early
for mi_switch(), it should be enabled on SI_SUB_RUN_SCHEDULER.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

This was first question. Another one is strange value returned by
microuptime() (3-5 millions of microseconds). This value becomes strange
on my system during third SI_SUB_CONFIGURE instance, which is configure()
in i386/i386/autoconf.c:

[...]
isa_probe_children: before probing non-PNP device: microuptime=0.349777, x=0
fdc0: NEC 72065B or clone at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd.c: fd_probe(): before set_motor(): microuptime=0.291463, x=0
fdc0: FIFO enabled, 8 bytes threshold
fd.c: fd_probe(): before NE7CMD_SENSED(): microuptime=1.292409, x=0
fd0: 1440-KB 3.5" drive on fdc0 drive 0
isa_probe_children: after probing non-PNP device: microuptime=1.1653283, x=0
isa_probe_children: before probing non-PNP device: microuptime=1.1604514, x=7
atkbdc0: Keyboard controller (i8042) at port 0x60,0x64 on isa0
atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
isa_probe_children: after probing non-PNP device: microuptime=1.3138749, x=7
[...]

it isn't possible to normalize value during this subsystem startup, is it?


/netch


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



RE: microuptime() went backwards: possible diagnosis

2000-10-06 Thread John Baldwin


On 06-Oct-00 Valentin Nechayev wrote:
 Following is partial dmesg output from 5.0(13)-CURRENT-20001002 kernel
 with advanced debugging prints:

The problem was that the interrupt threads for the clk interrupt introduced
enough latency that occasionally (mostly during a heavy load of interrupts)
tc_windup() wasn't called soon enough to update the timecounter.  Making
clock interrupts not be threaded fixes this problem.

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-10-06 Thread Valentin Nechayev

At Fri, 6 Oct 2000 22:00:23 + (UTC), jhb wrote:

JB The problem was that the interrupt threads for the clk interrupt introduced
JB enough latency that occasionally (mostly during a heavy load of interrupts)
JB tc_windup() wasn't called soon enough to update the timecounter.  Making

On my system _each_ boot causes hundreds of these messages.
And as described, long offsets without updating are caused by some
code in drivers, i.e. DELAY(100) in isa/fd.c. Is it nesessary to call
tc_windup() between iterations in isa_configure? ;|

JB clock interrupts not be threaded fixes this problem.

Oh, well, I understand now that scheduling is nesessary to be run early
because interrupts are implemented as kernel threads even when kernel
is in phase of hardware detection.;(


/netch


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-10-06 Thread John Baldwin


On 06-Oct-00 Valentin Nechayev wrote:
 At Fri, 6 Oct 2000 22:00:23 + (UTC), jhb wrote:
 
 JB The problem was that the interrupt threads for the clk interrupt
 introduced
 JB enough latency that occasionally (mostly during a heavy load of
 interrupts)
 JB tc_windup() wasn't called soon enough to update the timecounter.  Making
 
 On my system _each_ boot causes hundreds of these messages.
 And as described, long offsets without updating are caused by some
 code in drivers, i.e. DELAY(100) in isa/fd.c. Is it nesessary to call
 tc_windup() between iterations in isa_configure? ;|

Yes, I know.  The problem didn't occur as badly on SMP systems as the
clk ithread didn't suffer from as much latency in that case, but on UP
systems it could be quite bad.  I used to get it both during boot and
when doing things like starting up X.
 
 JB clock interrupts not be threaded fixes this problem.
 
 Oh, well, I understand now that scheduling is nesessary to be run early
 because interrupts are implemented as kernel threads even when kernel
 is in phase of hardware detection.;(

This is not done yet.  It is something that is on the todo list though
(to move SI_SUB_INIT_CLOCKS earlier, basically).

 /netch

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: recent kernel, microuptime went backwards

2000-09-20 Thread Bruce Evans

On Wed, 20 Sep 2000, John Baldwin wrote:

 On 19-Sep-00 Bruce Evans wrote:
  It really does go backwards.  This is caused by the giant lock preventing
  the clock interrupt task from running soon enough.  The giant lock can
  also prevent the clock interrupt task from running often enough even
  after booting.  E.g., "dd if=/dev/random of=/dev/null bs=large" does
  several bad things.
 
 It's not the Giant lock that is at fault.  We give up Giant during mi_switch().
  Then scheduling problem is in the way that the top-level scheduler runs.

Then the scheduler is more broken than I thought :-).

Initially there may be a locking problem as well as a scheduling problem.
Giving up Giant in the first mi_switch() is a bit late.  mi_switch() uses
microuptime(), and the clock task needs to be run before then to finish
initialization of the timecounter.

 We
 decide to schedule another process due to the timeslice ending during the clk
 interrupt thread.

How can this work?  The timeslice accounting stuff doesn't get updated until
the clock task runs.

 In the past, this was not run as a thread, so it ran, set
 the AST_* constant for needing a resched and then exited.  During doreti, we
 notice an AST is pending and call ast(). ast() calls userret() which notices
 that a resched is needed and calls mi_switch().  In the New World Order, when
 the clock interrupt occurs, we set the AST_* constant for every interrupt
 before returning from sched_ithd().  This results in the actual interrupt
 threads being schedule from ast().

What should happen is for ast() to normally schedule the clock interrupt
(and other interrupts) immediately (unless they are blocked).  This doesn't
seem to be working, and I can't see how it can work, since there is nothing
except the giant lock to tell us whether interrupts are blocked, and the
giant lock is held most of the time in system mode.  Previously, cpl told
us, but cpl is no longer maintained.


 However, when the clk ithread finishes, it
 simply calls mi_switch() to enter the next process in ithd_loop().  The
 need_resched() that it sets isn't handled until the next call to userret()
 either via a hardware interrupt or a syscall return.  Thus, the problem isn't
 due to Giant, but rather to interrupt threads.

I think this is a different problem.  It is similar to a problem for
scheduling netisrs from non-interrupt context.  schednetisr() sets the
AST flag and some other flags.  Nothing looks at these flags until an
interrupt occurs or the process sleeps.  Previously, this was handled in
splx():

s = splnet();   /* s == 0 in process context. */
queue_net_output(...);
schednetisr(...);
splx(s);/* Since s == 0, the netisr gets run here. */

Will this work again as soom as splx() is replaced by mtx_exit(), etc?
We only have a few thousand spls to change :(.

 As for the micruptime()
 messages on boot, they only occur here on a UP kernel.  On an SMP kernel I
 don't get them.  Also, they always occur during mi_switch() when an interrupt
 thread is finishing and going back to sleep.  The first such thread to be run
 to generate thet error message is the irq0: clk ithread, so the clk ithread is
 running fine.

They are very timing dependent, and probably also very task-mix
dependent.  The primary cause of microuptime() going backwards is
tv_nsec overflowing if the system takes longer than 2^32 nsec (about
4 seconds) between the initialization of the timecounter and the
timecounter maintenance for the first clock interrupt.  On one of my
systems, the first thread to call mi_switch() is the generic thread
(proc0?) that executes run_interrupt_driven_hooks().  mi_switch() is
called for the first time when the ata hook goes to sleep.  Things
would be a little different for SMP.  Hopefully another cpu handles
the clock interrupt.

Bruce



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: recent kernel, microuptime went backwards

2000-09-20 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], John Baldwin writes:

As for the micruptime()
messages on boot, they only occur here on a UP kernel.  On an SMP kernel I
don't get them.  Also, they always occur during mi_switch() when an interrupt
thread is finishing and going back to sleep.  The first such thread to be run
to generate thet error message is the irq0: clk ithread, so the clk ithread is
running fine.

The microuptime() messages occur because the timecounters expect the
i8254 clock interrupt to run "hz" times per second, and it doesn't.

In particular it doesn't during then 10-20 seconds we probe/attach
devices.

--
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD coreteam member | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: recent kernel, microuptime went backwards

2000-09-20 Thread Siobhan Patricia Lynch

John,
I get these on an SMP kernel, which locks up the box, I can't even
figure out where exactly its happening. Maybe I'm just missing something
in my kernel config file? I assumed (from UPDATING) that no real change
was
needed to the SMP options?

The hardware is an Intel N440BX motherboard with two PII333 procs.

I'm not opposed to the option that this is *my* error and not SMPNG's

PRE_SMPNG is VERY stable on this box, however the constant crashes (mostly
during high I/O periods, such as make world) have made it impossible to
use the machine for any type of real development under SMPNG. (which if I
had time since the promotion, I'd be a bit crazy right now)

If theres is anything I can do to help debug this, let me know, you can
email me privately.

-Trish


__

Trish Lynch
FreeBSD - The Power to Serve[EMAIL PROTECTED]
Rush Networking [EMAIL PROTECTED]
VA Linux Systems[EMAIL PROTECTED]
O|S|D|N [EMAIL PROTECTED]
---

"So if you ask me how do I fell inside I could honestly
 Tell you we've been taken on a very long ride
 And if my owners let me have some free time some day
 With all good intention I would probably run away
 Clutching the short straw"
-Marillion, That Time of the Night



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: recent kernel, microuptime went backwards

2000-09-20 Thread Siobhan Patricia Lynch

more info,
the N440BX has a symbios scsi card on board and an fxp card onboard.
I'm using relatively old scsi drives (4500 RPM seagates and HP drives)
I have softupdates on, and I'm now using COMPAT_OLDPCI for now, although I
only turned it back on after seeing that was the only real change I made
to my kernel config pre and post SMPNG.

-Trish

__

Trish Lynch
FreeBSD - The Power to Serve[EMAIL PROTECTED]
Rush Networking [EMAIL PROTECTED]
VA Linux Systems[EMAIL PROTECTED]
O|S|D|N [EMAIL PROTECTED]
---

"what makes me think i could start clean slated
 the hardest to learn was the least complicated"
-Indigo Girls, Least Complicated



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: recent kernel, microuptime went backwards

2000-09-20 Thread John Baldwin


On 19-Sep-00 Bruce Evans wrote:
 On Tue, 19 Sep 2000, Andrey A. Chernov wrote:
 
 With very latest kernel I got lots of
 
 microuptime() went backwards (1.3624050 - 1.998840)
 
 messages just before
 
 Mounting root from ufs:/dev/da0s1a
 
 It really does go backwards.  This is caused by the giant lock preventing
 the clock interrupt task from running soon enough.  The giant lock can
 also prevent the clock interrupt task from running often enough even
 after booting.  E.g., "dd if=/dev/random of=/dev/null bs=large" does
 several bad things.

It's not the Giant lock that is at fault.  We give up Giant during mi_switch().
 Then scheduling problem is in the way that the top-level scheduler runs.  We
decide to schedule another process due to the timeslice ending during the clk
interrupt thread.  In the past, this was not run as a thread, so it ran, set
the AST_* constant for needing a resched and then exited.  During doreti, we
notice an AST is pending and call ast(). ast() calls userret() which notices
that a resched is needed and calls mi_switch().  In the New World Order, when
the clock interrupt occurs, we set the AST_* constant for every interrupt
before returning from sched_ithd().  This results in the actual interrupt
threads being schedule from ast().  However, when the clk ithread finishes, it
simply calls mi_switch() to enter the next process in ithd_loop().  The
need_resched() that it sets isn't handled until the next call to userret()
either via a hardware interrupt or a syscall return.  Thus, the problem isn't
due to Giant, but rather to interrupt threads.  As for the micruptime()
messages on boot, they only occur here on a UP kernel.  On an SMP kernel I
don't get them.  Also, they always occur during mi_switch() when an interrupt
thread is finishing and going back to sleep.  The first such thread to be run
to generate thet error message is the irq0: clk ithread, so the clk ithread is
running fine.

 Bruce

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.cslab.vt.edu/~jobaldwi/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



recent kernel, microuptime went backwards

2000-09-19 Thread Andrey A. Chernov

With very latest kernel I got lots of

microuptime() went backwards (1.3624050 - 1.998840)

messages just before

Mounting root from ufs:/dev/da0s1a

-- 
Andrey A. Chernov
[EMAIL PROTECTED]
http://ache.pp.ru/


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: recent kernel, microuptime went backwards

2000-09-19 Thread Bruce Evans

On Tue, 19 Sep 2000, Andrey A. Chernov wrote:

 With very latest kernel I got lots of
 
 microuptime() went backwards (1.3624050 - 1.998840)
 
 messages just before
 
 Mounting root from ufs:/dev/da0s1a

It really does go backwards.  This is caused by the giant lock preventing
the clock interrupt task from running soon enough.  The giant lock can
also prevent the clock interrupt task from running often enough even
after booting.  E.g., "dd if=/dev/random of=/dev/null bs=large" does
several bad things.

Bruce



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: recent kernel, microuptime went backwards

2000-09-19 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Bruce Eva
ns writes:
On Tue, 19 Sep 2000, Andrey A. Chernov wrote:

 With very latest kernel I got lots of
 
 microuptime() went backwards (1.3624050 - 1.998840)
 
 messages just before
 
 Mounting root from ufs:/dev/da0s1a

It really does go backwards.  This is caused by the giant lock preventing
the clock interrupt task from running soon enough.  The giant lock can
also prevent the clock interrupt task from running often enough even
after booting.  E.g., "dd if=/dev/random of=/dev/null bs=large" does
several bad things.

Yes, we are royally hosed right now with respect to interrupt service.

It's a testimony to the robustness of the majority of our drivers that
they don't explode more often than they do.

--
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD coreteam member | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: recent kernel, microuptime went backwards

2000-09-19 Thread Donn Miller

On Wed, 20 Sep 2000, Bruce Evans wrote:

 On Tue, 19 Sep 2000, Andrey A. Chernov wrote:
 
  microuptime() went backwards (1.3624050 - 1.998840)

 It really does go backwards.  This is caused by the giant lock preventing
 the clock interrupt task from running soon enough.  The giant lock can
 also prevent the clock interrupt task from running often enough even
 after booting.  E.g., "dd if=/dev/random of=/dev/null bs=large" does
 several bad things.

I've noticed that, every time I do ntpdate, my clock loses about 5-11
seconds per hour.  At first I thought it may have been my CMOS battery,
but maybe it's that microuptime() thing that's causing it.  (I've got a UP
machine.)

- Donn



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: recent kernel, microuptime went backwards

2000-09-19 Thread Andrey A. Chernov

On Tue, Sep 19, 2000 at 08:30:25PM +0200, Poul-Henning Kamp wrote:
 In message [EMAIL PROTECTED], Bruce Eva
 ns writes:
 On Tue, 19 Sep 2000, Andrey A. Chernov wrote:
 
  With very latest kernel I got lots of
  
  microuptime() went backwards (1.3624050 - 1.998840)
  
  messages just before
  
  Mounting root from ufs:/dev/da0s1a
 
 It really does go backwards.  This is caused by the giant lock preventing
 the clock interrupt task from running soon enough.  The giant lock can
 also prevent the clock interrupt task from running often enough even
 after booting.  E.g., "dd if=/dev/random of=/dev/null bs=large" does
 several bad things.
 
 Yes, we are royally hosed right now with respect to interrupt service.
 
 It's a testimony to the robustness of the majority of our drivers that
 they don't explode more often than they do.

It seems it was SCSI driver in my case waiting for SCSI device settle (or
in some another state, I am not sure).

BTW, could you please summarize this diagnostic? I.e. istead of tons of
messages effectively overflowing dmesg buffer and maybe cause backwards by
time consumed by themselfs put only one message summarizing all of them,
something like "went backwards N times".

-- 
Andrey A. Chernov
[EMAIL PROTECTED]
http://ache.pp.ru/


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: looking for microuptime went backwards victims...

2000-09-15 Thread Jaye Mathisen

I get it on my newsserver pretty much every night when expire runs.

12 IDE drives on 3 Promise controllers.

I was just actually about to replace the MB, thinking it was just
something freaky or unstable.

I will try the patch.

On Mon, 4 Sep 2000, Poul-Henning Kamp wrote:

 
 I'm looking for the remaining victims of the dreaded "microuptime
 went backwards" message.
 
 If you can reliably reproduce the problem, please contact me, so
 we can arrange for some very detailed tracing to try to find out
 what exactly is going on.  I have not been able to trigger the
 problem in my lab in a long time.
 
 If you see the message only occationally, please try the attached
 patch and let me know if this changes the picture.  The patch is
 not meant as a fix, but it might reduce the impact of this condition
 considerably when it happens.  Basically by reducing the width of
 the timecounter the magnitude of the hit we take if the timecounter
 goes backwards is reduced from about an hour to approx 110msec.
 
 Reducing the width to less than 17 bits starts to run the risk of
 ambiguity due to clock rollover.
 
 This patch may not be safe with PCAUDIO.
 
 Poul-Henning
 
 Index: clock.c
 ===
 RCS file: /home/ncvs/src/sys/i386/isa/clock.c,v
 retrieving revision 1.155
 diff -u -r1.155 clock.c
 --- clock.c   2000/07/30 21:05:22 1.155
 +++ clock.c   2000/09/04 16:34:16
 @@ -194,7 +194,7 @@
  static struct timecounter i8254_timecounter = {
   i8254_get_timecount,/* get_timecount */
   0,  /* no poll_pps */
 - ~0u,/* counter_mask */
 + 0x1,/* counter_mask */
   0,  /* frequency */
   "i8254" /* name */
  };
 
 --
 Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
 [EMAIL PROTECTED] | TCP/IP since RFC 956
 FreeBSD coreteam member | BSD since 4.3-tahoe
 Never attribute to malice what can adequately be explained by incompetence.
 
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with "unsubscribe freebsd-current" in the body of the message
 



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-10 Thread Valentin Nechayev

Hello John Baldwin!

 microuptime() went backwards (1.7682417 - 1.997434)
 
 I recall reading in -current earlier this week that someone was 
 looking for victims getting this. What further information can I provide?

JB This is a SMPng issue on UP machines.

I have obtained the same on non-SMP machine. (See my parallel letter.)


/netch


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-08 Thread Kenneth Wayne Culver

If I build the kernel without apm, then there is no clock because the
statclock is broken on this computer.


=
| Kenneth Culver  | FreeBSD: The best NT upgrade|
| Unix Systems Administrator  | ICQ #: 24767726 |
| and student at The  | AIM: muythaibxr |
| The University of Maryland, | Website: (Under Construction)   |
| College Park.   | http://www.wam.umd.edu/~culverk/|
=

On Fri, 8 Sep 2000, Greg Lehey wrote:

 On Friday,  8 September 2000 at  0:18:07 -0400, Kenneth Wayne Culver wrote:
  On Fri, 8 Sep 2000, Greg Lehey wrote:
 
  On Thursday,  7 September 2000 at 22:49:30 -0400, Kenneth Wayne Culver wrote:
  The point I'm making is that we've had these problems before SMPng,
  and that you can't automatically assume that it's SMPng just because
  you get the messages.  On the other hand, the 7 digits seem to be a
  pretty reliable signature.
 
  I'm getting this error while starting XFree86 4.0.1 on my laptop computer,
  and I'm using FreeBSD 4.1-STABLE, so I'm sure it's not the SMP stuff
  that's causing it.
 
  Right, but you're not getting the 7 digit microsecond count, right?
  You should contact phk.
 
  Well, here is one of the messeges:
 
  Sep  7 14:35:55 laptop /kernel: microuptime() went backwards (10412.355980 - 
10412, -694583121)
 
  this is bad.. right ? :-)
 
 Well, at any rate it looks very funny.  If this is a laptop, try
 building a kernel without apm and see if that helps.
 
 Greg
 --
 Finger [EMAIL PROTECTED] for PGP public key
 See complete headers for address and phone numbers
 



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards / SHMSEG

2000-09-08 Thread John Toon

Steve Ames wrote:
 
 Just upgraded to -CURRENT as of about noon (EST) today. At reboot
 I got a lot of these:
 
 microuptime() went backwards (1.7682417 - 1.997434)
 
 I recall reading in -current earlier this week that someone was
 looking for victims getting this. What further information can I provide?
 
 -Steve

If it is of any relevance, I got hundreds of these messages from
4.0-RELEASE, on my Athlon 1Ghz, Abit KA7-100 system. So many, in fact,
that every time it happened, I was left with no choice but to reboot, as
it slowly brought the system to its knees. Building the kernel without
apm fixed the problem, and I have since had no problems.

-- 
"I'm entirely unaccountable for any moments of sanity contained herein."
:s/Windows/UNIX
John Toon |  [EMAIL PROTECTED]  | "Sonnilon"




To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-08 Thread Mike Smith

  Sep  7 14:35:55 laptop /kernel: microuptime() went backwards (10412.355980 - 
10412, -694583121)y
 
  this is bad.. right ? :-)
 
 Well, at any rate it looks very funny.  If this is a laptop, try
 building a kernel without apm and see if that helps.

It only helps "hide" the problem.  There's either *extremely* bogus data 
coming in, or an arithmetic or sequencing error that's allowing a corrupt 
timecounter to be seen.

It might help to see the negative number as hex...

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]




To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-08 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Mike Smith writes:
  Sep  7 14:35:55 laptop /kernel: microuptime() went backwards (10412.355980 - 
10412, -694583121)y
 
  this is bad.. right ? :-)
 
 Well, at any rate it looks very funny.  If this is a laptop, try
 building a kernel without apm and see if that helps.

It only helps "hide" the problem.  There's either *extremely* bogus data 
coming in, or an arithmetic or sequencing error that's allowing a corrupt 
timecounter to be seen.

It might help to see the negative number as hex...

I have collected all the emails I've received and I have identified
at least two different causes:

There is a bogus i8254 implementation on certain Athlon Mobos, this
is a non-brainer since they should not use the i8254 but the TSC.

There are negative numbers coming in from both the i8254 and in a
few cases from the TSC.  NTIMECOUNTER may be too low for certain
systems, I'm still waiting for some feedback on that.

Either way, I have a patch which I need to burn in in my lab, but
right now I have a hard time getting my SMP box to even print out
"Copyright..." when it boots :-(

--
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD coreteam member | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-08 Thread Mike Smith

 
 I have collected all the emails I've received and I have identified
 at least two different causes:
 
 There is a bogus i8254 implementation on certain Athlon Mobos, this
 is a non-brainer since they should not use the i8254 but the TSC.

s/TSC/ACPI timer/

It might even work on those systems. 8)

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]




To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-08 Thread Kenneth Wayne Culver

Could all this be in any way related to the statclock not working on
certain systems (like my Dell Inspiron)?? Also, could the "rtc" device as
listed in systat -vm not being there (I'm assuming this is the
statclock) be causing scheduling anomalies? I ask because I am trying to
get gaim (multi-threaded, and using gtk which is threaded as well) to work
on my laptop. It works on my desktop (-CURRENT), and on my friend's
desktop (-STABLE) but not on my laptop (-STABLE). Just
wondering... Thanks.


=
| Kenneth Culver  | FreeBSD: The best NT upgrade|
| Unix Systems Administrator  | ICQ #: 24767726 |
| and student at The  | AIM: muythaibxr |
| The University of Maryland, | Website: (Under Construction)   |
| College Park.   | http://www.wam.umd.edu/~culverk/|
=

On Fri, 8 Sep 2000, Poul-Henning Kamp wrote:

 In message [EMAIL PROTECTED], Mike Smith writes:
   Sep  7 14:35:55 laptop /kernel: microuptime() went backwards (10412.355980 - 
10412, -694583121)y
  
   this is bad.. right ? :-)
  
  Well, at any rate it looks very funny.  If this is a laptop, try
  building a kernel without apm and see if that helps.
 
 It only helps "hide" the problem.  There's either *extremely* bogus data 
 coming in, or an arithmetic or sequencing error that's allowing a corrupt 
 timecounter to be seen.
 
 It might help to see the negative number as hex...
 
 I have collected all the emails I've received and I have identified
 at least two different causes:
 
 There is a bogus i8254 implementation on certain Athlon Mobos, this
 is a non-brainer since they should not use the i8254 but the TSC.
 
 There are negative numbers coming in from both the i8254 and in a
 few cases from the TSC.  NTIMECOUNTER may be too low for certain
 systems, I'm still waiting for some feedback on that.
 
 Either way, I have a patch which I need to burn in in my lab, but
 right now I have a hard time getting my SMP box to even print out
 "Copyright..." when it boots :-(
 
 --
 Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
 [EMAIL PROTECTED] | TCP/IP since RFC 956
 FreeBSD coreteam member | BSD since 4.3-tahoe
 Never attribute to malice what can adequately be explained by incompetence.
 



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-08 Thread Siobhan Patricia Lynch

On Thu, 7 Sep 2000, John Baldwin wrote:

 
 Most definitely an SMPng issue.  If you take a SMP machine and compile a UP
 and an SMP kernel on it, the SMP kernel will boot fine, whereas the UP kernel
 will generate these warnings.
 

John,
not quite, I have an SMP box that I got these warnings on, spent
the better part of the day trying to reproduce them (after the machine
became unresponsive last night)

the machine is a N440BX Night (Intel) motherboard, with two PII
333's. the board has everything onboard except the disks. the only change
besides building an SMPng kernel was to take out COMPAT_OLDPCI.

If I can find out more for you I will, however I'm leaving for
vacation at 7:30 AM tomorrow morning.

-Trish


__

Trish Lynch
FreeBSD - The Power to Serve[EMAIL PROTECTED]
Rush Networking [EMAIL PROTECTED]
VA Linux Systems[EMAIL PROTECTED]
O|S|D|N [EMAIL PROTECTED]
---

Q: Have you guys ever been on Sesame Street as special guests?
A (from Geddy): No. Much to the dismay of my daughter.
Q: Would you be on Sesame Street if you were asked?
A (again from Geddy): G: Yes. How could you turn Sesame Street down? 
That borders on criminal. 
-Geddy Lee From Rush, on Rockline.



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



microuptime() went backwards

2000-09-07 Thread Steve Ames

Just upgraded to -CURRENT as of about noon (EST) today. At reboot
I got a lot of these:

microuptime() went backwards (1.7682417 - 1.997434)

I recall reading in -current earlier this week that someone was 
looking for victims getting this. What further information can I provide?

-Steve


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-07 Thread John Baldwin

Steve Ames wrote:
 Just upgraded to -CURRENT as of about noon (EST) today. At reboot
 I got a lot of these:
 
 microuptime() went backwards (1.7682417 - 1.997434)
 
 I recall reading in -current earlier this week that someone was 
 looking for victims getting this. What further information can I provide?

This is a SMPng issue on UP machines.

 -Steve

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.cslab.vt.edu/~jobaldwi/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-07 Thread Greg Lehey

On Thursday,  7 September 2000 at 17:28:02 -0700, John Baldwin wrote:
 Steve Ames wrote:
 Just upgraded to -CURRENT as of about noon (EST) today. At reboot
 I got a lot of these:

 microuptime() went backwards (1.7682417 - 1.997434)

 I recall reading in -current earlier this week that someone was
 looking for victims getting this. What further information can I provide?

 This is a SMPng issue on UP machines.

Yes, but it's not the only cause.  We've also seen it on pre-SMPng
stuff, and Athlons seem to be particularly capable of doing it.  phk
is looking at it at the moment, and he has an idea what's causing it.

I'd guess, however, that this *is* the SMPng problem.  It's subtly
different from other manifestations in that the first time typically
has 7 digits after the decimal point.

Greg
--
Finger [EMAIL PROTECTED] for PGP public key
See complete headers for address and phone numbers


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-07 Thread John Baldwin

Greg Lehey wrote:
 On Thursday,  7 September 2000 at 17:28:02 -0700, John Baldwin wrote:
  Steve Ames wrote:
  Just upgraded to -CURRENT as of about noon (EST) today. At reboot
  I got a lot of these:
 
  microuptime() went backwards (1.7682417 - 1.997434)
 
  I recall reading in -current earlier this week that someone was
  looking for victims getting this. What further information can I provide?
 
  This is a SMPng issue on UP machines.
 
 Yes, but it's not the only cause.  We've also seen it on pre-SMPng
 stuff, and Athlons seem to be particularly capable of doing it.  phk
 is looking at it at the moment, and he has an idea what's causing it.
 
 I'd guess, however, that this *is* the SMPng problem.  It's subtly
 different from other manifestations in that the first time typically
 has 7 digits after the decimal point.

Most definitely an SMPng issue.  If you take a SMP machine and compile a UP
and an SMP kernel on it, the SMP kernel will boot fine, whereas the UP kernel
will generate these warnings.

 Greg
 --
 Finger [EMAIL PROTECTED] for PGP public key
 See complete headers for address and phone numbers

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.cslab.vt.edu/~jobaldwi/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-07 Thread Greg Lehey

On Thursday,  7 September 2000 at 17:48:06 -0700, John Baldwin wrote:
 Greg Lehey wrote:
 On Thursday,  7 September 2000 at 17:28:02 -0700, John Baldwin wrote:
 Steve Ames wrote:
 Just upgraded to -CURRENT as of about noon (EST) today. At reboot
 I got a lot of these:

 microuptime() went backwards (1.7682417 - 1.997434)

 I recall reading in -current earlier this week that someone was
 looking for victims getting this. What further information can I provide?

 This is a SMPng issue on UP machines.

 Yes, but it's not the only cause.  We've also seen it on pre-SMPng
 stuff, and Athlons seem to be particularly capable of doing it.  phk
 is looking at it at the moment, and he has an idea what's causing it.

 I'd guess, however, that this *is* the SMPng problem.  It's subtly
 different from other manifestations in that the first time typically
 has 7 digits after the decimal point.

 Most definitely an SMPng issue.  If you take a SMP machine and compile a UP
 and an SMP kernel on it, the SMP kernel will boot fine, whereas the UP kernel
 will generate these warnings.

The point I'm making is that we've had these problems before SMPng,
and that you can't automatically assume that it's SMPng just because
you get the messages.  On the other hand, the 7 digits seem to be a
pretty reliable signature.

Greg
--
Finger [EMAIL PROTECTED] for PGP public key
See complete headers for address and phone numbers


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-07 Thread Kenneth Wayne Culver

 The point I'm making is that we've had these problems before SMPng,
 and that you can't automatically assume that it's SMPng just because
 you get the messages.  On the other hand, the 7 digits seem to be a
 pretty reliable signature.
 
I'm getting this error while starting XFree86 4.0.1 on my laptop computer,
and I'm using FreeBSD 4.1-STABLE, so I'm sure it's not the SMP stuff
that's causing it.

Ken (it's a 600 MHz coppermine based celeron BTW with a mobile BX chipset)

Ken



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-07 Thread Greg Lehey

On Thursday,  7 September 2000 at 22:49:30 -0400, Kenneth Wayne Culver wrote:
 The point I'm making is that we've had these problems before SMPng,
 and that you can't automatically assume that it's SMPng just because
 you get the messages.  On the other hand, the 7 digits seem to be a
 pretty reliable signature.

 I'm getting this error while starting XFree86 4.0.1 on my laptop computer,
 and I'm using FreeBSD 4.1-STABLE, so I'm sure it's not the SMP stuff
 that's causing it.

Right, but you're not getting the 7 digit microsecond count, right?
You should contact phk.

Greg
--
Finger [EMAIL PROTECTED] for PGP public key
See complete headers for address and phone numbers


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-07 Thread Greg Lehey

On Friday,  8 September 2000 at  0:18:07 -0400, Kenneth Wayne Culver wrote:
 On Fri, 8 Sep 2000, Greg Lehey wrote:

 On Thursday,  7 September 2000 at 22:49:30 -0400, Kenneth Wayne Culver wrote:
 The point I'm making is that we've had these problems before SMPng,
 and that you can't automatically assume that it's SMPng just because
 you get the messages.  On the other hand, the 7 digits seem to be a
 pretty reliable signature.

 I'm getting this error while starting XFree86 4.0.1 on my laptop computer,
 and I'm using FreeBSD 4.1-STABLE, so I'm sure it's not the SMP stuff
 that's causing it.

 Right, but you're not getting the 7 digit microsecond count, right?
 You should contact phk.

 Well, here is one of the messeges:

 Sep  7 14:35:55 laptop /kernel: microuptime() went backwards (10412.355980 - 10412, 
-694583121)

 this is bad.. right ? :-)

Well, at any rate it looks very funny.  If this is a laptop, try
building a kernel without apm and see if that helps.

Greg
--
Finger [EMAIL PROTECTED] for PGP public key
See complete headers for address and phone numbers


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: microuptime() went backwards

2000-09-07 Thread Greg Lehey

On Thursday,  7 September 2000 at 22:43:11 -0700, Mike Smith wrote:
 Sep  7 14:35:55 laptop /kernel: microuptime() went backwards (10412.355980 - 
10412, -694583121)y

 this is bad.. right ? :-)

 Well, at any rate it looks very funny.  If this is a laptop, try
 building a kernel without apm and see if that helps.

 It only helps "hide" the problem.  There's either *extremely* bogus data
 coming in, or an arithmetic or sequencing error that's allowing a corrupt
 timecounter to be seen.

Correct.  As I mentioned earlier in the thread, phk has an idea what
the problem is, and he's asked for people with it to contact him.

Greg
--
Finger [EMAIL PROTECTED] for PGP public key
See complete headers for address and phone numbers


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: looking for microuptime went backwards victims...

2000-09-06 Thread Josef Karthauser

I got this last night running 'mtv' on a system with heavy disk I/O.
The errors were:

Sep  5 23:12:14 genius /kernel: pcm0: hwptr went backwards 8208 - 8192
Sep  5 23:12:47 genius /kernel: pcm0: hwptr went backwards 8420 - 8192
Sep  5 23:12:57 genius /kernel: pcm0: hwptr went backwards 8212 - 8192
Sep  5 23:13:04 genius /kernel: pcm0: hwptr went backwards 8196 - 8192
Sep  5 23:13:21 genius /kernel: pcm0: hwptr went backwards 8208 - 8192

etc.

Joe

FreeBSD genius.systems.pavilion.net 5.0-CURRENT FreeBSD 5.0-CURRENT #11: Tue Sep  5 
12:45:45 BST 2000 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENIUS  i386

On Mon, Sep 04, 2000 at 06:41:14PM +0200, Poul-Henning Kamp wrote:
 
 I'm looking for the remaining victims of the dreaded "microuptime
 went backwards" message.
 
 If you can reliably reproduce the problem, please contact me, so
 we can arrange for some very detailed tracing to try to find out
 what exactly is going on.  I have not been able to trigger the
 problem in my lab in a long time.
 
 If you see the message only occationally, please try the attached
 patch and let me know if this changes the picture.  The patch is
 not meant as a fix, but it might reduce the impact of this condition
 considerably when it happens.  Basically by reducing the width of
 the timecounter the magnitude of the hit we take if the timecounter
 goes backwards is reduced from about an hour to approx 110msec.
 
 Reducing the width to less than 17 bits starts to run the risk of
 ambiguity due to clock rollover.
 
 This patch may not be safe with PCAUDIO.
 
 Poul-Henning
 
 Index: clock.c
 ===
 RCS file: /home/ncvs/src/sys/i386/isa/clock.c,v
 retrieving revision 1.155
 diff -u -r1.155 clock.c
 --- clock.c   2000/07/30 21:05:22 1.155
 +++ clock.c   2000/09/04 16:34:16
 @@ -194,7 +194,7 @@
  static struct timecounter i8254_timecounter = {
   i8254_get_timecount,/* get_timecount */
   0,  /* no poll_pps */
 - ~0u,/* counter_mask */
 + 0x1,/* counter_mask */
   0,  /* frequency */
   "i8254" /* name */
  };
 
 --
 Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
 [EMAIL PROTECTED] | TCP/IP since RFC 956
 FreeBSD coreteam member | BSD since 4.3-tahoe
 Never attribute to malice what can adequately be explained by incompetence.
 
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with "unsubscribe freebsd-hackers" in the body of the message

-- 
Josef KarthauserFreeBSD: How many times have you booted today?
Technical Manager   Viagra for your server (http://www.uk.freebsd.org)
Pavilion Internet plc.  [[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: looking for microuptime went backwards victims...

2000-09-06 Thread Poul-Henning Kamp


Uhm, that is from the sound driver, not from the timecounter...

Poul-Henning

In message [EMAIL PROTECTED], Josef Karthauser writes:
I got this last night running 'mtv' on a system with heavy disk I/O.
The errors were:

Sep  5 23:12:14 genius /kernel: pcm0: hwptr went backwards 8208 - 8192
Sep  5 23:12:47 genius /kernel: pcm0: hwptr went backwards 8420 - 8192
Sep  5 23:12:57 genius /kernel: pcm0: hwptr went backwards 8212 - 8192
Sep  5 23:13:04 genius /kernel: pcm0: hwptr went backwards 8196 - 8192
Sep  5 23:13:21 genius /kernel: pcm0: hwptr went backwards 8208 - 8192

etc.

Joe

FreeBSD genius.systems.pavilion.net 5.0-CURRENT FreeBSD 5.0-CURRENT #11: Tue Sep  5 
12:45:45 BST 2000 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENIUS  
i386

On Mon, Sep 04, 2000 at 06:41:14PM +0200, Poul-Henning Kamp wrote:
 
 I'm looking for the remaining victims of the dreaded "microuptime
 went backwards" message.
 
 If you can reliably reproduce the problem, please contact me, so
 we can arrange for some very detailed tracing to try to find out
 what exactly is going on.  I have not been able to trigger the
 problem in my lab in a long time.
 
 If you see the message only occationally, please try the attached
 patch and let me know if this changes the picture.  The patch is
 not meant as a fix, but it might reduce the impact of this condition
 considerably when it happens.  Basically by reducing the width of
 the timecounter the magnitude of the hit we take if the timecounter
 goes backwards is reduced from about an hour to approx 110msec.
 
 Reducing the width to less than 17 bits starts to run the risk of
 ambiguity due to clock rollover.
 
 This patch may not be safe with PCAUDIO.
 
 Poul-Henning
 
 Index: clock.c
 ===
 RCS file: /home/ncvs/src/sys/i386/isa/clock.c,v
 retrieving revision 1.155
 diff -u -r1.155 clock.c
 --- clock.c  2000/07/30 21:05:22 1.155
 +++ clock.c  2000/09/04 16:34:16
 @@ -194,7 +194,7 @@
  static struct timecounter i8254_timecounter = {
  i8254_get_timecount,/* get_timecount */
  0,  /* no poll_pps */
 -~0u,/* counter_mask */
 +0x1,/* counter_mask */
  0,  /* frequency */
  "i8254" /* name */
  };
 
 --
 Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
 [EMAIL PROTECTED] | TCP/IP since RFC 956
 FreeBSD coreteam member | BSD since 4.3-tahoe
 Never attribute to malice what can adequately be explained by incompetence.
 
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with "unsubscribe freebsd-hackers" in the body of the message

-- 
Josef Karthauser   FreeBSD: How many times have you booted today?
Technical Manager  Viagra for your server (http://www.uk.freebsd.org)
Pavilion Internet plc.  [[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]]


--
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD coreteam member | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: looking for microuptime went backwards victims...

2000-09-06 Thread Josef Karthauser

On Wed, Sep 06, 2000 at 12:25:53PM +0200, Poul-Henning Kamp wrote:
 
 Uhm, that is from the sound driver, not from the timecounter...
 
 Poul-Henning

Oops, "You are in a maze of twisty passages all alike.  Which direction
do you want to go?".

Joe

 
 In message [EMAIL PROTECTED], Josef Karthauser writes:
 I got this last night running 'mtv' on a system with heavy disk I/O.
 The errors were:
 
 Sep  5 23:12:14 genius /kernel: pcm0: hwptr went backwards 8208 - 8192
 Sep  5 23:12:47 genius /kernel: pcm0: hwptr went backwards 8420 - 8192
 Sep  5 23:12:57 genius /kernel: pcm0: hwptr went backwards 8212 - 8192
 Sep  5 23:13:04 genius /kernel: pcm0: hwptr went backwards 8196 - 8192
 Sep  5 23:13:21 genius /kernel: pcm0: hwptr went backwards 8208 - 8192
-- 
Josef KarthauserFreeBSD: How many times have you booted today?
Technical Manager   Viagra for your server (http://www.uk.freebsd.org)
Pavilion Internet plc.  [[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



looking for microuptime went backwards victims...

2000-09-04 Thread Poul-Henning Kamp


I'm looking for the remaining victims of the dreaded "microuptime
went backwards" message.

If you can reliably reproduce the problem, please contact me, so
we can arrange for some very detailed tracing to try to find out
what exactly is going on.  I have not been able to trigger the
problem in my lab in a long time.

If you see the message only occationally, please try the attached
patch and let me know if this changes the picture.  The patch is
not meant as a fix, but it might reduce the impact of this condition
considerably when it happens.  Basically by reducing the width of
the timecounter the magnitude of the hit we take if the timecounter
goes backwards is reduced from about an hour to approx 110msec.

Reducing the width to less than 17 bits starts to run the risk of
ambiguity due to clock rollover.

This patch may not be safe with PCAUDIO.

Poul-Henning

Index: clock.c
===
RCS file: /home/ncvs/src/sys/i386/isa/clock.c,v
retrieving revision 1.155
diff -u -r1.155 clock.c
--- clock.c 2000/07/30 21:05:22 1.155
+++ clock.c 2000/09/04 16:34:16
@@ -194,7 +194,7 @@
 static struct timecounter i8254_timecounter = {
i8254_get_timecount,/* get_timecount */
0,  /* no poll_pps */
-   ~0u,/* counter_mask */
+   0x1,/* counter_mask */
0,  /* frequency */
"i8254" /* name */
 };

--
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD coreteam member | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message