Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible?
I also see these messages on my Dell 410 Workstation at work and a Dual PIII Box I use at home to do builds with...they just seem to 'come and go' with no particular pattern to them...I have just been ignoring them for the most part...they don't really seem to cause any problems.. At 06:54 PM 2/17/2002 +1100, Bruce Evans wrote: On Sat, 16 Feb 2002, Matthew Dillon wrote: Testing with a 'make -j 10 buildworld' on a -current box I am getting regular: microuptime() went backwards (146.826785 - 146.156715) microuptime() went backwards (146.826782 - 146.228636) ... microuptime() went backwards (8945.938288 - 8945.251603) microuptime() went backwards (8945.938306 - 8945.347173) microuptime() went backwards (9142.847550 - 9142.847546) This occurs both with and without the gettimeofday Giant-removal patch, so I am fairly sure it has nothing to do with any of my current work. This is running -current on a DELL2550 (2xCPUs), compiled with the SMP option. The fact that the timecounter usually goes backwards by about 0.68 seconds is probably significant, but I can't quite explain it. Timecounter i8254 frequency 1193182 Hz ... Timecounter ACPI frequency 3579545 Hz acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0 acpi_cpu0: CPU on acpi0 acpi_cpu1: CPU on acpi0 acpi_pcib0: Host-PCI bridge on acpi0 ... Question: How can this be occuring at all? Isn't the ACPI counter a 32 bit counter that does not have the rollover problems that the 8254 timer has? Timecounters go backwards when the timecounter update or reference code is called insufficiently often to prevent overflow. The rollover problems of the i8254 timecounter actually reduce this problem. If an i8254 rollover is missed, then it causes the the i8254 timecounter to go forward less than it should. I just wrote the following fix for some of the overflow problems. %%% Index: kern_tc.c === RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v retrieving revision 1.113 diff -c -2 -r1.113 kern_tc.c *** kern_tc.c 7 Feb 2002 21:21:55 - 1.113 --- kern_tc.c 17 Feb 2002 06:25:14 - *** *** 108,114 struct timecounter *tc; ! tc = timecounter; ! *bt = tc-tc_offset; ! bintime_addx(bt, tc-tc_scale * tco_delta(tc)); } --- 95,129 struct timecounter *tc; ! /* ! * The loop is to handle changes of timecounter underneath us. ! * Such changes may even become normal for preemptive kernels. ! * It is quite reasonable for idle priority processes to not ! * run for many seconds, and if they are not running after ! * being preempted here, the timecounter may cycle many times ! * underneath them. An NTIMECOUNTER of 2 is neither necessary ! * or sufficient for fixing this problem, unless NTIMECOUNTER is ! * preposterously large. NTIMECOUNTER == 2 suffices for most ! * cases, and something more is required to fix the general case. ! * ! * I hope this also fixes problems with overflow of the ! * multiplication. We depend on tc not becoming stale by more ! * than 1 second. We will now normally see such staleness ! * because it will cause the timecounter to change many times ! * underneath us. There will only be problems if hardclock() ! * doesn't run for many seconds, but hardclock() is a very ! * high priority interrupt, so such problems can't happen. ! * ! * XXX should use a generation count. ! * ! * XXX problems with hardclock() can happen, e.g., at boot time ! * if you have fixed hardclock() to not be a broken fast interrupt ! * handler, or if you sit at the ddb prompt for several seconds. ! * Should do something to make them harmless. ! */ ! do { ! tc = timecounter; ! *bt = tc-tc_offset; ! bintime_addx(bt, tc-tc_scale * tco_delta(tc)); ! } while (tc != timecounter); } %%% Bruce To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message Glenn Gombert [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn'tthat be impossible?
:I just wrote the following fix for some of the overflow problems. I don't understand how this code is supposed to handle overflows. You seem only to be checking to see if the master timecounter has changed to a different type. -Matt Matthew Dillon [EMAIL PROTECTED] :%%% :Index: kern_tc.c :=== :RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v :retrieving revision 1.113 :diff -c -2 -r1.113 kern_tc.c :*** kern_tc.c 7 Feb 2002 21:21:55 - 1.113 :--- kern_tc.c 17 Feb 2002 06:25:14 - :*** :*** 108,114 : struct timecounter *tc; : :! tc = timecounter; :! *bt = tc-tc_offset; :! bintime_addx(bt, tc-tc_scale * tco_delta(tc)); : } : :--- 95,129 : struct timecounter *tc; : :! /* :! * The loop is to handle changes of timecounter underneath us. :! * Such changes may even become normal for preemptive kernels. :! * It is quite reasonable for idle priority processes to not :! * run for many seconds, and if they are not running after :! * being preempted here, the timecounter may cycle many times :! * underneath them. An NTIMECOUNTER of 2 is neither necessary :! * or sufficient for fixing this problem, unless NTIMECOUNTER is :! * preposterously large. NTIMECOUNTER == 2 suffices for most :! * cases, and something more is required to fix the general case. :! * :! * I hope this also fixes problems with overflow of the :! * multiplication. We depend on tc not becoming stale by more :! * than 1 second. We will now normally see such staleness :! * because it will cause the timecounter to change many times :! * underneath us. There will only be problems if hardclock() :! * doesn't run for many seconds, but hardclock() is a very :! * high priority interrupt, so such problems can't happen. :! * :! * XXX should use a generation count. :! * :! * XXX problems with hardclock() can happen, e.g., at boot time :! * if you have fixed hardclock() to not be a broken fast interrupt :! * handler, or if you sit at the ddb prompt for several seconds. :! * Should do something to make them harmless. :! */ :! do { :! tc = timecounter; :! *bt = tc-tc_offset; :! bintime_addx(bt, tc-tc_scale * tco_delta(tc)); :! } while (tc != timecounter); : } : :%%% : :Bruce To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible?
In message [EMAIL PROTECTED], Bruce Evans writes: This occurs both with and without the gettimeofday Giant-removal patch, so I am fairly sure it has nothing to do with any of my current work. This is running -current on a DELL2550 (2xCPUs), compiled with the SMP option. The Gian removal doesn't come anywhere near this. The fact that the timecounter usually goes backwards by about 0.68 seconds is probably significant, but I can't quite explain it. acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0 Well: 2^32 / 3579545 = 1199.86 seconds. 2^24 / 3579545 =4.68 seconds. So even assuming that ACPI reported a wrong width, four seconds should still be enough to prevent a wraparound even though we cycle through the timecounter ring in one second. I just wrote the following fix for some of the overflow problems. It is true that if a process is not running for arbitrary long time the timecounter may be modified underneat it, and bruce's patch is actually a pretty elegant solution for that case. By all means try it. I have had one other report of a similar problem, with the added information that changing to the TSC or i8254 instead of PIXX made it go away so I am not entirely convinced this is really what we are looking for in this case. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible?
In message [EMAIL PROTECTED], Matthew Dillon wri tes: :I just wrote the following fix for some of the overflow problems. I don't understand how this code is supposed to handle overflows. You seem only to be checking to see if the master timecounter has changed to a different type. Bruce's patch amounts to a retry if the current timecounter was updated while we were calculating time. It is a bit more defensive than it needs to be and generally pessimizes the timecounters elegant lockless design a fair bit, but it is still much better than slamming a mutex around the entire clock code. If this patch cures the PIIX problem, something I'm not at all convinced about, it should go in, if not only the comment should go in. Poul-Henning -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn'tthat be impossible?
Ok, I've looked at the code more carefully and I understand how this works now. However, it is not enough in an SMP environment. You need a generation count in the timecounter structure and you also need a synchronization point when you switch time counters or a process running on a different cpu may wind up using a time counter that is being actively updated. I'm experimenting with your patch now. I'll send email when I have some test results. -Matt : :I just wrote the following fix for some of the overflow problems. : :%%% :Index: kern_tc.c :=== :RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v :retrieving revision 1.113 :diff -c -2 -r1.113 kern_tc.c :... To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Success! Sorta! (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )
:Bruce's patch amounts to a retry if the current timecounter was updated :while we were calculating time. It is a bit more defensive than it :needs to be and generally pessimizes the timecounters elegant lockless :design a fair bit, but it is still much better than slamming a mutex :around the entire clock code. : :If this patch cures the PIIX problem, something I'm not at all convinced :about, it should go in, if not only the comment should go in. : :Poul-Henning : :-- :Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 Ok, I've tested Bruce's patch and it appeaars to mostly solve the problem. I no longer get 'microuptime ... backwards' errors on a -current SMP box. However, I think to be complete we need to make it even less elegant. The TC module is only flip-flopping between two time counters, which means that it can flip-flop twice and the test will not work. We need a generation count on the timecounter as well: do { tc = timecounter; gen = tc-tc_generation; *bt = tc-tc_offset; bintime_addx(bt, tc-tc_scale * tco_delta(tc)); } while (tc != timecounter || tc-tc_generation != gen); There is also an issue on non-i386 machines. The timecounter swapping code requires a memory synchronization point after updating the contents of the new timecounter but before setting the 'timecounter' global to the new timecounter. If this is not done, non-i386 machines running SMP may see the new timecounter structure but access pre-updated values from it. (i386 boxes do not have this problem because writes are ordered so the inherent synchronication point at the end of the timer interrupt is enough). Is there a memory synchronization point macro available? -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Success! Sorta! (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )
Whoop! I take it back. I'm still getting the errors: microuptime() went backwards (458.168990 - 458.168882) microuptime() went backwards (578.609995 - 577.929801) microuptime() went backwards (748.912755 - 748.237402) microuptime() went backwards (775.159625 - 775.159612) I also think this retry loop has to be done everywhere where the timecounter structure is accessed directly. -Matt :Ok, I've tested Bruce's patch and it appeaars to mostly solve the problem. :I no longer get 'microuptime ... backwards' errors on a -current SMP :box. :... Index: kern/kern_tc.c === RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v retrieving revision 1.113 diff -u -r1.113 kern_tc.c --- kern/kern_tc.c 7 Feb 2002 21:21:55 - 1.113 +++ kern/kern_tc.c 17 Feb 2002 20:41:47 - @@ -107,9 +107,11 @@ { struct timecounter *tc; - tc = timecounter; - *bt = tc-tc_offset; - bintime_addx(bt, tc-tc_scale * tco_delta(tc)); + do { + tc = timecounter; + *bt = tc-tc_offset; + bintime_addx(bt, tc-tc_scale * tco_delta(tc)); + } while (tc != timecounter); } void @@ -126,8 +128,10 @@ struct timecounter *tc; ngetmicrotime++; - tc = timecounter; - *tvp = tc-tc_microtime; + do { + tc = timecounter; + *tvp = tc-tc_microtime; + } while (tc != timecounter); } void @@ -136,8 +140,10 @@ struct timecounter *tc; ngetnanotime++; - tc = timecounter; - *tsp = tc-tc_nanotime; + do { + tc = timecounter; + *tsp = tc-tc_nanotime; + } while (tc != timecounter); } void @@ -166,8 +172,10 @@ struct timecounter *tc; ngetmicrouptime++; - tc = timecounter; - bintime2timeval(tc-tc_offset, tvp); + do { + tc = timecounter; + bintime2timeval(tc-tc_offset, tvp); + } while (tc != timecounter); } void @@ -176,8 +184,10 @@ struct timecounter *tc; ngetnanouptime++; - tc = timecounter; - bintime2timespec(tc-tc_offset, tsp); + do { + tc = timecounter; + bintime2timespec(tc-tc_offset, tsp); + } while (tc != timecounter); } void To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Success! Sorta! (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )
In message [EMAIL PROTECTED], Matthew Dillon wri tes: However, I think to be complete we need to make it even less elegant. The TC module is only flip-flopping between two time counters, which means that it can flip-flop twice and the test will not work. We need a generation count on the timecounter as well: do { tc = timecounter; gen = tc-tc_generation; *bt = tc-tc_offset; bintime_addx(bt, tc-tc_scale * tco_delta(tc)); } while (tc != timecounter || tc-tc_generation != gen); No, more like: do { tc = timecounter; gen = tc-gen; ... } while (gen != tc-gen); -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Success! Sorta! (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )
In message [EMAIL PROTECTED], Matthew Dillon wri tes: Whoop! I take it back. I'm still getting the errors: microuptime() went backwards (458.168990 - 458.168882) microuptime() went backwards (578.609995 - 577.929801) microuptime() went backwards (748.912755 - 748.237402) microuptime() went backwards (775.159625 - 775.159612) I also think this retry loop has to be done everywhere where the timecounter structure is accessed directly. No, only where the timecounter hardware is read and the math done. As far as I know, this is the only place. As I said, I am far from convinced this is the solution to the problem we are looking at with the PIIX timecounter. Msmith has some magic code in sys/dev/acpi/acpi_timer.c which tries to identify if the PIIX counter is broken or OK and I notice that the mask seems to always be set to 24 bits even if the hardware is 32 bits. I am not sure I read his code right, but he seems to default to the unsafe method, can you try this copypasted patch ? Index: acpi_timer.c === RCS file: /home/ncvs/src/sys/dev/acpica/acpi_timer.c,v retrieving revision 1.11 diff -u -r1.11 acpi_timer.c --- acpi_timer.c8 Jan 2002 06:45:56 - 1.11 +++ acpi_timer.c17 Feb 2002 20:48:23 - @@ -138,7 +138,7 @@ if (getenv(debug.acpi.timer_test) != NULL) acpi_timer_test(); -acpi_timer_timecounter.tc_get_timecount = acpi_timer_get_timecount; +acpi_timer_timecounter.tc_get_timecount = acpi_timer_get_timecount_safe; acpi_timer_timecounter.tc_frequency = acpi_timer_frequency; tc_init(acpi_timer_timecounter); -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible?
Matt, Easy now, there is more depth to it than that... I have promised myself to get the timecounter paper written and I'll probably present it at BSDcon-euro-2002 in Amsterdam if they want to listen to me. For now, lets concentrate on the PIIX hardware because that's where the problem seems to be... Poul-Henning In message [EMAIL PROTECTED], Matthew Dillon wri tes: Ok, I've looked at the code more carefully and I understand how this works now. However, it is not enough in an SMP environment. You need a generation count in the timecounter structure and you also need a synchronization point when you switch time counters or a process running on a different cpu may wind up using a time counter that is being actively updated. I'm experimenting with your patch now. I'll send email when I have some test results. -Matt : :I just wrote the following fix for some of the overflow problems. : :%%% :Index: kern_tc.c :=== :RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v :retrieving revision 1.113 :diff -c -2 -r1.113 kern_tc.c :... To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible?
If this patch cures the PIIX problem, something I'm not at all convinced about, it should go in, if not only the comment should go in. I would like to see the PIIX problem caught on camera, personally. We're aware of one errata for it already, and we work around it. If there's another problem, or ideally if someone has some relatively quick code to test it, that would be much better. -- To announce that there must be no criticism of the president, or that we are to stand by the president, right or wrong, is not only unpatriotic and servile, but is morally treasonable to the American public. - Theodore Roosevelt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: ACPI timer is screwed... (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )
:I would like to see the PIIX problem caught on camera, personally. :We're aware of one errata for it already, and we work around it. If :there's another problem, or ideally if someone has some relatively quick :code to test it, that would be much better. Holy shit. We are screwed. It's a free-running counter with NO synchronization whatsoever. None. Zip. Zero. Sounds like we need to smack whoever made your chipset as well. Intel learned their lesson (finally) with later revisions of the PIIX4. I'm guessing you're running this against a ServerWorks system. It looks like the hardware is using an adder with fast carry (basically a forward looking carry calculation), which can result in a later bit being set before the earlier bits are updated. However, if this is the case then it is almost certain that you can catch the ripple as well, or even catch the counter while the bit states are changing. Lame. Incredibly lame. The only way to get a guarenteed accurate sample under these circumstances is something like this, where you calculate a mask that results in reasonable accurancy without causing the cpu to go into an infinite loop and then read the timer until you get two samples that are the same: u1 = TIMER_READ; u2 = TIMER_READ; while ((u1 ^ u2) 0xFFF0) { mask must be chosen u1 = u2; u2 = TIMER_READ; } return(u2 0xFFF0)same mask here Here are some more from my debug output: Interesting. This would be reasonably robust in the ripple-counter case we have to deal with on the old PIIX4. Have you tried implementing the above yet, or measuring how much it costs? At any rate, please let me know for sure whether you're running on a ServerWorks board, and I'll see if I can't find a Big Stick to hit them with. Thanks, Mike -- To announce that there must be no criticism of the president, or that we are to stand by the president, right or wrong, is not only unpatriotic and servile, but is morally treasonable to the American public. - Theodore Roosevelt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
ACPI patch (was Re: 'microuptime() went backwards ...' using ACPI...)
Ok, here is a patch that executes a brute-force solution to the asynchronous counter problem. Basically it figures out a mask and then the timer code loops until two masked reads yield the same value, guarenteeing that we haven't caught the timer during a carry. On my system, the mask I got was: 0xFFFC which means I lost only 2 bits of accuracy in order to be able to retrieve accurate counter values. This gives my particular box an approximately 1uS accuracy. I think this may be the only safe way to do it. It looks like it is possible to catch the ripple and/or fast-carry on *any* bit, with the statistical chance of it occuring on higher bits dropping by 2x per bit. My proposed (tested) patch is below. Mike? acpi_timer0: 32-bit timer at 3.579545MHz mask fffc port 0x808-0x80b on acpi0 -Matt Index: acpi_timer.c === RCS file: /home/ncvs/src/sys/dev/acpica/acpi_timer.c,v retrieving revision 1.11 diff -u -r1.11 acpi_timer.c --- acpi_timer.c8 Jan 2002 06:45:56 - 1.11 +++ acpi_timer.c17 Feb 2002 23:26:29 - @@ -56,6 +56,7 @@ MODULE_NAME(TIMER) static device_tacpi_timer_dev; +static u_int32_t acpi_timer_mask; struct resource*acpi_timer_reg; #define TIMER_READ bus_space_read_4(rman_get_bustag(acpi_timer_reg), \ rman_get_bushandle(acpi_timer_reg),\ @@ -113,7 +114,7 @@ acpi_timer_identify(driver_t *driver, device_t parent) { device_t dev; -char desc[40]; +char desc[48]; intrid; FUNCTION_TRACE(__func__); @@ -138,14 +139,32 @@ if (getenv(debug.acpi.timer_test) != NULL) acpi_timer_test(); -acpi_timer_timecounter.tc_get_timecount = acpi_timer_get_timecount; +acpi_timer_timecounter.tc_get_timecount = acpi_timer_get_timecount_safe; acpi_timer_timecounter.tc_frequency = acpi_timer_frequency; tc_init(acpi_timer_timecounter); -sprintf(desc, %d-bit timer at 3.579545MHz, AcpiGbl_FADT-TmrValExt ? 32 : 24); +for (acpi_timer_mask = 0x; acpi_timer_mask; acpi_timer_mask = 1) { + u_int32_t u1; + u_int32_t u2; + int count = 10; + + u1 = TIMER_READ; + u2 = TIMER_READ; + while (count ((u1 ^ u2) acpi_timer_mask)) { + u1 = u2; + u2 = TIMER_READ; + --count; + } + if (count) + break; +} +acpi_timer_mask = 1; + +sprintf(desc, %d-bit timer at 3.579545MHz mask %08x, + (AcpiGbl_FADT-TmrValExt ? 32 : 24), acpi_timer_mask); device_set_desc_copy(dev, desc); -#if 0 +#if 0 { u_int64_t first; @@ -192,16 +211,22 @@ static unsigned acpi_timer_get_timecount_safe(struct timecounter *tc) { -unsigned u1, u2, u3; +u_int32_t u1; +u_int32_t u2; +u1 = TIMER_READ; u2 = TIMER_READ; -u3 = TIMER_READ; -do { +while ((u1 ^ u2) acpi_timer_mask) { u1 = u2; - u2 = u3; - u3 = TIMER_READ; -} while (u1 u2 || u2 u3); -return (u2); + u2 = TIMER_READ; +} +#if 0 /* DEBUGGING */ +if (u2 u1) { + u_int32_t u3 = TIMER_READ; + printf(ACPI TIMER LSB MISREAD %08x %08x %08x\n, u1, u2, u3); +} +#endif +return(u2 acpi_timer_mask); } /* To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: ACPI timer is screwed... (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )
:Sounds like we need to smack whoever made your chipset as well. Intel :learned their lesson (finally) with later revisions of the PIIX4. I'm :guessing you're running this against a ServerWorks system. atapci0: ServerWorks ROSB4 ATA33 controller port 0x8b0-0x8bf at device 15.1 on pci0 Uh huh. It might be possible to detect the situation during init-time by explicitly looking for a reverse indexed time in a tight loop of maybe a thousand reads, but that would still leave us with a statistical chance of not guessing right. :Interesting. This would be reasonably robust in the ripple-counter case :we have to deal with on the old PIIX4. Have you tried implementing the :above yet, or measuring how much it costs? : :At any rate, please let me know for sure whether you're running on a :ServerWorks board, and I'll see if I can't find a Big Stick to hit them :with. : :Thanks, :Mike I haven't measured the cost (extra loops) but I expect it would stabilize in no more then one additional loop, which would be three counter reads total or roughly the same as your originaln _safe code in the worst case. I think we could default to the _safe version and then explicitly change it to use the fast version if we see specific chipsets which we know to be good. -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: ACPI timer is screwed... (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )
In message [EMAIL PROTECTED], Matthew Dillon wri tes: :I would like to see the PIIX problem caught on camera, personally. :We're aware of one errata for it already, and we work around it. If :there's another problem, or ideally if someone has some relatively quick :code to test it, that would be much better. Holy shit. We are screwed. It's a free-running counter with NO synchronization whatsoever. None. Zip. Zero. Yes, there is an errata for just that on early chipsets. Does the ..._slow patch I sent work for you ? -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: ACPI patch (was Re: 'microuptime() went backwards ...' using ACPI...)
Ok, here is a patch that executes a brute-force solution to the asynchronous counter problem. Basically it figures out a mask and then the timer code loops until two masked reads yield the same value, guarenteeing that we haven't caught the timer during a carry. On my system, the mask I got was: 0xFFFC which means I lost only 2 bits of accuracy in order to be able to retrieve accurate counter values. This gives my particular box an approximately 1uS accuracy. I think this may be the only safe way to do it. It looks like it is possible to catch the ripple and/or fast-carry on *any* bit, with the statistical chance of it occuring on higher bits dropping by 2x per bit. My proposed (tested) patch is below. Mike? I have some reservations about this, because I'm not sure that 10 successive reads will catch the ripple-counter problem that the old PIIX4s have. I think that if this code detects a need for the mask technique, it should set the handler to a function that will deal with the mask. If it doesn't, it should assume that we need the 'safe' function as it currently exists (I'm not sure why it's not the default, unless there's a log message to explain it, it must have been a mistake on my part). I'd also like to see the description include the number of bits, rather than the mask, if we are using a mask. Thanks for taking the time to track this one down. -- To announce that there must be no criticism of the president, or that we are to stand by the president, right or wrong, is not only unpatriotic and servile, but is morally treasonable to the American public. - Theodore Roosevelt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: ACPI timer is screwed... (was Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible? )
:I would like to see the PIIX problem caught on camera, personally. :We're aware of one errata for it already, and we work around it. If :there's another problem, or ideally if someone has some relatively quick :code to test it, that would be much better. Holy shit. We are screwed. It's a free-running counter with NO synchronization whatsoever. None. Zip. Zero. Yes, there is an errata for just that on early chipsets. Does the ..._slow patch I sent work for you ? Matt's problem (look-ahead carry) will break the three-read algorithm because it can generate a sequence of three reads that appear to be in succession, but which are all wrong. We need three different algorithms; works, ripple and look-ahead. Of those, works should be based exclusively off a list of known-good chipsets, look-ahead seems to be easily enough detected (but we should probably have a blacklist anyway) and ripple is hard to detect and should be the default case. I really, really hate hardware. -- To announce that there must be no criticism of the president, or that we are to stand by the president, right or wrong, is not only unpatriotic and servile, but is morally treasonable to the American public. - Theodore Roosevelt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: ACPI patch (was Re: 'microuptime() went backwards ...' using ACPI...)
: :I have some reservations about this, because I'm not sure that 10 :successive reads will catch the ripple-counter problem that the old PIIX4s :have. Just goes to show that I need to document my code :-) Those reads are not detecting the ripple-counter problem, they are figuring out the mask required to guarentee that two successive reads will return the same counter value so the counter read in the later code doesn't end up in an infinite loop. i.e. on a slow cpu the mask might have to remove more bits because the counter increments a number of times between reads. On a fast cpu, fewer bits would have to be chopped off. That's all. :I'd also like to see the description include the number of bits, rather :than the mask, if we are using a mask. : :Thanks for taking the time to track this one down. Well, I don't want to spend too much time on it because this was incidental to other work I'm doing on -current. If you think it's worth comitting I'll commit it, otherwise you can use it as a template for your own fix/commit. It would be nice if some sort of solution were comitted to the tree relatively soon. -Matt Matthew Dillon [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn'tthat be impossible?
On Sun, 17 Feb 2002, Poul-Henning Kamp wrote: In message [EMAIL PROTECTED], Bruce Evans writes: This occurs both with and without the gettimeofday Giant-removal patch, so I am fairly sure it has nothing to do with any of my current work. This is running -current on a DELL2550 (2xCPUs), compiled with the SMP option. The Gian removal doesn't come anywhere near this. Yes it did. There were Giants in the path for all syscalls. The fact that the timecounter usually goes backwards by about 0.68 seconds is probably significant, but I can't quite explain it. acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0 Well: 2^32 / 3579545 = 1199.86 seconds. 2^24 / 3579545 =4.68 seconds. So even assuming that ACPI reported a wrong width, four seconds should still be enough to prevent a wraparound even though we cycle through the timecounter ring in one second. No, there is only one seconds worth of of wraparaond, because recent changes annulled the slightly less recent fixes for loss of nanoseconds above the first 10^9 of them. Previously, the multiplication corresponding to the one in binuptime() overflowed at 2^32 nsec. Now it overflows at 2^64 in units of 2^-64 seconds. I just wrote the following fix for some of the overflow problems. It is true that if a process is not running for arbitrary long time the timecounter may be modified underneat it, and bruce's patch is actually a pretty elegant solution for that case. By all means try it. Thanks :-). In other mail you said it pessimize the timecounters elegant lockless design a fair bit. Here the fair bit only applies to the elegance. The pessimization at runtime is tiny. Bruce To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Success! Sorta! (was Re: 'microuptime() went backwards ...'using ACPI timer. Shouldn't that be impossible? )
On Sun, 17 Feb 2002, Matthew Dillon wrote: Whoop! I take it back. I'm still getting the errors: microuptime() went backwards (458.168990 - 458.168882) microuptime() went backwards (578.609995 - 577.929801) microuptime() went backwards (748.912755 - 748.237402) microuptime() went backwards (775.159625 - 775.159612) I also think this retry loop has to be done everywhere where the timecounter structure is accessed directly. Yes, since the reads of all the relevant timecounter variables are non-atomic. Index: kern/kern_tc.c === RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v retrieving revision 1.113 diff -u -r1.113 kern_tc.c --- kern/kern_tc.c7 Feb 2002 21:21:55 - 1.113 +++ kern/kern_tc.c17 Feb 2002 20:41:47 - @@ -126,8 +128,10 @@ struct timecounter *tc; ngetmicrotime++; - tc = timecounter; - *tvp = tc-tc_microtime; + do { + tc = timecounter; + *tvp = tc-tc_microtime; + } while (tc != timecounter); } void E.g., tc_mictrotime here is a timeval. It doesn't matter getting a stale value (although getting a stale value increases the possible incoherency of the get*() functions from 1/HZ to NTIMECOUNTER/HZ), but getting a stale value that changed underneath the read would be bad. Bruce To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
'microuptime() went backwards ...' using ACPI timer. Shouldn't that be impossible?
Testing with a 'make -j 10 buildworld' on a -current box I am getting regular: microuptime() went backwards (146.826785 - 146.156715) microuptime() went backwards (146.826782 - 146.228636) ... microuptime() went backwards (8945.938288 - 8945.251603) microuptime() went backwards (8945.938306 - 8945.347173) microuptime() went backwards (9142.847550 - 9142.847546) This occurs both with and without the gettimeofday Giant-removal patch, so I am fairly sure it has nothing to do with any of my current work. This is running -current on a DELL2550 (2xCPUs), compiled with the SMP option. Timecounter i8254 frequency 1193182 Hz ... Timecounter ACPI frequency 3579545 Hz acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0 acpi_cpu0: CPU on acpi0 acpi_cpu1: CPU on acpi0 acpi_pcib0: Host-PCI bridge on acpi0 ... Question: How can this be occuring at all? Isn't the ACPI counter a 32 bit counter that does not have the rollover problems that the 8254 timer has? -Matt Matthew Dillon [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: 'microuptime() went backwards ...' using ACPI timer. Shouldn'tthat be impossible?
On Sat, 16 Feb 2002, Matthew Dillon wrote: Testing with a 'make -j 10 buildworld' on a -current box I am getting regular: microuptime() went backwards (146.826785 - 146.156715) microuptime() went backwards (146.826782 - 146.228636) ... microuptime() went backwards (8945.938288 - 8945.251603) microuptime() went backwards (8945.938306 - 8945.347173) microuptime() went backwards (9142.847550 - 9142.847546) This occurs both with and without the gettimeofday Giant-removal patch, so I am fairly sure it has nothing to do with any of my current work. This is running -current on a DELL2550 (2xCPUs), compiled with the SMP option. The fact that the timecounter usually goes backwards by about 0.68 seconds is probably significant, but I can't quite explain it. Timecounter i8254 frequency 1193182 Hz ... Timecounter ACPI frequency 3579545 Hz acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0 acpi_cpu0: CPU on acpi0 acpi_cpu1: CPU on acpi0 acpi_pcib0: Host-PCI bridge on acpi0 ... Question: How can this be occuring at all? Isn't the ACPI counter a 32 bit counter that does not have the rollover problems that the 8254 timer has? Timecounters go backwards when the timecounter update or reference code is called insufficiently often to prevent overflow. The rollover problems of the i8254 timecounter actually reduce this problem. If an i8254 rollover is missed, then it causes the the i8254 timecounter to go forward less than it should. I just wrote the following fix for some of the overflow problems. %%% Index: kern_tc.c === RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v retrieving revision 1.113 diff -c -2 -r1.113 kern_tc.c *** kern_tc.c 7 Feb 2002 21:21:55 - 1.113 --- kern_tc.c 17 Feb 2002 06:25:14 - *** *** 108,114 struct timecounter *tc; ! tc = timecounter; ! *bt = tc-tc_offset; ! bintime_addx(bt, tc-tc_scale * tco_delta(tc)); } --- 95,129 struct timecounter *tc; ! /* !* The loop is to handle changes of timecounter underneath us. !* Such changes may even become normal for preemptive kernels. !* It is quite reasonable for idle priority processes to not !* run for many seconds, and if they are not running after !* being preempted here, the timecounter may cycle many times !* underneath them. An NTIMECOUNTER of 2 is neither necessary !* or sufficient for fixing this problem, unless NTIMECOUNTER is !* preposterously large. NTIMECOUNTER == 2 suffices for most !* cases, and something more is required to fix the general case. !* !* I hope this also fixes problems with overflow of the !* multiplication. We depend on tc not becoming stale by more !* than 1 second. We will now normally see such staleness !* because it will cause the timecounter to change many times !* underneath us. There will only be problems if hardclock() !* doesn't run for many seconds, but hardclock() is a very !* high priority interrupt, so such problems can't happen. !* !* XXX should use a generation count. !* !* XXX problems with hardclock() can happen, e.g., at boot time !* if you have fixed hardclock() to not be a broken fast interrupt !* handler, or if you sit at the ddb prompt for several seconds. !* Should do something to make them harmless. !*/ ! do { ! tc = timecounter; ! *bt = tc-tc_offset; ! bintime_addx(bt, tc-tc_scale * tco_delta(tc)); ! } while (tc != timecounter); } %%% Bruce To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: microuptime went backwards
On Saturday, 26 January 2002 at 16:28:04 +0100, Gabriel Ambuehl wrote: -BEGIN PGP SIGNED MESSAGE- Hello, I just installed 5 CURRENT (20.1. snapshot) on a K6-2 500 whic was previously running 4.4 STABLE without any problems but since CURRENT is running, it keeps printing error messages about microuptime went backwards which doesn't seem to hurt the system itself but is very annoying to work with as it jams the whole screen in a matter of seconds with error messages, thus making it impossible to really work with the system. Now in some mailinglist archive it was suggested to turn of the APM of the board (some Elitegroup Super Socket 7 board based on a SiS chipset, can dig out the specs if that helps) but that didn't help at all... Funny, it always does for me. Note that you have to build a kernel without APM support; just disabling it doesn't work. If that doesn't help, you're probably on your own. Oh and please reply to me privately as well as I'm not reading this list (too much traffic)! If you're running -CURRENT, you should be reading this list. If you don't have time, don't run -CURRENT. Greg -- See complete headers for address and phone numbers To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
microuptime went backwards
-BEGIN PGP SIGNED MESSAGE- Hello, I just installed 5 CURRENT (20.1. snapshot) on a K6-2 500 whic was previously running 4.4 STABLE without any problems but since CURRENT is running, it keeps printing error messages about microuptime went backwards which doesn't seem to hurt the system itself but is very annoying to work with as it jams the whole screen in a matter of seconds with error messages, thus making it impossible to really work with the system. Now in some mailinglist archive it was suggested to turn of the APM of the board (some Elitegroup Super Socket 7 board based on a SiS chipset, can dig out the specs if that helps) but that didn't help at all... Oh and please reply to me privately as well as I'm not reading this list (too much traffic)! Best regards, Gabriel -BEGIN PGP SIGNATURE- Version: PGP 6.5i iQEVAwUBPFK9G8Za2WpymlDxAQFhlQgAtHfA5JbWxM9fZ/UFAq2zoLC6QLCBufXX PmC4eisjnjGCyWmx2ukmRAZEjDa38b05tRON13Tax8U1kz1dUaFC4HLAhppUlkkO ZsantX/MMJRAjYa6zS2Ez4lDsKr9GVmuLZxIdBGJ4Mfg+LmI+/owzQHn+m4EEeLe TC31lNxf4Q8IJDkE4AAiuU/V3iQOPwmeia49jk6J5nqsw4ZZ0PKXEcUs53JkVDMy jSSUBtm2lJCZnkMi27IdyzXkfwmDNJJDNj10U1qbwt/v1j6lcT8T2NW1rNJLPsQ9 3EKcTXx37RnYk8l7SJa6N69reMjgWlKi1q8f9q7ZGf6ALsqgyhmYGQ== =bVJy -END PGP SIGNATURE- To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Seeing a lot of 'microuptime() went backwards' messages during heavy disk I/O
I'm seeing a lot of this during heavy disk I/O (5 postmark benchmarks running in parallel): microuptime() went backwards (44525.3954411 - 44524.563978) microuptime() went backwards (57425.4282241 - 57424.766121) microuptime() went backwards (57425.4282241 - 57424.845844) microuptime() went backwards (60724.4427887 - 60724.686232) microuptime() went backwards (60724.4427887 - 60724.768808) microuptime() went backwards (61685.4418845 - 61685.102724) microuptime() went backwards (61924.3906516 - 61924.246151) microuptime() went backwards (62344.3800035 - 62344.415407) Anyone know what's up? -Matt Matthew Dillon [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Seeing a lot of 'microuptime() went backwards' messages duringheavy disk I/O
On Wed, 19 Dec 2001, Matthew Dillon wrote: I'm seeing a lot of this during heavy disk I/O (5 postmark benchmarks running in parallel): microuptime() went backwards (44525.3954411 - 44524.563978) microuptime() went backwards (57425.4282241 - 57424.766121) microuptime() went backwards (57425.4282241 - 57424.845844) microuptime() went backwards (60724.4427887 - 60724.686232) microuptime() went backwards (60724.4427887 - 60724.768808) microuptime() went backwards (61685.4418845 - 61685.102724) microuptime() went backwards (61924.3906516 - 61924.246151) microuptime() went backwards (62344.3800035 - 62344.415407) Anyone know what's up? This has been asked on just about every FreeBSD list since the printf was added. Use the archives, man! :) This is when you have a device generating too much interrupt latency -- enough to stall the RTC. Usually the offender is video cards, but in this case it could be your IDE controller. The usual fix is to try changing timecounters; sysctl kern.timecounter.hardware tells you what you're currently using. If you're on CURRENT, it's probably using ACPI. If you want to override it back to TSC, put 'hint.acpi_timer.0.disabled=1' in your /boot/device.hints. Also try compiling with or without apm since this influences the timecounter as well (although if you're on SMP you might be stuck with the i8254). phk can clarify. :) Doug White| FreeBSD: The Power to Serve [EMAIL PROTECTED] | www.FreeBSD.org To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Seeing a lot of 'microuptime() went backwards' messages duringheavy disk I/O
:This has been asked on just about every FreeBSD list since the printf was :added. Use the archives, man! :) : :This is when you have a device generating too much interrupt latency -- :enough to stall the RTC. Usually the offender is video cards, but in this :case it could be your IDE controller. This is on a SCSI machine, but I get your drift. :The usual fix is to try changing timecounters; sysctl :kern.timecounter.hardware tells you what you're currently using. If :you're on CURRENT, it's probably using ACPI. If you want to override it :back to TSC, put 'hint.acpi_timer.0.disabled=1' in your :/boot/device.hints. Also try compiling with or without apm since this :influences the timecounter as well (although if you're on SMP you might be :stuck with the i8254). : :phk can clarify. :) : :Doug White| FreeBSD: The Power to Serve I would prefer that it figure out the correct time counter itself rather then plague us with fraggin microuptime messages for the last N years :-( -Matt Matthew Dillon [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Seeing a lot of 'microuptime() went backwards' messages duringheavy disk I/O
I've been seeing them a lot too on a recent (NOV) -current On Wed, 19 Dec 2001, Matthew Dillon wrote: I'm seeing a lot of this during heavy disk I/O (5 postmark benchmarks running in parallel): microuptime() went backwards (44525.3954411 - 44524.563978) microuptime() went backwards (57425.4282241 - 57424.766121) microuptime() went backwards (57425.4282241 - 57424.845844) microuptime() went backwards (60724.4427887 - 60724.686232) microuptime() went backwards (60724.4427887 - 60724.768808) microuptime() went backwards (61685.4418845 - 61685.102724) microuptime() went backwards (61924.3906516 - 61924.246151) microuptime() went backwards (62344.3800035 - 62344.415407) Anyone know what's up? -Matt Matthew Dillon [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
microuptime() went backwards
Should I worry about this: [...] Aug 18 11:11:51 aldan /boot/mi/kernel: calcru: negative time of -1781452130 usec for pid 352 (setiathome) Aug 18 11:17:10 aldan /boot/mi/kernel: microuptime() went backwards (442732.3850800 - 442731.165931) Aug 18 11:17:10 aldan /boot/mi/kernel: microuptime() went backwards (442732.3850800 - 442731.166366) Aug 18 11:17:11 aldan /boot/mi/kernel: microuptime() went backwards (442732.3850800 - 442732.324086) Aug 18 11:17:11 aldan /boot/mi/kernel: microuptime() went backwards (442732.3850800 - 442732.324293) Aug 18 11:17:12 aldan /boot/mi/kernel: microuptime() went backwards (442732.3850800 - 442732.814924) Aug 18 11:17:13 aldan /boot/mi/kernel: calcru: negative time of -1498577479 usec for pid 352 (setiathome) ? Thanks, -mi To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: microuptime() went backwards
On Sat, 7 Oct 2000, Valentin Nechayev wrote: At Fri, 6 Oct 2000 22:00:23 + (UTC), jhb wrote: JB tc_windup() wasn't called soon enough to update the timecounter. Making On my system _each_ boot causes hundreds of these messages. And as described, long offsets without updating are caused by some code in drivers, i.e. DELAY(100) in isa/fd.c. Is it nesessary to call tc_windup() between iterations in isa_configure? ;| Not really. The timecounters overflow to bizarre values if the boot takes a long time (more than about 4 seconds), but there is no problem unless something looks at the time before tc_windup() is working and expects the time to be monotonic. Previously mi_switch() looked at the time before tc_windup() was called, and this was impossible to avoid because getting to tc_windup() involved a context switch. Now the misbehaviour should only be visible using a debugger or debugging code that prints the time. Bruce To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
microuptime() went backwards: possible diagnosis
Following is partial dmesg output from 5.0(13)-CURRENT-20001002 kernel with advanced debugging prints: ==={{{ mi_startup(): call subsystem 0x880(142606336), microuptime is 1.4027506 mi_startup(): call subsystem 0x880(142606336), microuptime is 1.4027897 mi_startup(): call subsystem 0xa00(167772160), microuptime is 1.4028290 mi_startup(): call subsystem 0xa80(176160768), microuptime is 1.4028701 mi_switch(): microuptime() went backwards (1.4029032 - 0.734106) mi_switch(): microuptime() went backwards (1.4029032 - 0.734590) mi_switch(): microuptime() went backwards (1.4029032 - 0.735077) mi_switch(): microuptime() went backwards (1.4029032 - 0.73) [...] mi_switch(): microuptime() went backwards (1.4029032 - 1.053263) mi_switch(): microuptime() went backwards (1.4029032 - 1.053756) mi_startup(): call subsystem 0xd00(218103808), microuptime is 1.054550 proc0_post(): switchtime initialized to 1.54864 mi_startup(): call subsystem 0xe00(234881024), microuptime is 1.055501 mi_startup(): call subsystem 0xe40(239075328), microuptime is 1.056012 ===}}} mi_switch() uses static variable 'switchtime' which keeps microuptime data when previous switching occured. For non-MP systems, proc0_post() sets it up. But first mi_switch() call occures when SI_SUB_INT_CONFIG_HOOKS (or SI_SUB_KICK_SCHEDULER?) subsystem called - why? IMHO it is too early for mi_switch(), it should be enabled on SI_SUB_RUN_SCHEDULER. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - This was first question. Another one is strange value returned by microuptime() (3-5 millions of microseconds). This value becomes strange on my system during third SI_SUB_CONFIGURE instance, which is configure() in i386/i386/autoconf.c: [...] isa_probe_children: before probing non-PNP device: microuptime=0.349777, x=0 fdc0: NEC 72065B or clone at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 fdc0: FIFO enabled, 8 bytes threshold fd.c: fd_probe(): before set_motor(): microuptime=0.291463, x=0 fdc0: FIFO enabled, 8 bytes threshold fd.c: fd_probe(): before NE7CMD_SENSED(): microuptime=1.292409, x=0 fd0: 1440-KB 3.5" drive on fdc0 drive 0 isa_probe_children: after probing non-PNP device: microuptime=1.1653283, x=0 isa_probe_children: before probing non-PNP device: microuptime=1.1604514, x=7 atkbdc0: Keyboard controller (i8042) at port 0x60,0x64 on isa0 atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0 kbd0 at atkbd0 isa_probe_children: after probing non-PNP device: microuptime=1.3138749, x=7 [...] it isn't possible to normalize value during this subsystem startup, is it? /netch To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
RE: microuptime() went backwards: possible diagnosis
On 06-Oct-00 Valentin Nechayev wrote: Following is partial dmesg output from 5.0(13)-CURRENT-20001002 kernel with advanced debugging prints: The problem was that the interrupt threads for the clk interrupt introduced enough latency that occasionally (mostly during a heavy load of interrupts) tc_windup() wasn't called soon enough to update the timecounter. Making clock interrupts not be threaded fixes this problem. -- John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
At Fri, 6 Oct 2000 22:00:23 + (UTC), jhb wrote: JB The problem was that the interrupt threads for the clk interrupt introduced JB enough latency that occasionally (mostly during a heavy load of interrupts) JB tc_windup() wasn't called soon enough to update the timecounter. Making On my system _each_ boot causes hundreds of these messages. And as described, long offsets without updating are caused by some code in drivers, i.e. DELAY(100) in isa/fd.c. Is it nesessary to call tc_windup() between iterations in isa_configure? ;| JB clock interrupts not be threaded fixes this problem. Oh, well, I understand now that scheduling is nesessary to be run early because interrupts are implemented as kernel threads even when kernel is in phase of hardware detection.;( /netch To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
On 06-Oct-00 Valentin Nechayev wrote: At Fri, 6 Oct 2000 22:00:23 + (UTC), jhb wrote: JB The problem was that the interrupt threads for the clk interrupt introduced JB enough latency that occasionally (mostly during a heavy load of interrupts) JB tc_windup() wasn't called soon enough to update the timecounter. Making On my system _each_ boot causes hundreds of these messages. And as described, long offsets without updating are caused by some code in drivers, i.e. DELAY(100) in isa/fd.c. Is it nesessary to call tc_windup() between iterations in isa_configure? ;| Yes, I know. The problem didn't occur as badly on SMP systems as the clk ithread didn't suffer from as much latency in that case, but on UP systems it could be quite bad. I used to get it both during boot and when doing things like starting up X. JB clock interrupts not be threaded fixes this problem. Oh, well, I understand now that scheduling is nesessary to be run early because interrupts are implemented as kernel threads even when kernel is in phase of hardware detection.;( This is not done yet. It is something that is on the todo list though (to move SI_SUB_INIT_CLOCKS earlier, basically). /netch -- John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: recent kernel, microuptime went backwards
On Wed, 20 Sep 2000, John Baldwin wrote: On 19-Sep-00 Bruce Evans wrote: It really does go backwards. This is caused by the giant lock preventing the clock interrupt task from running soon enough. The giant lock can also prevent the clock interrupt task from running often enough even after booting. E.g., "dd if=/dev/random of=/dev/null bs=large" does several bad things. It's not the Giant lock that is at fault. We give up Giant during mi_switch(). Then scheduling problem is in the way that the top-level scheduler runs. Then the scheduler is more broken than I thought :-). Initially there may be a locking problem as well as a scheduling problem. Giving up Giant in the first mi_switch() is a bit late. mi_switch() uses microuptime(), and the clock task needs to be run before then to finish initialization of the timecounter. We decide to schedule another process due to the timeslice ending during the clk interrupt thread. How can this work? The timeslice accounting stuff doesn't get updated until the clock task runs. In the past, this was not run as a thread, so it ran, set the AST_* constant for needing a resched and then exited. During doreti, we notice an AST is pending and call ast(). ast() calls userret() which notices that a resched is needed and calls mi_switch(). In the New World Order, when the clock interrupt occurs, we set the AST_* constant for every interrupt before returning from sched_ithd(). This results in the actual interrupt threads being schedule from ast(). What should happen is for ast() to normally schedule the clock interrupt (and other interrupts) immediately (unless they are blocked). This doesn't seem to be working, and I can't see how it can work, since there is nothing except the giant lock to tell us whether interrupts are blocked, and the giant lock is held most of the time in system mode. Previously, cpl told us, but cpl is no longer maintained. However, when the clk ithread finishes, it simply calls mi_switch() to enter the next process in ithd_loop(). The need_resched() that it sets isn't handled until the next call to userret() either via a hardware interrupt or a syscall return. Thus, the problem isn't due to Giant, but rather to interrupt threads. I think this is a different problem. It is similar to a problem for scheduling netisrs from non-interrupt context. schednetisr() sets the AST flag and some other flags. Nothing looks at these flags until an interrupt occurs or the process sleeps. Previously, this was handled in splx(): s = splnet(); /* s == 0 in process context. */ queue_net_output(...); schednetisr(...); splx(s);/* Since s == 0, the netisr gets run here. */ Will this work again as soom as splx() is replaced by mtx_exit(), etc? We only have a few thousand spls to change :(. As for the micruptime() messages on boot, they only occur here on a UP kernel. On an SMP kernel I don't get them. Also, they always occur during mi_switch() when an interrupt thread is finishing and going back to sleep. The first such thread to be run to generate thet error message is the irq0: clk ithread, so the clk ithread is running fine. They are very timing dependent, and probably also very task-mix dependent. The primary cause of microuptime() going backwards is tv_nsec overflowing if the system takes longer than 2^32 nsec (about 4 seconds) between the initialization of the timecounter and the timecounter maintenance for the first clock interrupt. On one of my systems, the first thread to call mi_switch() is the generic thread (proc0?) that executes run_interrupt_driven_hooks(). mi_switch() is called for the first time when the ata hook goes to sleep. Things would be a little different for SMP. Hopefully another cpu handles the clock interrupt. Bruce To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: recent kernel, microuptime went backwards
In message [EMAIL PROTECTED], John Baldwin writes: As for the micruptime() messages on boot, they only occur here on a UP kernel. On an SMP kernel I don't get them. Also, they always occur during mi_switch() when an interrupt thread is finishing and going back to sleep. The first such thread to be run to generate thet error message is the irq0: clk ithread, so the clk ithread is running fine. The microuptime() messages occur because the timecounters expect the i8254 clock interrupt to run "hz" times per second, and it doesn't. In particular it doesn't during then 10-20 seconds we probe/attach devices. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD coreteam member | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: recent kernel, microuptime went backwards
John, I get these on an SMP kernel, which locks up the box, I can't even figure out where exactly its happening. Maybe I'm just missing something in my kernel config file? I assumed (from UPDATING) that no real change was needed to the SMP options? The hardware is an Intel N440BX motherboard with two PII333 procs. I'm not opposed to the option that this is *my* error and not SMPNG's PRE_SMPNG is VERY stable on this box, however the constant crashes (mostly during high I/O periods, such as make world) have made it impossible to use the machine for any type of real development under SMPNG. (which if I had time since the promotion, I'd be a bit crazy right now) If theres is anything I can do to help debug this, let me know, you can email me privately. -Trish __ Trish Lynch FreeBSD - The Power to Serve[EMAIL PROTECTED] Rush Networking [EMAIL PROTECTED] VA Linux Systems[EMAIL PROTECTED] O|S|D|N [EMAIL PROTECTED] --- "So if you ask me how do I fell inside I could honestly Tell you we've been taken on a very long ride And if my owners let me have some free time some day With all good intention I would probably run away Clutching the short straw" -Marillion, That Time of the Night To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: recent kernel, microuptime went backwards
more info, the N440BX has a symbios scsi card on board and an fxp card onboard. I'm using relatively old scsi drives (4500 RPM seagates and HP drives) I have softupdates on, and I'm now using COMPAT_OLDPCI for now, although I only turned it back on after seeing that was the only real change I made to my kernel config pre and post SMPNG. -Trish __ Trish Lynch FreeBSD - The Power to Serve[EMAIL PROTECTED] Rush Networking [EMAIL PROTECTED] VA Linux Systems[EMAIL PROTECTED] O|S|D|N [EMAIL PROTECTED] --- "what makes me think i could start clean slated the hardest to learn was the least complicated" -Indigo Girls, Least Complicated To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: recent kernel, microuptime went backwards
On 19-Sep-00 Bruce Evans wrote: On Tue, 19 Sep 2000, Andrey A. Chernov wrote: With very latest kernel I got lots of microuptime() went backwards (1.3624050 - 1.998840) messages just before Mounting root from ufs:/dev/da0s1a It really does go backwards. This is caused by the giant lock preventing the clock interrupt task from running soon enough. The giant lock can also prevent the clock interrupt task from running often enough even after booting. E.g., "dd if=/dev/random of=/dev/null bs=large" does several bad things. It's not the Giant lock that is at fault. We give up Giant during mi_switch(). Then scheduling problem is in the way that the top-level scheduler runs. We decide to schedule another process due to the timeslice ending during the clk interrupt thread. In the past, this was not run as a thread, so it ran, set the AST_* constant for needing a resched and then exited. During doreti, we notice an AST is pending and call ast(). ast() calls userret() which notices that a resched is needed and calls mi_switch(). In the New World Order, when the clock interrupt occurs, we set the AST_* constant for every interrupt before returning from sched_ithd(). This results in the actual interrupt threads being schedule from ast(). However, when the clk ithread finishes, it simply calls mi_switch() to enter the next process in ithd_loop(). The need_resched() that it sets isn't handled until the next call to userret() either via a hardware interrupt or a syscall return. Thus, the problem isn't due to Giant, but rather to interrupt threads. As for the micruptime() messages on boot, they only occur here on a UP kernel. On an SMP kernel I don't get them. Also, they always occur during mi_switch() when an interrupt thread is finishing and going back to sleep. The first such thread to be run to generate thet error message is the irq0: clk ithread, so the clk ithread is running fine. Bruce -- John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.cslab.vt.edu/~jobaldwi/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
recent kernel, microuptime went backwards
With very latest kernel I got lots of microuptime() went backwards (1.3624050 - 1.998840) messages just before Mounting root from ufs:/dev/da0s1a -- Andrey A. Chernov [EMAIL PROTECTED] http://ache.pp.ru/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: recent kernel, microuptime went backwards
On Tue, 19 Sep 2000, Andrey A. Chernov wrote: With very latest kernel I got lots of microuptime() went backwards (1.3624050 - 1.998840) messages just before Mounting root from ufs:/dev/da0s1a It really does go backwards. This is caused by the giant lock preventing the clock interrupt task from running soon enough. The giant lock can also prevent the clock interrupt task from running often enough even after booting. E.g., "dd if=/dev/random of=/dev/null bs=large" does several bad things. Bruce To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: recent kernel, microuptime went backwards
In message [EMAIL PROTECTED], Bruce Eva ns writes: On Tue, 19 Sep 2000, Andrey A. Chernov wrote: With very latest kernel I got lots of microuptime() went backwards (1.3624050 - 1.998840) messages just before Mounting root from ufs:/dev/da0s1a It really does go backwards. This is caused by the giant lock preventing the clock interrupt task from running soon enough. The giant lock can also prevent the clock interrupt task from running often enough even after booting. E.g., "dd if=/dev/random of=/dev/null bs=large" does several bad things. Yes, we are royally hosed right now with respect to interrupt service. It's a testimony to the robustness of the majority of our drivers that they don't explode more often than they do. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD coreteam member | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: recent kernel, microuptime went backwards
On Wed, 20 Sep 2000, Bruce Evans wrote: On Tue, 19 Sep 2000, Andrey A. Chernov wrote: microuptime() went backwards (1.3624050 - 1.998840) It really does go backwards. This is caused by the giant lock preventing the clock interrupt task from running soon enough. The giant lock can also prevent the clock interrupt task from running often enough even after booting. E.g., "dd if=/dev/random of=/dev/null bs=large" does several bad things. I've noticed that, every time I do ntpdate, my clock loses about 5-11 seconds per hour. At first I thought it may have been my CMOS battery, but maybe it's that microuptime() thing that's causing it. (I've got a UP machine.) - Donn To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: recent kernel, microuptime went backwards
On Tue, Sep 19, 2000 at 08:30:25PM +0200, Poul-Henning Kamp wrote: In message [EMAIL PROTECTED], Bruce Eva ns writes: On Tue, 19 Sep 2000, Andrey A. Chernov wrote: With very latest kernel I got lots of microuptime() went backwards (1.3624050 - 1.998840) messages just before Mounting root from ufs:/dev/da0s1a It really does go backwards. This is caused by the giant lock preventing the clock interrupt task from running soon enough. The giant lock can also prevent the clock interrupt task from running often enough even after booting. E.g., "dd if=/dev/random of=/dev/null bs=large" does several bad things. Yes, we are royally hosed right now with respect to interrupt service. It's a testimony to the robustness of the majority of our drivers that they don't explode more often than they do. It seems it was SCSI driver in my case waiting for SCSI device settle (or in some another state, I am not sure). BTW, could you please summarize this diagnostic? I.e. istead of tons of messages effectively overflowing dmesg buffer and maybe cause backwards by time consumed by themselfs put only one message summarizing all of them, something like "went backwards N times". -- Andrey A. Chernov [EMAIL PROTECTED] http://ache.pp.ru/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: looking for microuptime went backwards victims...
I get it on my newsserver pretty much every night when expire runs. 12 IDE drives on 3 Promise controllers. I was just actually about to replace the MB, thinking it was just something freaky or unstable. I will try the patch. On Mon, 4 Sep 2000, Poul-Henning Kamp wrote: I'm looking for the remaining victims of the dreaded "microuptime went backwards" message. If you can reliably reproduce the problem, please contact me, so we can arrange for some very detailed tracing to try to find out what exactly is going on. I have not been able to trigger the problem in my lab in a long time. If you see the message only occationally, please try the attached patch and let me know if this changes the picture. The patch is not meant as a fix, but it might reduce the impact of this condition considerably when it happens. Basically by reducing the width of the timecounter the magnitude of the hit we take if the timecounter goes backwards is reduced from about an hour to approx 110msec. Reducing the width to less than 17 bits starts to run the risk of ambiguity due to clock rollover. This patch may not be safe with PCAUDIO. Poul-Henning Index: clock.c === RCS file: /home/ncvs/src/sys/i386/isa/clock.c,v retrieving revision 1.155 diff -u -r1.155 clock.c --- clock.c 2000/07/30 21:05:22 1.155 +++ clock.c 2000/09/04 16:34:16 @@ -194,7 +194,7 @@ static struct timecounter i8254_timecounter = { i8254_get_timecount,/* get_timecount */ 0, /* no poll_pps */ - ~0u,/* counter_mask */ + 0x1,/* counter_mask */ 0, /* frequency */ "i8254" /* name */ }; -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD coreteam member | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
Hello John Baldwin! microuptime() went backwards (1.7682417 - 1.997434) I recall reading in -current earlier this week that someone was looking for victims getting this. What further information can I provide? JB This is a SMPng issue on UP machines. I have obtained the same on non-SMP machine. (See my parallel letter.) /netch To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
If I build the kernel without apm, then there is no clock because the statclock is broken on this computer. = | Kenneth Culver | FreeBSD: The best NT upgrade| | Unix Systems Administrator | ICQ #: 24767726 | | and student at The | AIM: muythaibxr | | The University of Maryland, | Website: (Under Construction) | | College Park. | http://www.wam.umd.edu/~culverk/| = On Fri, 8 Sep 2000, Greg Lehey wrote: On Friday, 8 September 2000 at 0:18:07 -0400, Kenneth Wayne Culver wrote: On Fri, 8 Sep 2000, Greg Lehey wrote: On Thursday, 7 September 2000 at 22:49:30 -0400, Kenneth Wayne Culver wrote: The point I'm making is that we've had these problems before SMPng, and that you can't automatically assume that it's SMPng just because you get the messages. On the other hand, the 7 digits seem to be a pretty reliable signature. I'm getting this error while starting XFree86 4.0.1 on my laptop computer, and I'm using FreeBSD 4.1-STABLE, so I'm sure it's not the SMP stuff that's causing it. Right, but you're not getting the 7 digit microsecond count, right? You should contact phk. Well, here is one of the messeges: Sep 7 14:35:55 laptop /kernel: microuptime() went backwards (10412.355980 - 10412, -694583121) this is bad.. right ? :-) Well, at any rate it looks very funny. If this is a laptop, try building a kernel without apm and see if that helps. Greg -- Finger [EMAIL PROTECTED] for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards / SHMSEG
Steve Ames wrote: Just upgraded to -CURRENT as of about noon (EST) today. At reboot I got a lot of these: microuptime() went backwards (1.7682417 - 1.997434) I recall reading in -current earlier this week that someone was looking for victims getting this. What further information can I provide? -Steve If it is of any relevance, I got hundreds of these messages from 4.0-RELEASE, on my Athlon 1Ghz, Abit KA7-100 system. So many, in fact, that every time it happened, I was left with no choice but to reboot, as it slowly brought the system to its knees. Building the kernel without apm fixed the problem, and I have since had no problems. -- "I'm entirely unaccountable for any moments of sanity contained herein." :s/Windows/UNIX John Toon | [EMAIL PROTECTED] | "Sonnilon" To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
Sep 7 14:35:55 laptop /kernel: microuptime() went backwards (10412.355980 - 10412, -694583121)y this is bad.. right ? :-) Well, at any rate it looks very funny. If this is a laptop, try building a kernel without apm and see if that helps. It only helps "hide" the problem. There's either *extremely* bogus data coming in, or an arithmetic or sequencing error that's allowing a corrupt timecounter to be seen. It might help to see the negative number as hex... -- ... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
In message [EMAIL PROTECTED], Mike Smith writes: Sep 7 14:35:55 laptop /kernel: microuptime() went backwards (10412.355980 - 10412, -694583121)y this is bad.. right ? :-) Well, at any rate it looks very funny. If this is a laptop, try building a kernel without apm and see if that helps. It only helps "hide" the problem. There's either *extremely* bogus data coming in, or an arithmetic or sequencing error that's allowing a corrupt timecounter to be seen. It might help to see the negative number as hex... I have collected all the emails I've received and I have identified at least two different causes: There is a bogus i8254 implementation on certain Athlon Mobos, this is a non-brainer since they should not use the i8254 but the TSC. There are negative numbers coming in from both the i8254 and in a few cases from the TSC. NTIMECOUNTER may be too low for certain systems, I'm still waiting for some feedback on that. Either way, I have a patch which I need to burn in in my lab, but right now I have a hard time getting my SMP box to even print out "Copyright..." when it boots :-( -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD coreteam member | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
I have collected all the emails I've received and I have identified at least two different causes: There is a bogus i8254 implementation on certain Athlon Mobos, this is a non-brainer since they should not use the i8254 but the TSC. s/TSC/ACPI timer/ It might even work on those systems. 8) -- ... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
Could all this be in any way related to the statclock not working on certain systems (like my Dell Inspiron)?? Also, could the "rtc" device as listed in systat -vm not being there (I'm assuming this is the statclock) be causing scheduling anomalies? I ask because I am trying to get gaim (multi-threaded, and using gtk which is threaded as well) to work on my laptop. It works on my desktop (-CURRENT), and on my friend's desktop (-STABLE) but not on my laptop (-STABLE). Just wondering... Thanks. = | Kenneth Culver | FreeBSD: The best NT upgrade| | Unix Systems Administrator | ICQ #: 24767726 | | and student at The | AIM: muythaibxr | | The University of Maryland, | Website: (Under Construction) | | College Park. | http://www.wam.umd.edu/~culverk/| = On Fri, 8 Sep 2000, Poul-Henning Kamp wrote: In message [EMAIL PROTECTED], Mike Smith writes: Sep 7 14:35:55 laptop /kernel: microuptime() went backwards (10412.355980 - 10412, -694583121)y this is bad.. right ? :-) Well, at any rate it looks very funny. If this is a laptop, try building a kernel without apm and see if that helps. It only helps "hide" the problem. There's either *extremely* bogus data coming in, or an arithmetic or sequencing error that's allowing a corrupt timecounter to be seen. It might help to see the negative number as hex... I have collected all the emails I've received and I have identified at least two different causes: There is a bogus i8254 implementation on certain Athlon Mobos, this is a non-brainer since they should not use the i8254 but the TSC. There are negative numbers coming in from both the i8254 and in a few cases from the TSC. NTIMECOUNTER may be too low for certain systems, I'm still waiting for some feedback on that. Either way, I have a patch which I need to burn in in my lab, but right now I have a hard time getting my SMP box to even print out "Copyright..." when it boots :-( -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD coreteam member | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
On Thu, 7 Sep 2000, John Baldwin wrote: Most definitely an SMPng issue. If you take a SMP machine and compile a UP and an SMP kernel on it, the SMP kernel will boot fine, whereas the UP kernel will generate these warnings. John, not quite, I have an SMP box that I got these warnings on, spent the better part of the day trying to reproduce them (after the machine became unresponsive last night) the machine is a N440BX Night (Intel) motherboard, with two PII 333's. the board has everything onboard except the disks. the only change besides building an SMPng kernel was to take out COMPAT_OLDPCI. If I can find out more for you I will, however I'm leaving for vacation at 7:30 AM tomorrow morning. -Trish __ Trish Lynch FreeBSD - The Power to Serve[EMAIL PROTECTED] Rush Networking [EMAIL PROTECTED] VA Linux Systems[EMAIL PROTECTED] O|S|D|N [EMAIL PROTECTED] --- Q: Have you guys ever been on Sesame Street as special guests? A (from Geddy): No. Much to the dismay of my daughter. Q: Would you be on Sesame Street if you were asked? A (again from Geddy): G: Yes. How could you turn Sesame Street down? That borders on criminal. -Geddy Lee From Rush, on Rockline. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
microuptime() went backwards
Just upgraded to -CURRENT as of about noon (EST) today. At reboot I got a lot of these: microuptime() went backwards (1.7682417 - 1.997434) I recall reading in -current earlier this week that someone was looking for victims getting this. What further information can I provide? -Steve To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
Steve Ames wrote: Just upgraded to -CURRENT as of about noon (EST) today. At reboot I got a lot of these: microuptime() went backwards (1.7682417 - 1.997434) I recall reading in -current earlier this week that someone was looking for victims getting this. What further information can I provide? This is a SMPng issue on UP machines. -Steve -- John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.cslab.vt.edu/~jobaldwi/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
On Thursday, 7 September 2000 at 17:28:02 -0700, John Baldwin wrote: Steve Ames wrote: Just upgraded to -CURRENT as of about noon (EST) today. At reboot I got a lot of these: microuptime() went backwards (1.7682417 - 1.997434) I recall reading in -current earlier this week that someone was looking for victims getting this. What further information can I provide? This is a SMPng issue on UP machines. Yes, but it's not the only cause. We've also seen it on pre-SMPng stuff, and Athlons seem to be particularly capable of doing it. phk is looking at it at the moment, and he has an idea what's causing it. I'd guess, however, that this *is* the SMPng problem. It's subtly different from other manifestations in that the first time typically has 7 digits after the decimal point. Greg -- Finger [EMAIL PROTECTED] for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
Greg Lehey wrote: On Thursday, 7 September 2000 at 17:28:02 -0700, John Baldwin wrote: Steve Ames wrote: Just upgraded to -CURRENT as of about noon (EST) today. At reboot I got a lot of these: microuptime() went backwards (1.7682417 - 1.997434) I recall reading in -current earlier this week that someone was looking for victims getting this. What further information can I provide? This is a SMPng issue on UP machines. Yes, but it's not the only cause. We've also seen it on pre-SMPng stuff, and Athlons seem to be particularly capable of doing it. phk is looking at it at the moment, and he has an idea what's causing it. I'd guess, however, that this *is* the SMPng problem. It's subtly different from other manifestations in that the first time typically has 7 digits after the decimal point. Most definitely an SMPng issue. If you take a SMP machine and compile a UP and an SMP kernel on it, the SMP kernel will boot fine, whereas the UP kernel will generate these warnings. Greg -- Finger [EMAIL PROTECTED] for PGP public key See complete headers for address and phone numbers -- John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.cslab.vt.edu/~jobaldwi/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
On Thursday, 7 September 2000 at 17:48:06 -0700, John Baldwin wrote: Greg Lehey wrote: On Thursday, 7 September 2000 at 17:28:02 -0700, John Baldwin wrote: Steve Ames wrote: Just upgraded to -CURRENT as of about noon (EST) today. At reboot I got a lot of these: microuptime() went backwards (1.7682417 - 1.997434) I recall reading in -current earlier this week that someone was looking for victims getting this. What further information can I provide? This is a SMPng issue on UP machines. Yes, but it's not the only cause. We've also seen it on pre-SMPng stuff, and Athlons seem to be particularly capable of doing it. phk is looking at it at the moment, and he has an idea what's causing it. I'd guess, however, that this *is* the SMPng problem. It's subtly different from other manifestations in that the first time typically has 7 digits after the decimal point. Most definitely an SMPng issue. If you take a SMP machine and compile a UP and an SMP kernel on it, the SMP kernel will boot fine, whereas the UP kernel will generate these warnings. The point I'm making is that we've had these problems before SMPng, and that you can't automatically assume that it's SMPng just because you get the messages. On the other hand, the 7 digits seem to be a pretty reliable signature. Greg -- Finger [EMAIL PROTECTED] for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
The point I'm making is that we've had these problems before SMPng, and that you can't automatically assume that it's SMPng just because you get the messages. On the other hand, the 7 digits seem to be a pretty reliable signature. I'm getting this error while starting XFree86 4.0.1 on my laptop computer, and I'm using FreeBSD 4.1-STABLE, so I'm sure it's not the SMP stuff that's causing it. Ken (it's a 600 MHz coppermine based celeron BTW with a mobile BX chipset) Ken To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
On Thursday, 7 September 2000 at 22:49:30 -0400, Kenneth Wayne Culver wrote: The point I'm making is that we've had these problems before SMPng, and that you can't automatically assume that it's SMPng just because you get the messages. On the other hand, the 7 digits seem to be a pretty reliable signature. I'm getting this error while starting XFree86 4.0.1 on my laptop computer, and I'm using FreeBSD 4.1-STABLE, so I'm sure it's not the SMP stuff that's causing it. Right, but you're not getting the 7 digit microsecond count, right? You should contact phk. Greg -- Finger [EMAIL PROTECTED] for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
On Friday, 8 September 2000 at 0:18:07 -0400, Kenneth Wayne Culver wrote: On Fri, 8 Sep 2000, Greg Lehey wrote: On Thursday, 7 September 2000 at 22:49:30 -0400, Kenneth Wayne Culver wrote: The point I'm making is that we've had these problems before SMPng, and that you can't automatically assume that it's SMPng just because you get the messages. On the other hand, the 7 digits seem to be a pretty reliable signature. I'm getting this error while starting XFree86 4.0.1 on my laptop computer, and I'm using FreeBSD 4.1-STABLE, so I'm sure it's not the SMP stuff that's causing it. Right, but you're not getting the 7 digit microsecond count, right? You should contact phk. Well, here is one of the messeges: Sep 7 14:35:55 laptop /kernel: microuptime() went backwards (10412.355980 - 10412, -694583121) this is bad.. right ? :-) Well, at any rate it looks very funny. If this is a laptop, try building a kernel without apm and see if that helps. Greg -- Finger [EMAIL PROTECTED] for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: microuptime() went backwards
On Thursday, 7 September 2000 at 22:43:11 -0700, Mike Smith wrote: Sep 7 14:35:55 laptop /kernel: microuptime() went backwards (10412.355980 - 10412, -694583121)y this is bad.. right ? :-) Well, at any rate it looks very funny. If this is a laptop, try building a kernel without apm and see if that helps. It only helps "hide" the problem. There's either *extremely* bogus data coming in, or an arithmetic or sequencing error that's allowing a corrupt timecounter to be seen. Correct. As I mentioned earlier in the thread, phk has an idea what the problem is, and he's asked for people with it to contact him. Greg -- Finger [EMAIL PROTECTED] for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: looking for microuptime went backwards victims...
I got this last night running 'mtv' on a system with heavy disk I/O. The errors were: Sep 5 23:12:14 genius /kernel: pcm0: hwptr went backwards 8208 - 8192 Sep 5 23:12:47 genius /kernel: pcm0: hwptr went backwards 8420 - 8192 Sep 5 23:12:57 genius /kernel: pcm0: hwptr went backwards 8212 - 8192 Sep 5 23:13:04 genius /kernel: pcm0: hwptr went backwards 8196 - 8192 Sep 5 23:13:21 genius /kernel: pcm0: hwptr went backwards 8208 - 8192 etc. Joe FreeBSD genius.systems.pavilion.net 5.0-CURRENT FreeBSD 5.0-CURRENT #11: Tue Sep 5 12:45:45 BST 2000 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENIUS i386 On Mon, Sep 04, 2000 at 06:41:14PM +0200, Poul-Henning Kamp wrote: I'm looking for the remaining victims of the dreaded "microuptime went backwards" message. If you can reliably reproduce the problem, please contact me, so we can arrange for some very detailed tracing to try to find out what exactly is going on. I have not been able to trigger the problem in my lab in a long time. If you see the message only occationally, please try the attached patch and let me know if this changes the picture. The patch is not meant as a fix, but it might reduce the impact of this condition considerably when it happens. Basically by reducing the width of the timecounter the magnitude of the hit we take if the timecounter goes backwards is reduced from about an hour to approx 110msec. Reducing the width to less than 17 bits starts to run the risk of ambiguity due to clock rollover. This patch may not be safe with PCAUDIO. Poul-Henning Index: clock.c === RCS file: /home/ncvs/src/sys/i386/isa/clock.c,v retrieving revision 1.155 diff -u -r1.155 clock.c --- clock.c 2000/07/30 21:05:22 1.155 +++ clock.c 2000/09/04 16:34:16 @@ -194,7 +194,7 @@ static struct timecounter i8254_timecounter = { i8254_get_timecount,/* get_timecount */ 0, /* no poll_pps */ - ~0u,/* counter_mask */ + 0x1,/* counter_mask */ 0, /* frequency */ "i8254" /* name */ }; -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD coreteam member | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message -- Josef KarthauserFreeBSD: How many times have you booted today? Technical Manager Viagra for your server (http://www.uk.freebsd.org) Pavilion Internet plc. [[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: looking for microuptime went backwards victims...
Uhm, that is from the sound driver, not from the timecounter... Poul-Henning In message [EMAIL PROTECTED], Josef Karthauser writes: I got this last night running 'mtv' on a system with heavy disk I/O. The errors were: Sep 5 23:12:14 genius /kernel: pcm0: hwptr went backwards 8208 - 8192 Sep 5 23:12:47 genius /kernel: pcm0: hwptr went backwards 8420 - 8192 Sep 5 23:12:57 genius /kernel: pcm0: hwptr went backwards 8212 - 8192 Sep 5 23:13:04 genius /kernel: pcm0: hwptr went backwards 8196 - 8192 Sep 5 23:13:21 genius /kernel: pcm0: hwptr went backwards 8208 - 8192 etc. Joe FreeBSD genius.systems.pavilion.net 5.0-CURRENT FreeBSD 5.0-CURRENT #11: Tue Sep 5 12:45:45 BST 2000 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENIUS i386 On Mon, Sep 04, 2000 at 06:41:14PM +0200, Poul-Henning Kamp wrote: I'm looking for the remaining victims of the dreaded "microuptime went backwards" message. If you can reliably reproduce the problem, please contact me, so we can arrange for some very detailed tracing to try to find out what exactly is going on. I have not been able to trigger the problem in my lab in a long time. If you see the message only occationally, please try the attached patch and let me know if this changes the picture. The patch is not meant as a fix, but it might reduce the impact of this condition considerably when it happens. Basically by reducing the width of the timecounter the magnitude of the hit we take if the timecounter goes backwards is reduced from about an hour to approx 110msec. Reducing the width to less than 17 bits starts to run the risk of ambiguity due to clock rollover. This patch may not be safe with PCAUDIO. Poul-Henning Index: clock.c === RCS file: /home/ncvs/src/sys/i386/isa/clock.c,v retrieving revision 1.155 diff -u -r1.155 clock.c --- clock.c 2000/07/30 21:05:22 1.155 +++ clock.c 2000/09/04 16:34:16 @@ -194,7 +194,7 @@ static struct timecounter i8254_timecounter = { i8254_get_timecount,/* get_timecount */ 0, /* no poll_pps */ -~0u,/* counter_mask */ +0x1,/* counter_mask */ 0, /* frequency */ "i8254" /* name */ }; -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD coreteam member | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message -- Josef Karthauser FreeBSD: How many times have you booted today? Technical Manager Viagra for your server (http://www.uk.freebsd.org) Pavilion Internet plc. [[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]] -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD coreteam member | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: looking for microuptime went backwards victims...
On Wed, Sep 06, 2000 at 12:25:53PM +0200, Poul-Henning Kamp wrote: Uhm, that is from the sound driver, not from the timecounter... Poul-Henning Oops, "You are in a maze of twisty passages all alike. Which direction do you want to go?". Joe In message [EMAIL PROTECTED], Josef Karthauser writes: I got this last night running 'mtv' on a system with heavy disk I/O. The errors were: Sep 5 23:12:14 genius /kernel: pcm0: hwptr went backwards 8208 - 8192 Sep 5 23:12:47 genius /kernel: pcm0: hwptr went backwards 8420 - 8192 Sep 5 23:12:57 genius /kernel: pcm0: hwptr went backwards 8212 - 8192 Sep 5 23:13:04 genius /kernel: pcm0: hwptr went backwards 8196 - 8192 Sep 5 23:13:21 genius /kernel: pcm0: hwptr went backwards 8208 - 8192 -- Josef KarthauserFreeBSD: How many times have you booted today? Technical Manager Viagra for your server (http://www.uk.freebsd.org) Pavilion Internet plc. [[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
looking for microuptime went backwards victims...
I'm looking for the remaining victims of the dreaded "microuptime went backwards" message. If you can reliably reproduce the problem, please contact me, so we can arrange for some very detailed tracing to try to find out what exactly is going on. I have not been able to trigger the problem in my lab in a long time. If you see the message only occationally, please try the attached patch and let me know if this changes the picture. The patch is not meant as a fix, but it might reduce the impact of this condition considerably when it happens. Basically by reducing the width of the timecounter the magnitude of the hit we take if the timecounter goes backwards is reduced from about an hour to approx 110msec. Reducing the width to less than 17 bits starts to run the risk of ambiguity due to clock rollover. This patch may not be safe with PCAUDIO. Poul-Henning Index: clock.c === RCS file: /home/ncvs/src/sys/i386/isa/clock.c,v retrieving revision 1.155 diff -u -r1.155 clock.c --- clock.c 2000/07/30 21:05:22 1.155 +++ clock.c 2000/09/04 16:34:16 @@ -194,7 +194,7 @@ static struct timecounter i8254_timecounter = { i8254_get_timecount,/* get_timecount */ 0, /* no poll_pps */ - ~0u,/* counter_mask */ + 0x1,/* counter_mask */ 0, /* frequency */ "i8254" /* name */ }; -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD coreteam member | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message