Re: Won't boot after the commits to timecounter code
On Thu, Mar 07, 2002 at 10:40:07AM +0900, I wrote:
> > Apparently you have KTR enabled (not the default in GENERIC). I think WITNESS+KTR already caused nasty recursion from the mtx_lock_spin, and we now get an endless loop when nanotime() is called with an invalid timecounter in the following call chain:
> >
> > 	tc_init -> tc_windup -> tco_delta -> i8254_get_timecount -> mtx_foo -> witness_foo -> ktr_foo -> nanotime
> >
> > just after nanotime has somehow recursed back into i8254_get_timecount without causing endless recursion!
>
> Yes, I have the following KTR options enabled (I think I took them from NOTES about a year ago):
>
> 	options KTR
> 	options KTR_EXTEND
> 	options KTR_ENTRIES=1024
> 	options KTR_COMPILE=0x3f
> 	options KTR_MASK=0x201208
> 	options KTR_CPUMASK=0x3
>
> but WITNESS is commented out.
>
> > Try setting MTX_NOWITNESS in the initialization of clock_lock in i386/machdep.c.
>
> O.k., I'll try this (but does it affect a kernel without WITNESS?), then try a kernel without KTR options.

I've found the following:

 - KTR alone can make this happen; it locked solid with or without WITNESS.
 - Setting MTX_NOWITNESS in the initialization of clock_lock didn't help.
 - If I disable KTR, it works fine without any patches.
 - If I enable KTR with KTR_LOCK in the KTR_MASK option, it freezes after the "Timecounter i8254 frequency 1193182 Hz" message.
 - If I enable KTR without KTR_LOCK in the KTR_MASK option, it boots, but it locks solid the moment I insert a pccard (I'm using OLDCARD).

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
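A rough model of why having KTR_LOCK in KTR_MASK matters here: a KTR trace point is compiled in only if its class is in KTR_COMPILE, and at run time it fires only if the class is also set in the runtime mask (KTR_MASK). The sketch below ASSUMES KTR_LOCK is the 0x08 class bit (check sys/ktr.h in your own tree), ignores KTR_CPUMASK, and uses a hypothetical helper ktr_would_trace() that is not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of KTR-style trace gating.
 * ASSUMPTION: KTR_LOCK is the 0x08 class bit, as in sys/ktr.h of that era.
 */
#define KTR_LOCK	0x00000008u	/* assumed lock-tracing class bit */
#define KTR_COMPILE	0x0000003fu	/* compile-time mask from the config above */

/*
 * A trace point of class `cls` fires only if the class was compiled in
 * (KTR_COMPILE) and is also enabled in the runtime mask (KTR_MASK).
 * KTR_CPUMASK filtering is not modelled.
 */
static bool
ktr_would_trace(unsigned int runtime_mask, unsigned int cls)
{
	assert(cls != 0);	/* every trace point has a class */
	return ((KTR_COMPILE & cls) != 0 && (runtime_mask & cls) != 0);
}
```

Under that assumption, ktr_would_trace(0x201208u, KTR_LOCK) is true, so every spin-lock operation (including the one on clock_lock) would be traced, while clearing the bit (0x201200u) turns lock tracing off.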
Re: Won't boot after the commits to timecounter code
On Wed, Mar 06, 2002 at 08:49:18AM +0100, Poul-Henning Kamp wrote:
> In message 20020306054514.GA395@gzl, [EMAIL PROTECTED] writes:
> > Hello.
> >
> > After upgrading to the kernel as of 2002-03-03 00:00:00 (UTC), it stopped booting just after the message:
> >
> > 	Timecounter i8254 frequency 1193182 Hz
> >
> > With some debugging printf()s inserted, I found it was tc->tc_get_timecount() called from tco_delta(), called just after the bcopy() in tc_windup(). So maybe the next place I have to look at is i8254_get_timecount(), which is in /sys/i386/isa/clock.c, but the last (effective) commit to this file is revision 1.180 (at the end of January).
> >
> > I couldn't even drop into DDB; there was no response other than to the power button. The last bootable (and so far stable) kernel is from 2002-02-24. I don't think this is caused by some leftover in the work directory, since I always build kernels in a new empty directory under /usr/obj.
> >
> > Any (clue|fix)\?
>
> The only thing I know of right now is this thing from BDE which I haven't been able to verify yet:
>
> From: Bruce Evans [EMAIL PROTECTED]
> Subject: dummy_timecounter broken; breaks booting with -d
> To: [EMAIL PROTECTED]
> Message-Id: [EMAIL PROTECTED]
> Date: Tue, 05 Mar 2002 08:09:26 +1100
>
> %%%
> Index: kern_tc.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
> retrieving revision 1.116
> diff -u -2 -r1.116 kern_tc.c
> --- kern_tc.c	26 Feb 2002 09:16:27 -	1.116
> +++ kern_tc.c	4 Mar 2002 21:08:03 -
> @@ -192,4 +252,14 @@
>  		gen = tc->tc_generation;
>  		bintime2timeval(&tc->tc_offset, tvp);
> +		/*
> +		 * XXX dummy_timecounter is now broken. Its tc_get_timecount
> +		 * needs to be called before it works, and that doesn't
> +		 * always happen naturally. In particular, we spin forever
> +		 * here after booting with -d unless we do an unnatural call
> +		 * here, since the screen timeout code is always run on entry
> +		 * to ddb, and it calls here.
> +		 */
> +		if (gen == 0 && timecounter == &dummy_timecounter)
> +			(void)tc->tc_get_timecount(tc);
>  	} while (gen == 0 || gen != tc->tc_generation);
>  }
> %%%
>
> Bruce

Hmm, this doesn't seem to change the situation. I tried reverting the following files:

	/sys/sys/timetc.h:   1.46 -> 1.45
	/sys/kern/kern_tc.c: 1.116 -> 1.114

and managed to get into single-user mode, but of course this doesn't solve the problem.

I inserted a pair of printf()s inside mtx_lock_spin/mtx_unlock_spin in i8254_get_timecount(), and it kept printing the message while tc_init() was blocked. So I think it's blocked at mtx_lock_spin in i8254_get_timecount() when called from tc_init(), but not when called from somewhere else (maybe an interrupt handler?).
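To see what the quoted patch does, here is a userland model of the generation-checked read loop it modifies. Everything in it is hypothetical: struct tc_model, dummy_tc and read_offset() stand in for the kernel's timecounter, dummy_timecounter and the patched reader, and the dummy counter is ASSUMED to publish a non-zero generation only once its get_timecount method has run (the "needs to be called before it works" behaviour from the patch comment). Without the extra call, a reader that finds generation 0 would spin forever; with it, the loop exits on the next pass.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct timecounter; not the kernel layout. */
struct tc_model {
	unsigned int	(*get_timecount)(struct tc_model *);
	volatile unsigned int generation;	/* 0 means "being updated" */
	uint64_t	offset;			/* stands in for tc_offset */
};

/*
 * Hypothetical dummy counter: it only becomes valid (non-zero
 * generation) after its get_timecount method has been called once.
 */
static unsigned int
dummy_get_timecount(struct tc_model *tc)
{
	if (tc->generation == 0)
		tc->generation = 1;
	return (0);
}

static struct tc_model dummy_tc = { dummy_get_timecount, 0, 0 };
static struct tc_model *timecounter = &dummy_tc;

/* Generation-checked reader with the patch's "unnatural call" added. */
static uint64_t
read_offset(void)
{
	struct tc_model *tc;
	unsigned int gen;
	uint64_t off;

	do {
		tc = timecounter;
		gen = tc->generation;
		off = tc->offset;
		/* Without this nudge, gen stays 0 and the loop never exits. */
		if (gen == 0 && timecounter == &dummy_tc)
			(void)tc->get_timecount(tc);
	} while (gen == 0 || gen != tc->generation);
	assert(gen != 0);	/* a consistent snapshot was taken */
	return (off);
}
```

The loop retries whenever the generation is 0 or changed during the read, so it can only finish once some code has published a stable, non-zero generation; the patch supplies that code for the dummy counter.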
Re: Won't boot after the commits to timecounter code
On Wed, 6 Mar 2002, Poul-Henning Kamp wrote:

> The only thing I know of right now is this thing from BDE which I haven't been able to verify yet:

I got the hang for all boots, but it was a local problem. I had added a nanouptime() call to tc_windup(), and this spins forever when tc_windup() is called from tc_init() or switch_timecounter() (because timecounter->generation is 0). At least one of these calls is made unconditionally at boot time, but there is normally no problem, at least if WITNESS and KTR are not enabled (my default), because the functions that spin on the generation update are not called. But interrupt activity might result in them being called, and WITNESS and/or KTR call them if a lock is witnessed.

The following patch is the result of attempting to debug this. I first had to debug the debugger, since it has the same problem.

> From: Bruce Evans [EMAIL PROTECTED]
> Subject: dummy_timecounter broken; breaks booting with -d
> To: [EMAIL PROTECTED]
> Message-Id: [EMAIL PROTECTED]
> Date: Tue, 05 Mar 2002 08:09:26 +1100
>
> %%%
> Index: kern_tc.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
> retrieving revision 1.116
> diff -u -2 -r1.116 kern_tc.c
> --- kern_tc.c	26 Feb 2002 09:16:27 -	1.116
> +++ kern_tc.c	4 Mar 2002 21:08:03 -
> @@ -192,4 +252,14 @@
>  		gen = tc->tc_generation;
>  		bintime2timeval(&tc->tc_offset, tvp);
> +		/*
> +		 * XXX dummy_timecounter is now broken. Its tc_get_timecount
> +		 * needs to be called before it works, and that doesn't
> +		 * always happen naturally. In particular, we spin forever
> +		 * here after booting with -d unless we do an unnatural call
> +		 * here, since the screen timeout code is always run on entry
> +		 * to ddb, and it calls here.
> +		 */
> +		if (gen == 0 && timecounter == &dummy_timecounter)
> +			(void)tc->tc_get_timecount(tc);
>  	} while (gen == 0 || gen != tc->tc_generation);
>  }
> %%%
>
> Bruce
>
> In message 20020306054514.GA395@gzl, [EMAIL PROTECTED] writes:
> > Hello.
> >
> > After upgrading to the kernel as of 2002-03-03 00:00:00 (UTC), it stopped booting just after the message:
> >
> > 	Timecounter i8254 frequency 1193182 Hz
> >
> > With some debugging printf()s inserted, I found it was tc->tc_get_timecount() called from tco_delta(), called just after the bcopy() in tc_windup(). So maybe the next place I have to look at is i8254_get_timecount(), which is in /sys/i386/isa/clock.c, but the last (effective) commit to this file is revision 1.180 (at the end of January).
> >
> > I couldn't even drop into DDB; there was no response other than to the power button. The last bootable (and so far stable) kernel is from 2002-02-24. I don't think this is caused by some leftover in the work directory, since I always build kernels in a new empty directory under /usr/obj.
> >
> > Any (clue|fix)\?
>
> -- 
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> [EMAIL PROTECTED]         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.

Bruce
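The hang Bruce describes can be reproduced in a single-threaded model: the writer (standing in for tc_windup()) sets the generation to 0 while it updates, and a reader (standing in for nanouptime()) that is called from inside that window waits for a non-zero generation that only the stuck caller could publish. writer(), reader() and MAX_SPINS are hypothetical names; the spin bound exists only so the model terminates, whereas the kernel loop has no such bound.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SPINS	1000000L	/* artificial bound so the model ends */

static volatile unsigned int generation = 1;	/* 0 = update in progress */

/* Reader: returns false if it would have spun forever in the kernel. */
static bool
reader(void)
{
	for (long spins = 0; spins < MAX_SPINS; spins++) {
		unsigned int gen = generation;

		/* ... copy the published state here ... */
		if (gen != 0 && gen == generation)
			return (true);
	}
	return (false);
}

/* Writer: optionally calls the reader in the middle of its update. */
static bool
writer(bool call_reader_inside)
{
	unsigned int next = generation + 1;
	bool ok = true;

	assert(generation != 0);	/* no nested writers in this model */
	generation = 0;			/* readers must wait from here... */
	/* ... update the state ... */
	if (call_reader_inside)
		ok = reader();		/* like nanouptime() inside tc_windup() */
	if (next == 0)
		next = 1;		/* never publish generation 0 */
	generation = next;		/* ...until here */
	return (ok);
}
```

Calling writer(false) completes normally, while writer(true) exhausts the spin bound: on one CPU nothing else can finish the update, which is why the extra nanouptime() call only hurts when tc_windup() runs with the generation still at 0.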
Re: Won't boot after the commits to timecounter code
On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote:

> I inserted a pair of printf()s inside mtx_lock_spin/mtx_unlock_spin in i8254_get_timecount(), and it kept printing the message while tc_init() was blocked. So I think it's blocked at mtx_lock_spin in i8254_get_timecount() when called from tc_init(), but not when called from somewhere else (maybe an interrupt handler?).

Apparently you have KTR enabled (not the default in GENERIC). I think WITNESS+KTR already caused nasty recursion from the mtx_lock_spin, and we now get an endless loop when nanotime() is called with an invalid timecounter in the following call chain:

	tc_init -> tc_windup -> tco_delta -> i8254_get_timecount -> mtx_foo -> witness_foo -> ktr_foo -> nanotime

just after nanotime has somehow recursed back into i8254_get_timecount without causing endless recursion!

Try setting MTX_NOWITNESS in the initialization of clock_lock in i386/machdep.c.

Bruce
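A minimal model of the re-entry in that call chain, with hypothetical names throughout: get_timecount() takes a non-recursive spin lock, the lock code's tracing hook asks for a timestamp after acquiring it, and the timestamp path reads the counter again. Real spin-lock code would spin forever on the already-held lock; the model counts the re-entry and backs out instead. The `trace` field is a hypothetical per-lock switch for the hook; whether a real flag such as MTX_NOWITNESS disables every hook on the path (KTR's lock tracing, for instance) is not something this sketch can show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive spin lock with a tracing hook. */
struct spin_model {
	bool	held;
	bool	trace;		/* false models "don't trace this lock" */
};

static struct spin_model clock_lock_model = { false, true };
static int reentries;			/* times the lock was re-entered */
static unsigned int hw_count = 42;	/* stands in for the i8254 latch */

static unsigned int get_timecount(void);

/* Timestamp source used by the tracing hook (think nanotime()). */
static unsigned int
timestamp(void)
{
	return (get_timecount());
}

static bool
spin_lock(struct spin_model *m)
{
	if (m->held) {
		reentries++;		/* real code: spins here forever */
		return (false);
	}
	m->held = true;
	if (m->trace)
		(void)timestamp();	/* hook logs the acquisition */
	return (true);
}

static void
spin_unlock(struct spin_model *m)
{
	assert(m->held);
	m->held = false;
}

/* Counter read protected by the lock (think i8254_get_timecount()). */
static unsigned int
get_timecount(void)
{
	unsigned int count;

	if (!spin_lock(&clock_lock_model))
		return (0);		/* model only: report instead of hanging */
	count = hw_count;
	spin_unlock(&clock_lock_model);
	return (count);
}
```

With tracing on, one call to get_timecount() re-enters the held lock once; with tracing off, the cycle disappears. The point of the model is only that any hook on the clock lock which needs the time closes the loop.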
Re: Won't boot after the commits to timecounter code
On Thu, Mar 07, 2002 at 04:34:00AM +1100, Bruce Evans wrote:
> On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote:
>
> > I inserted a pair of printf()s inside mtx_lock_spin/mtx_unlock_spin in i8254_get_timecount(), and it kept printing the message while tc_init() was blocked. So I think it's blocked at mtx_lock_spin in i8254_get_timecount() when called from tc_init(), but not when called from somewhere else (maybe an interrupt handler?).
>
> Apparently you have KTR enabled (not the default in GENERIC). I think WITNESS+KTR already caused nasty recursion from the mtx_lock_spin, and we now get an endless loop when nanotime() is called with an invalid timecounter in the following call chain:
>
> 	tc_init -> tc_windup -> tco_delta -> i8254_get_timecount -> mtx_foo -> witness_foo -> ktr_foo -> nanotime
>
> just after nanotime has somehow recursed back into i8254_get_timecount without causing endless recursion!

Yes, I have the following KTR options enabled (I think I took them from NOTES about a year ago):

	options KTR
	options KTR_EXTEND
	options KTR_ENTRIES=1024
	options KTR_COMPILE=0x3f
	options KTR_MASK=0x201208
	options KTR_CPUMASK=0x3

but WITNESS is commented out.

> Try setting MTX_NOWITNESS in the initialization of clock_lock in i386/machdep.c.

O.k., I'll try this (but does it affect a kernel without WITNESS?), then try a kernel without KTR options.

Thanks.
Won't boot after the commits to timecounter code
Hello.

After upgrading to the kernel as of 2002-03-03 00:00:00 (UTC), it stopped booting just after the message:

	Timecounter i8254 frequency 1193182 Hz

With some debugging printf()s inserted, I found it was tc->tc_get_timecount() called from tco_delta(), called just after the bcopy() in tc_windup(). So maybe the next place I have to look at is i8254_get_timecount(), which is in /sys/i386/isa/clock.c, but the last (effective) commit to this file is revision 1.180 (at the end of January).

I couldn't even drop into DDB; there was no response other than to the power button. The last bootable (and so far stable) kernel is from 2002-02-24. I don't think this is caused by some leftover in the work directory, since I always build kernels in a new empty directory under /usr/obj.

Any (clue|fix)\?
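For orientation, a function like tco_delta() turns the raw hardware count into ticks since the last windup by subtracting a saved count and masking to the counter's width, so a wrap-around still gives the right answer. The sketch below is an assumption about that arithmetic, not a copy of kern_tc.c: masked_delta() is hypothetical and the 16-bit mask is only for illustration. Note that the reported hang is inside the hardware read (tc->tc_get_timecount()) itself, before any of this arithmetic runs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical masked-delta helper: ticks elapsed between the count
 * saved at the last windup (offset_count) and the current count (now),
 * for a counter that is counter_mask bits wide.  Unsigned subtraction
 * wraps modulo 2^32, and the mask trims the result to the counter's
 * width, so a counter that wrapped past zero still gives a small,
 * correct delta.
 */
static uint32_t
masked_delta(uint32_t now, uint32_t offset_count, uint32_t counter_mask)
{
	assert(counter_mask != 0);
	return ((now - offset_count) & counter_mask);
}
```

For example, with a 16-bit mask, a counter read of 0x0010 after a saved count of 0xfff0 yields 0x0020 ticks rather than a huge bogus value.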
Re: Won't boot after the commits to timecounter code
The only thing I know of right now is this thing from BDE which I haven't been able to verify yet:

From: Bruce Evans [EMAIL PROTECTED]
Subject: dummy_timecounter broken; breaks booting with -d
To: [EMAIL PROTECTED]
Message-Id: [EMAIL PROTECTED]
Date: Tue, 05 Mar 2002 08:09:26 +1100

%%%
Index: kern_tc.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
retrieving revision 1.116
diff -u -2 -r1.116 kern_tc.c
--- kern_tc.c	26 Feb 2002 09:16:27 -	1.116
+++ kern_tc.c	4 Mar 2002 21:08:03 -
@@ -192,4 +252,14 @@
 		gen = tc->tc_generation;
 		bintime2timeval(&tc->tc_offset, tvp);
+		/*
+		 * XXX dummy_timecounter is now broken. Its tc_get_timecount
+		 * needs to be called before it works, and that doesn't
+		 * always happen naturally. In particular, we spin forever
+		 * here after booting with -d unless we do an unnatural call
+		 * here, since the screen timeout code is always run on entry
+		 * to ddb, and it calls here.
+		 */
+		if (gen == 0 && timecounter == &dummy_timecounter)
+			(void)tc->tc_get_timecount(tc);
 	} while (gen == 0 || gen != tc->tc_generation);
 }
%%%

Bruce

In message 20020306054514.GA395@gzl, [EMAIL PROTECTED] writes:
> Hello.
>
> After upgrading to the kernel as of 2002-03-03 00:00:00 (UTC), it stopped booting just after the message:
>
> 	Timecounter i8254 frequency 1193182 Hz
>
> With some debugging printf()s inserted, I found it was tc->tc_get_timecount() called from tco_delta(), called just after the bcopy() in tc_windup(). So maybe the next place I have to look at is i8254_get_timecount(), which is in /sys/i386/isa/clock.c, but the last (effective) commit to this file is revision 1.180 (at the end of January).
>
> I couldn't even drop into DDB; there was no response other than to the power button. The last bootable (and so far stable) kernel is from 2002-02-24. I don't think this is caused by some leftover in the work directory, since I always build kernels in a new empty directory under /usr/obj.
>
> Any (clue|fix)\?

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.