Re: Won't boot after the commits to timecounter code

2002-03-14 Thread qhwt

On Thu, Mar 07, 2002 at 10:40:07AM +0900, I wrote:
> > Apparently you have KTR enabled (not the default in GENERIC).  I think
> > WITNESS+KTR already caused nasty recursion from the mtx_lock_spin, and
> > we now get an endless loop when nanotime() is called with an invalid
> > timecounter in the following call chain:
> >
> > tc_init -> tc_windup -> tco_delta -> i8254_get_timecount -> mtx_foo ->
> > witness_foo -> ktr_foo -> nanotime,
> >
> > just after nanotime has somehow recursed back into i8254_get_timecount
> > without causing endless recursion!
>
> Yes, I have the following KTR options enabled (I think I brought this from
> NOTES about a year before):
>   options KTR
>   options KTR_EXTEND
>   options KTR_ENTRIES=1024
>   options KTR_COMPILE=0x3f
>   options KTR_MASK=0x201208
>   options KTR_CPUMASK=0x3
>
> but WITNESS is commented out.
>
> > Try setting MTX_NOWITNESS in the initialization of clock_lock in
> > i386/machdep.c.
>
> O.k., I'll try this (but does it affect a kernel without WITNESS?), then
> try a kernel without KTR options.

I've found the following:
- KTR alone can make this happen; it locked solid with or without WITNESS.
- Setting MTX_NOWITNESS in the initialization of clock_lock didn't help.
- If I disable KTR, it works fine without any patches.
- If I enable KTR with KTR_LOCK set in options KTR_MASK, it freezes after the
  "Timecounter i8254  frequency 1193182 Hz" message.
- If I enable KTR without KTR_LOCK in options KTR_MASK, it boots, but it
  locks solid the moment I insert a pccard (I'm using OLDCARD).

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Won't boot after the commits to timecounter code

2002-03-06 Thread qhwt

On Wed, Mar 06, 2002 at 08:49:18AM +0100, Poul-Henning Kamp wrote:
> In message <20020306054514.GA395@gzl>, [EMAIL PROTECTED] writes:
> > Hello.
> > After upgrading to the kernel as of 2002-03-03 00:00:00(UTC), it stopped
> > booting just after the message:
> >
> > Timecounter i8254  frequency 1193182 Hz
> >
> > With some debugging printf()'s inserted, I found it was tc->tc_get_timecount()
> > called from tco_delta() called just after the bcopy() in tc_windup().
> > So maybe the next place I have to look at is i8254_get_timecount(), which is in
> > /sys/i386/isa/clock.c, but the last (effective) commit to this file is
> > revision 1.180 (at the end of January).
> >
> > I couldn't even drop into DDB; no response other than to power button.
> > The last bootable (and stable so far) kernel is from 2002-02-24.
> > I don't think this is caused by some leftover in the work directory
> > since I always build kernels in a new empty directory under /usr/obj.
> >
> > Any (clue|fix)\?
>
> The only thing I know of right now is this thing from BDE which
> I haven't been able to verify yet:



 
> From:    Bruce Evans [EMAIL PROTECTED]
> Subject: dummy_timecounter broken; breaks booting with -d
> To:      [EMAIL PROTECTED]
> Message-Id: [EMAIL PROTECTED]
> Date:    Tue, 05 Mar 2002 08:09:26 +1100
>
> %%%
> Index: kern_tc.c
> ===
> RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
> retrieving revision 1.116
> diff -u -2 -r1.116 kern_tc.c
> --- kern_tc.c 26 Feb 2002 09:16:27 -  1.116
> +++ kern_tc.c 4 Mar 2002 21:08:03 -
> @@ -192,4 +252,14 @@
>                  gen = tc->tc_generation;
>                  bintime2timeval(&tc->tc_offset, tvp);
> +                /*
> +                 * XXX dummy_timecounter is now broken.  Its tc_get_timecount
> +                 * needs to be called before it works, and that doesn't
> +                 * always happen naturally.  In particular, we spin forever
> +                 * here after booting with -d unless we do an unnatural call
> +                 * here, since the screen timeout code is always run on entry
> +                 * to ddb, and it calls here.
> +                 */
> +                if (gen == 0 && timecounter == &dummy_timecounter)
> +                        (void)tc->tc_get_timecount(tc);
>          } while (gen == 0 || gen != tc->tc_generation);
>  }
> %%%
>
> Bruce

Hmm, this doesn't seem to change the situation.

I tried reverting the following files:
  /sys/sys/timetc.h:    1.46  -> 1.45
  /sys/kern/kern_tc.c:  1.116 -> 1.114
and managed to get into single-user mode, but this of course doesn't solve
the problem.

I inserted a pair of printf() inside mtx_lock_spin/mtx_unlock_spin in
i8254_get_timecount() and it kept printing the message while tc_init()
was blocked, so I think it's blocked at mtx_lock_spin in i8254_get_timecount()
when called from tc_init(), but not when called from somewhere else.
(maybe an interrupt handler?)




Re: Won't boot after the commits to timecounter code

2002-03-06 Thread Bruce Evans

On Wed, 6 Mar 2002, Poul-Henning Kamp wrote:

> The only thing I know of right now is this thing from BDE which
> I haven't been able to verify yet:

I got the hang for all boots, but it was a local problem.  I had added
a nanouptime() call to tc_windup(), and this spins forever when
tc_windup() is called from tc_init() or switch_timecounter() (because
timecounter->generation is 0).  At least one of these calls is made
unconditionally at boot time, but there is normally no problem, at
least if WITNESS and KTR are not enabled (my default), because the
functions that spin on the generation update are not called.  But
interrupt activity might result in them being called, and WITNESS
and/or KTR call them if a lock is witnessed.

The following patch is the result of attempting to debug this.  I first
had to debug the debugger, since it has the same problem.

 
> From:    Bruce Evans [EMAIL PROTECTED]
> Subject: dummy_timecounter broken; breaks booting with -d
> To:      [EMAIL PROTECTED]
> Message-Id: [EMAIL PROTECTED]
> Date:    Tue, 05 Mar 2002 08:09:26 +1100
>
> %%%
> Index: kern_tc.c
> ===
> RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
> retrieving revision 1.116
> diff -u -2 -r1.116 kern_tc.c
> --- kern_tc.c 26 Feb 2002 09:16:27 -  1.116
> +++ kern_tc.c 4 Mar 2002 21:08:03 -
> @@ -192,4 +252,14 @@
>                  gen = tc->tc_generation;
>                  bintime2timeval(&tc->tc_offset, tvp);
> +                /*
> +                 * XXX dummy_timecounter is now broken.  Its tc_get_timecount
> +                 * needs to be called before it works, and that doesn't
> +                 * always happen naturally.  In particular, we spin forever
> +                 * here after booting with -d unless we do an unnatural call
> +                 * here, since the screen timeout code is always run on entry
> +                 * to ddb, and it calls here.
> +                 */
> +                if (gen == 0 && timecounter == &dummy_timecounter)
> +                        (void)tc->tc_get_timecount(tc);
>          } while (gen == 0 || gen != tc->tc_generation);
>  }
> %%%
>
> Bruce



Bruce





Re: Won't boot after the commits to timecounter code

2002-03-06 Thread Bruce Evans

On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote:

> I inserted a pair of printf() inside mtx_lock_spin/mtx_unlock_spin in
> i8254_get_timecount() and it kept printing the message while tc_init()
> was blocked, so I think it's blocked at mtx_lock_spin in i8254_get_timecount()
> when called from tc_init(), but not when called from somewhere else.
> (maybe an interrupt handler?)

Apparently you have KTR enabled (not the default in GENERIC).  I think
WITNESS+KTR already caused nasty recursion from the mtx_lock_spin, and
we now get an endless loop when nanotime() is called with an invalid
timecounter in the following call chain:

tc_init -> tc_windup -> tco_delta -> i8254_get_timecount -> mtx_foo ->
witness_foo -> ktr_foo -> nanotime,

just after nanotime has somehow recursed back into i8254_get_timecount
without causing endless recursion!

Try setting MTX_NOWITNESS in the initialization of clock_lock in
i386/machdep.c.

Bruce





Re: Won't boot after the commits to timecounter code

2002-03-06 Thread qhwt

On Thu, Mar 07, 2002 at 04:34:00AM +1100, Bruce Evans wrote:
> On Wed, 6 Mar 2002 [EMAIL PROTECTED] wrote:
>
> > I inserted a pair of printf() inside mtx_lock_spin/mtx_unlock_spin in
> > i8254_get_timecount() and it kept printing the message while tc_init()
> > was blocked, so I think it's blocked at mtx_lock_spin in i8254_get_timecount()
> > when called from tc_init(), but not when called from somewhere else.
> > (maybe an interrupt handler?)
>
> Apparently you have KTR enabled (not the default in GENERIC).  I think
> WITNESS+KTR already caused nasty recursion from the mtx_lock_spin, and
> we now get an endless loop when nanotime() is called with an invalid
> timecounter in the following call chain:
>
> tc_init -> tc_windup -> tco_delta -> i8254_get_timecount -> mtx_foo ->
> witness_foo -> ktr_foo -> nanotime,
>
> just after nanotime has somehow recursed back into i8254_get_timecount
> without causing endless recursion!

Yes, I have the following KTR options enabled (I think I brought this from
NOTES about a year before):
  options   KTR
  options   KTR_EXTEND
  options   KTR_ENTRIES=1024
  options   KTR_COMPILE=0x3f
  options   KTR_MASK=0x201208
  options   KTR_CPUMASK=0x3

but WITNESS is commented out.

> Try setting MTX_NOWITNESS in the initialization of clock_lock in
> i386/machdep.c.

O.k., I'll try this (but does it affect a kernel without WITNESS?), then
try a kernel without KTR options.

Thanks.




Won't boot after the commits to timecounter code

2002-03-05 Thread qhwt

Hello.
After upgrading to the kernel as of 2002-03-03 00:00:00(UTC), it stopped
booting just after the message:

Timecounter i8254  frequency 1193182 Hz

With some debugging printf()'s inserted, I found that it hangs in
tc->tc_get_timecount(), called from tco_delta(), which is called just after
the bcopy() in tc_windup().
So maybe the next place I have to look at is i8254_get_timecount(), which is in
/sys/i386/isa/clock.c, but the last (effective) commit to this file is
revision 1.180 (at the end of January).

I couldn't even drop into DDB; no response other than to the power button.
The last bootable (and stable so far) kernel is from 2002-02-24.
I don't think this is caused by some leftover in the work directory
since I always build kernels in a new empty directory under /usr/obj.

Any (clue|fix)\?




Re: Won't boot after the commits to timecounter code

2002-03-05 Thread Poul-Henning Kamp


The only thing I know of right now is this thing from BDE which
I haven't been able to verify yet:




From:    Bruce Evans [EMAIL PROTECTED]
Subject: dummy_timecounter broken; breaks booting with -d
To:      [EMAIL PROTECTED]
Message-Id: [EMAIL PROTECTED]
Date:    Tue, 05 Mar 2002 08:09:26 +1100

%%%
Index: kern_tc.c
===
RCS file: /home/ncvs/src/sys/kern/kern_tc.c,v
retrieving revision 1.116
diff -u -2 -r1.116 kern_tc.c
--- kern_tc.c   26 Feb 2002 09:16:27 -  1.116
+++ kern_tc.c   4 Mar 2002 21:08:03 -
@@ -192,4 +252,14 @@
                 gen = tc->tc_generation;
                 bintime2timeval(&tc->tc_offset, tvp);
+                /*
+                 * XXX dummy_timecounter is now broken.  Its tc_get_timecount
+                 * needs to be called before it works, and that doesn't
+                 * always happen naturally.  In particular, we spin forever
+                 * here after booting with -d unless we do an unnatural call
+                 * here, since the screen timeout code is always run on entry
+                 * to ddb, and it calls here.
+                 */
+                if (gen == 0 && timecounter == &dummy_timecounter)
+                        (void)tc->tc_get_timecount(tc);
         } while (gen == 0 || gen != tc->tc_generation);
 }
%%%

Bruce





-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
