Re: missing interrupts (was Re: CURRENT is freezing again ...)

2000-11-28 Thread Bruce Evans

On Mon, 27 Nov 2000, Andrew Gallatin wrote:

> 
> Bruce Evans writes:
>  > Possible causes of the problem:
>  > 1) isa_handle_intr() claims to send specific EOIs (0x30 | irq) but
>  >actually sends non-specific ones (0x20 | garbage).  Since interrupts
> 
> I think that sending non-specific EOIs is the problem.  Sending
> specific EOIs seems to eliminate my nic timeouts and the need to
> manually feed an eoi to recover from a missing interrupt.
> 
> My question is: how does one send a specific EOI correctly?  I don't
> have decent documentation for this.  Above, you seem to imply that
> 0x30 is a specific EOI.  That does not seem to work for me (machine
> locks at boot).
> 
> Linux uses 0xe0.  According to some Tru64 docs I have,
> that means "Rotate Priority on specific EOI".  According
> to that same documentation, 0x60 is a specific EOI.  Both of these

Oops, I misread the data sheet.  0x60 is correct, 0x30 is wrong.  The
irq number is in the lowest 3 bits.

> appear to work just fine.   What should the alpha port use?

I think it should use non-specific EOIs and send them early (when
there is no ambiguity about which interrupt is being handled), as in
the i386 port.  Sending them late mainly keeps the ICU's braindamaged
interrupt priority scheme in effect for longer than necessary.

Bruce



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: missing interrupts (was Re: CURRENT is freezing again ...)

2000-11-27 Thread Robert Drehmel

In <[EMAIL PROTECTED]>,
Andrew Gallatin wrote:
> Bruce Evans writes:
>  > Possible causes of the problem:
>  > 1) isa_handle_intr() claims to send specific EOIs (0x30 | irq) but
>  >actually sends non-specific ones (0x20 | garbage).  Since interrupts
>  >may be handled in non-LIFO order, this results in EOIs being sent
>  >for the wrong interrupts.  I think this just randomizes the
>  >brokenness caused by delaying sending of EOIs.  I can't see how it
>  >would result in an EOI being lost -- the right number of EOIs will
>  >have been sent after all handlers have returned.
> 
> 
> I think that sending non-specific EOIs is the problem.  Sending
> specific EOIs seems to eliminate my nic timeouts and the need to
> manually feed an eoi to recover from a missing interrupt.
> 
> My question is: how does one send a specific EOI correctly?  I don't
> have decent documentation for this.  Above, you seem to imply that
> 0x30 is a specific EOI.  That does not seem to work for me (machine
> locks at boot).
> 
> Linux uses 0xe0.  According to some Tru64 docs I have,
> that means "Rotate Priority on specific EOI".  According
> to that same documentation, 0x60 is a specific EOI.  Both of these
> appear to work just fine.   What should the alpha port use?

My notes say:

Non-specific EOI : 0x20
Specific EOI : 0x60 | IRQn
EOI + rotate priority: 0xa0
EOI + select lowest priority : 0xe0 | IRQn
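A minimal sketch of how the specific-EOI bytes from these notes would be computed for a cascaded irq such as the fxp's irq 10, assuming the usual PC/AT wiring with the slave ICU on master IR2. The macro and function names are hypothetical (not from any FreeBSD source), and the actual port writes are left out because they differ between the i386 and alpha ports. One possibly relevant detail from the 8259A data sheet: a command-port byte with bit 4 set is decoded as ICW1 (the start of the initialization sequence), not as an OCW2, so 0x30 | irq would likely reprogram the ICU rather than acknowledge anything.

```c
#include <assert.h>
#include <stdint.h>

/* 8259A OCW2 command bytes, as listed in the notes above. */
#define ICU_EOI_NONSPEC      0x20 /* clear highest-priority in-service bit */
#define ICU_EOI_SPEC         0x60 /* specific EOI; OR in IR line 0-7 */
#define ICU_EOI_ROTATE       0xa0 /* non-specific EOI + rotate priority */
#define ICU_EOI_ROTATE_SPEC  0xe0 /* specific EOI; that line becomes lowest */

/* Assumption: PC/AT wiring, slave ICU cascaded into master IR2. */
#define ICU_CASCADE_LINE     2

/*
 * Compute the specific-EOI command bytes for ISA irq 0-15.
 * Returns 1 if only the master needs an EOI, or 2 if the slave
 * must be acknowledged first and then the master's cascade line.
 */
static int
icu_specific_eoi(int irq, uint8_t *slave_cmd, uint8_t *master_cmd)
{
	assert(irq >= 0 && irq < 16);
	if (irq < 8) {
		*master_cmd = (uint8_t)(ICU_EOI_SPEC | irq);
		return 1;
	}
	/* The irq number goes in the lowest 3 bits of each byte. */
	*slave_cmd = (uint8_t)(ICU_EOI_SPEC | (irq & 7));
	*master_cmd = (uint8_t)(ICU_EOI_SPEC | ICU_CASCADE_LINE);
	return 2;
}
```

For irq 10 this gives 0x62 for the slave and 0x62 for the master (cascade line 2); for irq 4 it gives a single 0x64 for the master.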

-- 
Robert S. F. Drehmel <[EMAIL PROTECTED]>





Re: missing interrupts (was Re: CURRENT is freezing again ...)

2000-11-27 Thread Andrew Gallatin


Bruce Evans writes:
 > Possible causes of the problem:
 > 1) isa_handle_intr() claims to send specific EOIs (0x30 | irq) but
 >actually sends non-specific ones (0x20 | garbage).  Since interrupts
 >may be handled in non-LIFO order, this results in EOIs being sent
 >for the wrong interrupts.  I think this just randomizes the
 >brokenness caused by delaying sending of EOIs.  I can't see how it
 >would result in an EOI being lost -- the right number of EOIs will
 >have been sent after all handlers have returned.


I think that sending non-specific EOIs is the problem.  Sending
specific EOIs seems to eliminate my nic timeouts and the need to
manually feed an eoi to recover from a missing interrupt.

My question is: how does one send a specific EOI correctly?  I don't
have decent documentation for this.  Above, you seem to imply that
0x30 is a specific EOI.  That does not seem to work for me (machine
locks at boot).

Linux uses 0xe0.  According to some Tru64 docs I have,
that means "Rotate Priority on specific EOI".  According
to that same documentation, 0x60 is a specific EOI.  Both of these
appear to work just fine.   What should the alpha port use?

Thanks,

Drew








Re: CURRENT is freezing again ...

2000-11-20 Thread Mark Murray

> Interestingly though - I thrashed the disks for about 15 minutes to no
> avail before kldloading random.ko and firing up ssh, at which point it
> froze within a few minutes while typing. Obviously one data point
> isn't much to go off, but it might be somewhere to start looking.

Now that I've (almost) cleared get_cyclecounter(9) out of my TODO,
I can use it, and then go about getting rid of most malloc(9)s and
all TAILQs in random.ko.

M
--
Mark Murray
Join the anti-SPAM movement: http://www.cauce.org





Re: CURRENT is freezing again ...

2000-11-20 Thread Kris Kennaway

On Fri, Nov 17, 2000 at 05:58:30PM -0800, Kris Kennaway wrote:
> On Fri, Nov 17, 2000 at 12:55:28PM +0100, Soren Schmidt wrote:
> 
> > > I thought I was the only one, since my question on the freebsd-current
> > > mailing list went unanswered.
> > 
> > You are _not_ alone, there have been numerous complaints about this
> > on the list, but so far they have not been taken seriously :|
> 
> One of my non-SMP machines reliably wedges whenever I do heavy disk
> I/O. I can't break to debugger.
> 
> Nov  4 15:46:41 mollari /boot/kernel/kernel: atapci0:  port 0xffa0-0xffaf at device 7.1 on pci0
> Nov  4 15:46:41 mollari /boot/kernel/kernel: ata0: at 0x1f0 irq 14 on atapci0
> Nov  4 15:46:41 mollari /boot/kernel/kernel: ahc0:  port 0xfc00-0xfcff mem 0xffbeb000-0xffbebfff irq 15 at device 11.0 on pci0
> Nov  4 15:46:41 mollari /boot/kernel/kernel: aic7880: Wide Channel A, SCSI Id=7, 16/255 SCBs

Well, adding INVARIANTS, INVARIANT_SUPPORT, MUTEX_DEBUG and WITNESS
didn't give me anything to go off.

Interestingly though - I thrashed the disks for about 15 minutes to no
avail before kldloading random.ko and firing up ssh, at which point it
froze within a few minutes while typing. Obviously one data point
isn't much to go off, but it might be somewhere to start looking.

Kris



Re: CURRENT is freezing again ...

2000-11-19 Thread Mark Huizer

On Thu, Nov 16, 2000 at 12:20:49PM -0500, Steven E. Ames wrote:
> It seems to only do it SMP... the same machine built with a non-SMP
> kernel (same source code) runs just fine for extended periods.

I have a non-SMP machine that is running a 15-nov current kernel, which
freezes a few times a day. This morning I noticed that it might coincide with
the times that cvsup runs. I've disabled that, so I'll see whether that's
where the problem shows up.
Freeze means: no keyboard activity possible, machine just does nothing.
> 
> > > On Thu, 16 Nov 2000, Soren Schmidt wrote:
> > >
> > > > > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> > > > > 10 min after boot...
> > > >
> > > > You mean "is still freezing" right ?
> > > >
> > > > Current has been like this for longer than I care to think about; it
> > > > seems those in charge don't take these problems seriously (enough)...
> > >
> > > I think info about where/how it is freezing would be more helpful.
> >
> > No idea, the system just freezes: no drop to DDB, no remote gdb, no
> > nothing, so it's really hard to tell where...
> > As to how, just boot current on a fairly fast machine, make a kernel
> > and it'll hang in minutes if not less, or just leave it alone and
> > it will hang in 10-30 mins...
> >
> > -Søren
> >

-- 
Nice testing in little China...





Re: missing interrupts (was Re: CURRENT is freezing again ...)

2000-11-18 Thread Bruce Evans

On Fri, 17 Nov 2000, Andrew Gallatin wrote:

> [fxp isa irq pending but never occurs]

> I then wrote a hack which sends an eoi.  If I call my hack from ddb
> and send an eoi for irq10, everything goes back to normal and the
> network interface is back.
> 
> So, is it a race in the interrupt code, or is it something about how
> the code is structured?
> 
> On the alpha at least, we get the irq, mask the irq and set the
> ithread runnable.  When the (isa) ithread runs, it calls the interrupt
> handler and then sends an eoi.  The interrupt is then unmasked.
> 
> I've peeked at the linux code and noticed that they do things
> differently.  They first mask the interrupt, and then send the eoi
> immediately -- before the handler runs.  They then run the handler
> and unmask the interrupt.  They seem to do this both on i386 and
> alpha.  

FreeBSD does the same thing on i386's as Linux, except for fast
interrupts it delays the EOI until the handler returns so that the
handler gets called as soon as possible.

> Does anybody have any ideas about this?  Does something bad
> happen if you don't send an eoi in a reasonable amount of time?

Delayed EOIs work normally, but lower priority interrupts (according
to the ICU's priority scheme) are masked until the EOI is sent.  This
is bad mainly because the ICU's priority scheme is different from
FreeBSD's priority scheme.

Possible causes of the problem:
1) isa_handle_intr() claims to send specific EOIs (0x30 | irq) but
   actually sends non-specific ones (0x20 | garbage).  Since interrupts
   may be handled in non-LIFO order, this results in EOIs being sent
   for the wrong interrupts.  I think this just randomizes the
   brokenness caused by delaying sending of EOIs.  I can't see how it
   would result in an EOI being lost -- the right number of EOIs will
   have been sent after all handlers have returned.
2) Insufficient locking for ICU accesses.  Again, I can't see how this
   would affect EOIs.  On i386's, some accesses are locked implicitly
   by sched_lock.
3) Enabling interrupts (and unlocking the ICU) before sending EOI seems
   to just make things more complicated.  It requires the specific EOIs
   in (1).
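To make item (1) and the point about delayed EOIs concrete, here is a toy model of a single 8259A in its default fully nested mode (IR0 highest priority). It is not the real isa_handle_intr() code; all names are hypothetical. A non-specific EOI always clears the highest-priority in-service bit, whichever handler happens to send it, while a specific EOI clears exactly the named line.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one 8259A in fully nested mode (IR0 = highest priority). */
struct toy_icu {
	uint8_t isr; /* in-service register: bit n set = IRn being serviced */
};

/* The ICU hands IRn to the CPU: mark it in service. */
static void
toy_ack(struct toy_icu *icu, int line)
{
	assert(line >= 0 && line < 8);
	icu->isr |= (uint8_t)(1u << line);
}

/* Non-specific EOI: clear the highest-priority (lowest-numbered) ISR bit. */
static void
toy_eoi_nonspecific(struct toy_icu *icu)
{
	for (int line = 0; line < 8; line++) {
		if (icu->isr & (1u << line)) {
			icu->isr &= (uint8_t)~(1u << line);
			return;
		}
	}
}

/* Specific EOI: clear exactly the given line. */
static void
toy_eoi_specific(struct toy_icu *icu, int line)
{
	assert(line >= 0 && line < 8);
	icu->isr &= (uint8_t)~(1u << line);
}

/*
 * A request on `line` is held off while any equal-or-higher
 * priority line is still in service (i.e. until its EOI arrives).
 */
static int
toy_can_deliver(const struct toy_icu *icu, int line)
{
	uint8_t higher_or_equal = (uint8_t)((1u << (line + 1)) - 1);

	return (icu->isr & higher_or_equal) == 0;
}
```

Scenario: IR5 is taken and its ithread deferred, then IR3 is taken. If IR5's ithread finishes first and sends a non-specific EOI, the ICU clears IR3 instead, and IR5 stays in service, still blocking IR6 and IR7. After the second EOI the ISR is empty again, which matches the observation that the right number of EOIs is sent in the end.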

On alphas, interrupts aren't masked in the ICU while they are handled
(the disable/enable args in the call to alpha_setup_intr() in
isa_setup_intr() are NULL ...).  They are masked by some combination
of the CPU and ICU priorities.

Bruce






Re: CURRENT is freezing again ...

2000-11-18 Thread Michael Harnois

On Sat, 18 Nov 2000 11:40:34 -0600 (CST), Jonathan Lemon <[EMAIL PROTECTED]> said:

> What version of if_dc.c

1.38

-- 
Michael D. Harnois, Redeemer Lutheran Church, Washburn, IA 
[EMAIL PROTECTED]  [EMAIL PROTECTED] 
 "It's not what we don't know that hurts us, 
 it's what we know for certain that just ain't so." -- Mark Twain





Re: CURRENT is freezing again ...

2000-11-18 Thread Jonathan Lemon

In article <[EMAIL PROTECTED]> you write:
>On Fri, 17 Nov 2000 10:30:02 -0800 (PST), John Baldwin <[EMAIL PROTECTED]> said:
>
>> what the WITNESS code does is perform extra checks on mutex
>> enters and exits to ensure that we aren't handling mutexes in
>> such a way that a deadlock is possible. Thus, it verifies that
>> you don't grab mutexes out of order, or that you don't grab
>> sleep mutexes with interrupts disabled, etc.
>
>Is this code meaningful on UP machines? Having been a victim of these
>seemingly random freezes since SMPng started, as others have noted, I
>decided to compile it in earlier this week. Twice now I've been dumped
>into the debugger with this output:
>
>lock order reversal
>1st dc0 last acquired @ ../../pci/if_dc.c:2717
>2nd 0xc0acdb3c dc1 @ ../../pci/if_dc.c: 2717
>3rd 0xc0acab3c dc0 @ ../../pci/if_dc.c: 2929

This is on a UP machine?  This looks like you're taking an interrupt
on dc1 and then trying to call the dc0 start routine, which shouldn't
be possible.  (Unless I'm misunderstanding the witness code)

What version of if_dc.c are you using?  line 2929 doesn't correspond
to an instance of "DC_LOCK" in my copy.
--
Jonathan

this should be released before anything else 






Re: CURRENT is freezing again ...

2000-11-18 Thread Michael Harnois

On Fri, 17 Nov 2000 10:30:02 -0800 (PST), John Baldwin <[EMAIL PROTECTED]> said:

> what the WITNESS code does is perform extra checks on mutex
> enters and exits to ensure that we aren't handling mutexes in
> such a way that a deadlock is possible. Thus, it verifies that
> you don't grab mutexes out of order, or that you don't grab
> sleep mutexes with interrupts disabled, etc.

Is this code meaningful on UP machines? Having been a victim of these
seemingly random freezes since SMPng started, as others have noted, I
decided to compile it in earlier this week. Twice now I've been dumped
into the debugger with this output:

lock order reversal
1st dc0 last acquired @ ../../pci/if_dc.c:2717
2nd 0xc0acdb3c dc1 @ ../../pci/if_dc.c: 2717
3rd 0xc0acab3c dc0 @ ../../pci/if_dc.c: 2929

Debugger ("witness_enter")
Stopped at Debugger+0x39: movb $0, in.Debugger.639

-- 
Michael D. Harnois, Redeemer Lutheran Church, Washburn, IA 
[EMAIL PROTECTED]  [EMAIL PROTECTED] 
 The atheist staring from the attic window is often nearer to God than the
 believer caught up in his own false image of God. -- Martin Buber





Re: CURRENT is freezing again ...

2000-11-17 Thread Mike Smith

> : >You can also short IOCHK to ground to get an NMI which kicks you into
> : >the debugger, even in an interrupt context.
> : 
> : Bad news for you Warner:  On too large a sample of my newer
> : motherboards this doesn't work anymore :-(
> 
> There's also a pci signal that you can either pull up or pull down
> that's supposed to give you the same results.  I've never really
> needed to know it.

SERR behaviour is programmable and there is no standard for it. 8(

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E







missing interrupts (was Re: CURRENT is freezing again ...)

2000-11-17 Thread Andrew Gallatin


Valentin Chopov writes:
 > Hi,
 > 
 > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
 > 10 min after boot...
 > 

I've seen one similar problem on an alpha UP1000 that I'd like some
input about.

The UP1000 is essentially an alpha 21264 stuffed into an AMD Athlon
system.  It has an AMD-751 chipset and handles all device interrupts
via an isa interrupt controller.

I've noticed that under "heavy" load (gdb -k kernel.debug /dev/mem on
an NFS filesystem), the network interface goes away, never to
reappear.  All I see is "fxp0: device timeout" on console.
This started with SMPng.

After a little bit of investigation with ddb, I discovered that
the NIC's irq was pending.  Eg:

login: fxp0: device timeout
Stopped at  siointr1+0x17c: br  zero,siointr1+0x32c 
db> call isa_irq_pending()
0x410

The fxp interface is at irq 10, so 0x410 means there's an irq 10
pending.
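Assuming isa_irq_pending() returns a mask with bit n standing for irq n (which is how the 0x410 above is being read), a small decoder like this hypothetical helper shows every pending line. Note that 0x410 is 0x400 | 0x10, so it has two bits set: irq 10 and irq 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a 16-bit pending-irq mask (bit n set = ISA irq n pending)
 * into a list of irq numbers.  Returns how many were stored.
 */
static int
decode_irq_mask(uint16_t mask, int *irqs, int max)
{
	int n = 0;

	assert(irqs != NULL || max == 0);
	for (int irq = 0; irq < 16 && n < max; irq++)
		if (mask & (1u << irq))
			irqs[n++] = irq;
	return n;
}
```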

I then wrote a hack which sends an eoi.  If I call my hack from ddb
and send an eoi for irq10, everything goes back to normal and the
network interface is back.

So, is it a race in the interrupt code, or is it something about how
the code is structured?

On the alpha at least, we get the irq, mask the irq and set the
ithread runnable.  When the (isa) ithread runs, it calls the interrupt
handler and then sends an eoi.  The interrupt is then unmasked.

I've peeked at the linux code and noticed that they do things
differently.  They first mask the interrupt, and then send the eoi
immediately -- before the handler runs.  They then run the handler
and unmask the interrupt.  They seem to do this both on i386 and
alpha.  
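The two orderings described above can be sketched side by side. The step names below are hypothetical stubs that only record what they would do, not the real alpha or Linux routines. Because the line stays masked in the ICU until the handler has run, sending the EOI early cannot re-deliver the same irq; it only stops the ICU's priority logic from holding off lower-priority lines while the handler runs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the ICU/ithread operations: each step
 * is just recorded, so the two orderings can be compared. */
static char trace[128];

static void
step(const char *what)
{
	assert(strlen(trace) + strlen(what) + 2 < sizeof(trace));
	strcat(trace, what);
	strcat(trace, " ");
}

/* Order described for the alpha port: EOI only after the handler. */
static void
dispatch_late_eoi(void)
{
	step("mask");
	step("handler"); /* ICU still has this irq in service meanwhile */
	step("eoi");
	step("unmask");
}

/* Order described for Linux: EOI immediately, before the handler. */
static void
dispatch_early_eoi(void)
{
	step("mask");    /* line stays masked, so it can't fire again yet */
	step("eoi");     /* ICU in-service bit released right away */
	step("handler");
	step("unmask");
}
```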

Does anybody have any ideas about this?  Does something bad
happen if you don't send an eoi in a reasonable amount of time?


Drew
--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590








Re: CURRENT is freezing again ...

2000-11-17 Thread Andrew Gallatin


Warner Losh writes:
 > In message <[EMAIL PROTECTED]> Sheldon Hearn writes:
 > : The problem with a hard lock-up out of which you can't escape into the
 > : debugger is that it makes meaningful bug reports impossible.  My non-SMP
 > : workstation has exhibited apparently arbitrary lock-ups since the advent
 > : of SMPng.
 > 
 > You can also short IOCHK to ground to get an NMI which kicks you into
 > the debugger, even in an interrupt context.  I have a card I built
 > from an old multi-function card to do this.  I think it is A1 and A2,
 > but I don't have my ISA bus spec handy.

Or you can use an alpha; most of which have a halt button that
will drop you into the SRM console.  ;)

Drew





Re: CURRENT is freezing again ...

2000-11-17 Thread Kris Kennaway

On Fri, Nov 17, 2000 at 12:55:28PM +0100, Soren Schmidt wrote:

> > I thought I was the only one, since my question on the freebsd-current
> > mailing list went unanswered.
> 
> You are _not_ alone, there have been numerous complaints about this
> on the list, but so far they have not been taken seriously :|

One of my non-SMP machines reliably wedges whenever I do heavy disk
I/O. I can't break to debugger.

Nov  4 15:46:41 mollari /boot/kernel/kernel: atapci0:  port 0xffa0-0xffaf at device 7.1 on pci0
Nov  4 15:46:41 mollari /boot/kernel/kernel: ata0: at 0x1f0 irq 14 on atapci0
Nov  4 15:46:41 mollari /boot/kernel/kernel: ahc0:  port 0xfc00-0xfcff mem 0xffbeb000-0xffbebfff irq 15 at device 11.0 on pci0
Nov  4 15:46:41 mollari /boot/kernel/kernel: aic7880: Wide Channel A, SCSI Id=7, 16/255 SCBs

Kris


Re: CURRENT is freezing again ...

2000-11-17 Thread John Baldwin


On 17-Nov-00 Sheldon Hearn wrote:
> 
> 
> On Fri, 17 Nov 2000 10:30:02 PST, John Baldwin wrote:
> 
>> # sysctl -w debug.ktr_verbose=1 ; command_that_makes_my_machine_go_boom
> 
> All very well and good once you've figured out which command makes your
> machine go boom.

Yes, I know.  I didn't say the method was perfect.  I find it frustrating as
well. :-/

> But as I said, the locks I'm getting appear completely arbitrary.  I'm
> no hard-core hacker, but I'm not completely clueless when it comes to
> isolating problems by way of deductive reasoning, and I'm stumped as to
> what's causing these.

I have no idea either. :(  I can't magically fix them, and if I don't commit
any of this stuff which works on my boxes as far as I can tell, then others
won't test it and we won't make progress.  If a buildworld usually triggers,
then try writing a small C program that beats on a file in /tmp or /var, then
start up 10 copies of it in a script that you run as your
'command_that_makes_my_machine_go_boom' to see if it works. :-P
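A minimal sketch of the kind of program suggested here, assuming a POSIX system. The function names, buffer size and iteration counts are made up, and nothing here is known to trigger the hang; it just produces the repeated write/fsync/read load from several processes at once.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Repeatedly rewrite, fsync and read back one file.
 * Returns 0 on success, -1 on any I/O error or data mismatch. */
static int
beat_on_file(const char *path, int iterations)
{
	char wbuf[8192], rbuf[sizeof(wbuf)];

	for (int i = 0; i < iterations; i++) {
		memset(wbuf, 'a' + i % 26, sizeof(wbuf));
		int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
		if (fd < 0)
			return -1;
		if (write(fd, wbuf, sizeof(wbuf)) != (ssize_t)sizeof(wbuf) ||
		    fsync(fd) != 0 ||
		    lseek(fd, 0, SEEK_SET) != 0 ||
		    read(fd, rbuf, sizeof(rbuf)) != (ssize_t)sizeof(rbuf) ||
		    memcmp(wbuf, rbuf, sizeof(wbuf)) != 0) {
			close(fd);
			return -1;
		}
		close(fd);
	}
	return 0;
}

/* Fork `copies` children, each beating on its own "<prefix>.<n>",
 * and wait for them.  Returns the number of copies that failed. */
static int
run_copies(const char *prefix, int copies, int iterations)
{
	int started = 0, failures = 0;

	assert(copies > 0 && iterations > 0);
	for (int n = 0; n < copies; n++) {
		pid_t pid = fork();
		if (pid < 0) {
			failures += copies - n; /* count copies never started */
			break;
		}
		if (pid == 0) {
			/* Child: use a private file, clean up, report status. */
			char path[256];
			snprintf(path, sizeof(path), "%s.%d", prefix, n);
			int rc = beat_on_file(path, iterations);
			unlink(path);
			_exit(rc == 0 ? 0 : 1);
		}
		started++;
	}
	for (int n = 0; n < started; n++) {
		int status;
		if (wait(&status) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failures++;
	}
	return failures;
}
```

Wrapped in a main() that calls something like run_copies("/tmp/beat", 10, 100000), this could serve as the command run right after sysctl -w debug.ktr_verbose=1.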

> Ciao,
> Sheldon.

-- 

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/





Re: CURRENT is freezing again ...

2000-11-17 Thread Sheldon Hearn



On Fri, 17 Nov 2000 10:30:02 PST, John Baldwin wrote:

> # sysctl -w debug.ktr_verbose=1 ; command_that_makes_my_machine_go_boom

All very well and good once you've figured out which command makes your
machine go boom.

But as I said, the locks I'm getting appear completely arbitrary.  I'm
no hard-core hacker, but I'm not completely clueless when it comes to
isolating problems by way of deductive reasoning, and I'm stumped as to
what's causing these.

Ciao,
Sheldon.





Re: CURRENT is freezing again ...

2000-11-17 Thread Wilko Bulte

On Fri, Nov 17, 2000 at 11:26:02AM -0700, Warner Losh wrote:
> In message <[EMAIL PROTECTED]> Sheldon Hearn writes:
> : The problem with a hard lock-up out of which you can't escape into the
> : debugger is that it makes meaningful bug reports impossible.  My non-SMP
> : workstation has exhibited apparently arbitrary lock-ups since the advent
> : of SMPng.
> 
> You can also short IOCHK to ground to get an NMI which kicks you into
> the debugger, even in an interrupt context.  I have a card I built
> from an old multi-function card to do this.  I think it is A1 and A2,
> but I don't have my ISA bus spec handy.

Just stick a metal pin (ballpoint works well) into the ISA connector
between the pins closest to the back of the machine. That is IOCHKN and GND
respectively.

Wilko
[hardware designer gone bad..]

-- 
Wilko Bulte Arnhem, the Netherlands
[EMAIL PROTECTED]   http://www.freebsd.org  http://www.nlfug.nl






Re: CURRENT is freezing again ...

2000-11-17 Thread Warner Losh

In message <25636.974487067@critter> Poul-Henning Kamp writes:
: In message <[EMAIL PROTECTED]>, Warner Losh writes:
: >In message <[EMAIL PROTECTED]> Sheldon Hearn writes:
: >: The problem with a hard lock-up out of which you can't escape into the
: >: debugger is that it makes meaningful bug reports impossible.  My non-SMP
: >: workstation has exhibited apparently arbitrary lock-ups since the advent
: >: of SMPng.
: >
: >You can also short IOCHK to ground to get an NMI which kicks you into
: >the debugger, even in an interrupt context.
: 
: Bad news for you Warner:  On too large a sample of my newer
: motherboards this doesn't work anymore :-(

There's also a pci signal that you can either pull up or pull down
that's supposed to give you the same results.  I've never really
needed to know it.

Warner






Re: CURRENT is freezing again ...

2000-11-17 Thread Poul-Henning Kamp

In message <[EMAIL PROTECTED]>, Warner Losh writes:
>In message <[EMAIL PROTECTED]> Sheldon Hearn writes:
>: The problem with a hard lock-up out of which you can't escape into the
>: debugger is that it makes meaningful bug reports impossible.  My non-SMP
>: workstation has exhibited apparently arbitrary lock-ups since the advent
>: of SMPng.
>
>You can also short IOCHK to ground to get an NMI which kicks you into
>the debugger, even in an interrupt context.

Bad news for you Warner:  On too large a sample of my newer
motherboards this doesn't work anymore :-(

--
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.





Re: CURRENT is freezing again ...

2000-11-17 Thread John Baldwin


On 17-Nov-00 Sheldon Hearn wrote:
> 
> 
> On Thu, 16 Nov 2000 10:42:51 PST, Alfred Perlstein wrote:
> 
>> I would try a new kernel, and perhaps some collaboration with John
>> to debug these problems rather than just complaining about the
>> situation.  I see at least two experienced developers in the CC
>> list, there's no reason for these poor bug reports.
> 
> The problem with a hard lock-up out of which you can't escape into the
> debugger is that it makes meaningful bug reports impossible.  My non-SMP
> workstation has exhibited apparently arbitrary lock-ups since the advent
> of SMPng.

When I get a hard lock like this I usually try to see if I can reproduce it in
single user mode.  If I can, then I compile KTR into my kernel with the
following options:  KTR, KTR_EXTEND, KTR_COMPILE="0x3fff",
KTR_MASK="(KTR_INTR|KTR_PROC)".  Then I boot into single user (so I don't dirty
filesystems), mount any needed fs's as read only if possible, and run the
following command:

# sysctl -w debug.ktr_verbose=1 ; command_that_makes_my_machine_go_boom

And then stare at the tracing output on the screen to see what the machine
was doing when it hung.  I.e., to see if it is still getting interrupts, and to
see what process it died in, etc.

> From my understanding, John's WITNESS code allows us to break into the
> debugger from within interrupt context.  If the lock-ups are happening
> in there, then this may help us provide better bug reports.

Err, not quite.  It's BSD/OS's WITNESS code, and what the WITNESS code does is
perform extra checks on mutex enters and exits to ensure that we aren't
handling mutexes in such a way that a deadlock is possible.  Thus, it verifies
that you don't grab mutexes out of order, or that you don't grab sleep mutexes
with interrupts disabled, etc.
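As a rough illustration of the lock-order part of this description only, here is a toy checker that records every "A held while B acquired" pair and flags an acquisition that would establish the opposite order. It is not the BSD/OS WITNESS code: it tracks direct pairs only (no transitive closure), ignores the sleep/spin distinction, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8

struct toy_witness {
	bool before[MAX_LOCKS][MAX_LOCKS]; /* before[a][b]: a seen held when b taken */
	int held[MAX_LOCKS];               /* ids of locks currently held */
	int nheld;
};

/* Returns false ("lock order reversal") if taking `lock` now contradicts
 * an order recorded earlier; otherwise records the new pairs. */
static bool
toy_lock(struct toy_witness *w, int lock)
{
	assert(lock >= 0 && lock < MAX_LOCKS && w->nheld < MAX_LOCKS);
	for (int i = 0; i < w->nheld; i++)
		if (w->before[lock][w->held[i]])
			return false;
	for (int i = 0; i < w->nheld; i++)
		w->before[w->held[i]][lock] = true;
	w->held[w->nheld++] = lock;
	return true;
}

/* Drop `lock` from the held list (locks may be released in any order). */
static void
toy_unlock(struct toy_witness *w, int lock)
{
	for (int i = 0; i < w->nheld; i++) {
		if (w->held[i] == lock) {
			memmove(&w->held[i], &w->held[i + 1],
			    (size_t)(w->nheld - i - 1) * sizeof(w->held[0]));
			w->nheld--;
			return;
		}
	}
}
```

Taking lock 0 then lock 1 records the order 0 before 1; a later attempt to take lock 0 while holding lock 1 is then reported as a reversal, since two threads doing those in parallel could deadlock.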

-- 

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/





Re: CURRENT is freezing again ...

2000-11-17 Thread Warner Losh

In message <[EMAIL PROTECTED]> Sheldon Hearn writes:
: The problem with a hard lock-up out of which you can't escape into the
: debugger is that it makes meaningful bug reports impossible.  My non-SMP
: workstation has exhibited apparently arbitrary lock-ups since the advent
: of SMPng.

You can also short IOCHK to ground to get an NMI which kicks you into
the debugger, even in an interrupt context.  I have a card I built
from an old multi-function card to do this.  I think it is A1 and A2,
but I don't have my ISA bus spec handy.

Warner





Re: CURRENT is freezing again ...

2000-11-17 Thread Michael Harnois

On Fri, 17 Nov 2000 12:55:28 +0100 (CET), Soren Schmidt <[EMAIL PROTECTED]> said:

> It doesn't help here, at least; the machine(s) just lock up solid, and
> only a reset or a power cycle can bring them back...

Same here ... as others noted, started with SMPng ...

-- 
Michael D. Harnois, Redeemer Lutheran Church, Washburn, IA 
[EMAIL PROTECTED]  [EMAIL PROTECTED] 
 There are things that are so serious 
 that you can only joke about them. -- Werner Heisenberg





Re: CURRENT is freezing again ...

2000-11-17 Thread Soren Schmidt

It seems Sheldon Hearn wrote:
> > I would try a new kernel, and perhaps some collaboration with John
> > to debug these problems rather than just complaining about the
> > situation.  I see at least two experienced developers in the CC
> > list, there's no reason for these poor bug reports.
> 
> The problem with a hard lock-up out of which you can't escape into the
> debugger is that it makes meaningful bug reports impossible.  My non-SMP
> workstation has exhibited apparently arbitrary lock-ups since the advent
> of SMPng.
> 
> I thought I was the only one, since my question on the freebsd-current
> mailing list went unanswered.

You are _not_ alone, there have been numerous complaints about this
on the list, but so far they have not been taken seriously :|

> >From my understanding, John's WITNESS code allows us to break into the
> debugger from within interrupt context.  If the lock-ups are happening
> in there, then this may help us provide better bug reports.

It doesn't help here, at least; the machine(s) just lock up solid, and
only a reset or a power cycle can bring them back...

> Oh, and a couple of deep breaths are probably in order. :-)

Yeah like *sigh* 

-Søren





Re: CURRENT is freezing again ...

2000-11-17 Thread Sheldon Hearn



On Thu, 16 Nov 2000 10:42:51 PST, Alfred Perlstein wrote:

> I would try a new kernel, and perhaps some collaboration with John
> to debug these problems rather than just complaining about the
> situation.  I see at least two experienced developers in the CC
> list, there's no reason for these poor bug reports.

The problem with a hard lock-up out of which you can't escape into the
debugger is that it makes meaningful bug reports impossible.  My non-SMP
workstation has exhibited apparently arbitrary lock-ups since the advent
of SMPng.

I thought I was the only one, since my question on the freebsd-current
mailing list went unanswered.

>From my understanding, John's WITNESS code allows us to break into the
debugger from within interrupt context.  If the lock-ups are happening
in there, then this may help us provide better bug reports.

Oh, and a couple of deep breaths are probably in order. :-)

Ciao,
Sheldon.





Re: CURRENT is freezing again ...

2000-11-17 Thread Soren Schmidt

It seems Michael C . Wu wrote:
> I had those problems too a while ago on a UP p3-650 laptop.  Finally I just
> newfs'ed the machine and installed the 20001028 snapshot, then cvsupp'ed
> to 20001122.  The laptop now works well.  What I saw was processes
> forking and forking again until the machine runs out of memory and
> swap.  I think it may be some old libraries left over from
> upgrades and make world.

Hmm, this is a newly installed box, so there are no old leftovers...

-Søren





Re: CURRENT is freezing again ...

2000-11-16 Thread Michael C . Wu

On Thu, Nov 16, 2000 at 10:27:39PM +0100, Soren Schmidt scribbled:
| It seems John Baldwin wrote:
| >
| > 1) What revision of sys/kern/kern_synch.c do you have?  I fixed several things
| > yesterday, and the latest version is 1.108.
|
| 1.108
|
| > 2) If you do have the latest version, have you compiled a kernel with WITNESS,
| > INVARIANTS, and INVARIANT_SUPPORT to see how it runs?
|
| Have those in too...
|
| It still can't compile a kernel, it hangs itself in ~30 secs, no messages,
| no hints, no nothing, the machine just locks up solid as usual..
|
| Mind you the same machines run 4.2 and PRE_SMPNG without a hitch...
|
| > Also, I have noticed that occasionally on my SMP boxes the console seems to
| > lose itself.  By lose itself, I mean that all output stops, and it doesn't
| > process any input.  If I hit Ctrl-Alt-Backspace to break into the debugger, it
| > suddenly catches up and processes all pending events before dropping into the
| > debugger, but hangs again when I continue from ddb.  However, the rest of the
| > machine works fine during this time.  I can ssh in, build kernels, reboot, etc.
| > without any problem.
|
| It has been like this almost since the SMPNG stuff went in, at least on all my
| -current machines...


I had those problems too a while ago on a UP p3-650 laptop.  Finally I just
newfs'ed the machine and installed the 20001028 snapshot, then cvsup'ed
to 20001122.  The laptop now works well.  What I saw was processes
forking and forking again until the machine ran out of memory and
swap.  I think it may be some old libraries left over from
upgrades and make world.

--
+--+
| [EMAIL PROTECTED] | [EMAIL PROTECTED] |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+--+





Re: CURRENT is freezing again ...

2000-11-16 Thread Soren Schmidt

It seems John Baldwin wrote:
> 
> 1) What revision of sys/kern/kern_synch.c do you have?  I fixed several things
> yesterday, and the latest version is 1.108.

1.108

> 2) If you do have the latest version, have you compiled a kernel with WITNESS,
> INVARIANTS, and INVARIANT_SUPPORT to see how it runs?

Have those in too...

It still can't compile a kernel; it hangs in ~30 secs with no messages,
no hints, no nothing.  The machine just locks up solid as usual...

Mind you the same machines run 4.2 and PRE_SMPNG without a hitch...

> Also, I have noticed that occasionally on my SMP boxes the console seems to
> lose itself.  By lose itself, I mean that all output stops, and it doesn't
> process any input.  If I hit Ctrl-Alt-Backspace to break into the debugger, it
> suddenly catches up and processes all pending events before dropping into the
> debugger, but hangs again when I continue from ddb.  However, the rest of the
> machine works fine during this time.  I can ssh in, build kernels, reboot, etc.
> without any problem.

It has been like this almost since the SMPNG stuff went in, at least on all my
-current machines...

-Søren





RE: CURRENT is freezing again ...

2000-11-16 Thread John Baldwin


On 16-Nov-00 Valentin Chopov wrote:
> Hi,
> 
> After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> 10 min after boot...
> 
> Thanks, 
> 
> Val

Two questions:

1) What revision of sys/kern/kern_synch.c do you have?  I fixed several things
yesterday, and the latest version is 1.108.
2) If you do have the latest version, have you compiled a kernel with WITNESS,
INVARIANTS, and INVARIANT_SUPPORT to see how it runs?

Also, I have noticed that occasionally on my SMP boxes the console seems to
lose itself.  By lose itself, I mean that all output stops, and it doesn't
process any input.  If I hit Ctrl-Alt-Backspace to break into the debugger, it
suddenly catches up and processes all pending events before dropping into the
debugger, but hangs again when I continue from ddb.  However, the rest of the
machine works fine during this time.  I can ssh in, build kernels, reboot, etc.
without any problem.

-- 

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/





Re: CURRENT is freezing again ...

2000-11-16 Thread Alfred Perlstein

* Steven E. Ames <[EMAIL PROTECTED]> [001116 09:27] wrote:
> It seems to only do it SMP... the same machine built with a non-SMP
> kernel (same source code) runs just fine for extended periods.

John just checked in some code last night that may address your
problems.

I would try a new kernel, and perhaps some collaboration with John
to debug these problems rather than just complaining about the
situation.  I see at least two experienced developers in the CC
list; there's no reason for these poor bug reports.

-Alfred






Re: CURRENT is freezing again ...

2000-11-16 Thread Gianmarco Giovannelli


> It seems Boris Popov wrote:
> > On Thu, 16 Nov 2000, Soren Schmidt wrote:
> > 
> > > > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> > > > 10 min after boot...
> > > 
> > > You mean "is still freezing" right ?
> > > 
> > > Current has been like this for longer than I care to think about, it
> > > seems those in charge don't take these problems seriously (enough)...
> > 
> > I think info about where/how it is freezing would be more helpful.
> 
> No idea, the system just freezes, no drop to DDB, no remote gdb, no
> nothing, so it's really hard to tell where...
> As to how, just boot current on a fairly fast machine, make a kernel
> and it'll hang in minutes if not less, or just leave it alone and 
> it will hang in 10-30 mins...

I have the same problem on a dual PII 400MHz.
I haven't tried removing SMP support, but I don't have much time
to cvsup and rebuild anything else.

I'll try booting GENERIC (damn !%&!&, I always tell myself
that it's a good habit to compile GENERIC too after updates...
but I never do... :-( )




-- 
Regards...

Gianmarco
"Unix expert since yesterday"

http://www.giovannelli.it





Re: CURRENT is freezing again ...

2000-11-16 Thread Steven E. Ames

It seems to only do it SMP... the same machine built with a non-SMP
kernel (same source code) runs just fine for extended periods.

-Steve

- Original Message -
From: "Soren Schmidt" <[EMAIL PROTECTED]>
To: "Boris Popov" <[EMAIL PROTECTED]>
Cc: "Valentin Chopov" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, November 16, 2000 12:17 PM
Subject: Re: CURRENT is freezing again ...


> It seems Boris Popov wrote:
> > On Thu, 16 Nov 2000, Soren Schmidt wrote:
> >
> > > > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> > > > 10 min after boot...
> > >
> > > You mean "is still freezing" right ?
> > >
> > > Current has been like this for longer than I care to think about, it
> > > seems those in charge don't take these problems seriously (enough)...
> >
> > I think info about where/how it is freezing would be more helpful.
>
> No idea, the system just freezes, no drop to DDB, no remote gdb, no
> nothing, so it's really hard to tell where...
> As to how, just boot current on a fairly fast machine, make a kernel
> and it'll hang in minutes if not less, or just leave it alone and
> it will hang in 10-30 mins...
>
> -Søren
>






Re: CURRENT is freezing again ...

2000-11-16 Thread Soren Schmidt

It seems Boris Popov wrote:
> On Thu, 16 Nov 2000, Soren Schmidt wrote:
> 
> > > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> > > 10 min after boot...
> > 
> > You mean "is still freezing" right ?
> > 
> > Current has been like this for longer than I care to think about, it
> > seems those in charge don't take these problems seriously (enough)...
> 
>   I think info about where/how it is freezing would be more helpful.

No idea, the system just freezes, no drop to DDB, no remote gdb, no
nothing, so it's really hard to tell where...
As to how, just boot current on a fairly fast machine, make a kernel
and it'll hang in minutes if not less, or just leave it alone and 
it will hang in 10-30 mins...

-Søren





Re: CURRENT is freezing again ...

2000-11-16 Thread Boris Popov

On Thu, 16 Nov 2000, Soren Schmidt wrote:

> > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> > 10 min after boot...
> 
> You mean "is still freezing" right ?
> 
> Current has been like this for longer than I care to think about, it
> seems those in charge don't take these problems seriously (enough)...

I think info about where/how it is freezing would be more helpful.

> I've started doing development on -stable instead; it goes nowhere
> on -current.

 - works fine for me even with my new evil hacks :)

--
Boris Popov
http://www.butya.kz/~bp/






Re: CURRENT is freezing again ...

2000-11-16 Thread Soren Schmidt

It seems Valentin Chopov wrote:
> Hi,
> 
> After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> 10 min after boot...

You mean "is still freezing" right ?

Current has been like this for longer than I care to think about, it
seems those in charge don't take these problems seriously (enough)...

I've started doing development on -stable instead; it goes nowhere
on -current.

-Søren

