Re: missing interrupts (was Re: CURRENT is freezing again ...)
On Mon, 27 Nov 2000, Andrew Gallatin wrote:

> Bruce Evans writes:
>  > Possible causes of the problem:
>  > 1) isa_handle_intr() claims to send specific EOIs (0x30 | irq) but
>  >    actually sends non-specific ones (0x20 | garbage).  Since interrupts
>
> I think that sending non-specific EOIs is the problem.  Sending
> specific EOIs seems to eliminate my nic timeouts and the need to
> manually feed an eoi to recover from a missing interrupt.
>
> My question is: how does one send a specific EOI correctly?  I don't
> have decent documentation for this.  Above, you seem to imply that
> 0x30 is a specific EOI.  That does not seem to work for me (machine
> locks at boot).
>
> Linux uses 0xe0.  According to some Tru64 docs I have,
> that means "Rotate Priority on specific EOI".  According
> to that same documentation, 0x60 is a specific EOI.  Both of these

Oops, I misread the data sheet.  0x60 is correct, 0x30 is wrong.  The
irq number is in the lowest 3 bits.

> appear to work just fine.  What should the alpha port use?

I think it should use non-specific EOIs and send them early (when there
is no ambiguity about which interrupt is being handled), as in the i386
port.  Sending them late mainly keeps the ICU's braindamaged interrupt
priority scheme in effect for longer than necessary.

Bruce

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
Re: missing interrupts (was Re: CURRENT is freezing again ...)
In <[EMAIL PROTECTED]>, Andrew Gallatin wrote:

> Bruce Evans writes:
>  > Possible causes of the problem:
>  > 1) isa_handle_intr() claims to send specific EOIs (0x30 | irq) but
>  >    actually sends non-specific ones (0x20 | garbage).  Since interrupts
>  >    may be handled in non-LIFO order, this results in EOIs being sent
>  >    for the wrong interrupts.  I think this just randomizes the
>  >    brokenness caused by delaying sending of EOIs.  I can't see how it
>  >    would result in an EOI being lost -- the right number of EOIs will
>  >    have been sent after all handlers have returned.
>
> I think that sending non-specific EOIs is the problem.  Sending
> specific EOIs seems to eliminate my nic timeouts and the need to
> manually feed an eoi to recover from a missing interrupt.
>
> My question is: how does one send a specific EOI correctly?  I don't
> have decent documentation for this.  Above, you seem to imply that
> 0x30 is a specific EOI.  That does not seem to work for me (machine
> locks at boot).
>
> Linux uses 0xe0.  According to some Tru64 docs I have,
> that means "Rotate Priority on specific EOI".  According
> to that same documentation, 0x60 is a specific EOI.  Both of these
> appear to work just fine.  What should the alpha port use?

My notes say:

  Non-specific EOI             : 0x20
  Specific EOI                 : 0x60 | IRQn
  EOI + rotate priority        : 0xa0
  EOI + select lowest priority : 0xe0 | IRQn

--
Robert S. F. Drehmel <[EMAIL PROTECTED]>
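[Editor's sketch] The four values in these notes line up with the three
high command bits of the 8259A's OCW2 byte (R = 0x80, SL = 0x40,
EOI = 0x20), with the level in the low 3 bits.  The snippet below only
shows how the bytes are composed.  The macro and function names are
made up for illustration and are not taken from the FreeBSD or Linux
sources.  It also suggests why 0x30 misbehaves: bit 4 (0x10) is not
part of OCW2 at all, and on an 8259A a command write with that bit set
is taken as the start of an initialization sequence (ICW1).

```c
#include <assert.h>
#include <stdint.h>

/* OCW2 command bits of an 8259A-style ICU (hypothetical names). */
#define OCW2_EOI 0x20u /* end of interrupt */
#define OCW2_SL  0x40u /* low 3 bits select a specific level */
#define OCW2_R   0x80u /* rotate priority */

/* Non-specific EOI: the ICU picks the in-service level itself. */
static inline uint8_t ocw2_nonspecific_eoi(void)
{
	return (uint8_t)OCW2_EOI;
}

/* Specific EOI for one level (0..7) of a single ICU. */
static inline uint8_t ocw2_specific_eoi(unsigned level)
{
	assert(level < 8);
	return (uint8_t)(OCW2_SL | OCW2_EOI | level);
}

/* Rotate priority on non-specific EOI. */
static inline uint8_t ocw2_rotate_eoi(void)
{
	return (uint8_t)(OCW2_R | OCW2_EOI);
}

/* Rotate on specific EOI: the named level becomes lowest priority. */
static inline uint8_t ocw2_rotate_specific_eoi(unsigned level)
{
	assert(level < 8);
	return (uint8_t)(OCW2_R | OCW2_SL | OCW2_EOI | level);
}
```

For an ISA irq in the range 8..15 the level passed in would be
(irq & 7) on the second ICU, and the cascade level on the first ICU
needs its own EOI as well; that part is left out here.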
Re: missing interrupts (was Re: CURRENT is freezing again ...)
Bruce Evans writes:
 > Possible causes of the problem:
 > 1) isa_handle_intr() claims to send specific EOIs (0x30 | irq) but
 >    actually sends non-specific ones (0x20 | garbage).  Since interrupts
 >    may be handled in non-LIFO order, this results in EOIs being sent
 >    for the wrong interrupts.  I think this just randomizes the
 >    brokenness caused by delaying sending of EOIs.  I can't see how it
 >    would result in an EOI being lost -- the right number of EOIs will
 >    have been sent after all handlers have returned.

I think that sending non-specific EOIs is the problem.  Sending
specific EOIs seems to eliminate my nic timeouts and the need to
manually feed an eoi to recover from a missing interrupt.

My question is: how does one send a specific EOI correctly?  I don't
have decent documentation for this.  Above, you seem to imply that
0x30 is a specific EOI.  That does not seem to work for me (machine
locks at boot).

Linux uses 0xe0.  According to some Tru64 docs I have,
that means "Rotate Priority on specific EOI".  According
to that same documentation, 0x60 is a specific EOI.  Both of these
appear to work just fine.  What should the alpha port use?

Thanks,

Drew
Re: CURRENT is freezing again ...
> Interestingly though - I thrashed the disks for about 15 minutes to no > avail before kldloading random.ko and firing up ssh, at which point it > froze within a few minutes while typing. Obviously one data point > isn't much to go off, but it might be somewhere to start looking. Now that I've (almost) cleared get_cyclecounter(9) out of my TODO, I can use it, and then go about getting rid of most malloc(9)s and all TAILQs in random.ko. M -- Mark Murray Join the anti-SPAM movement: http://www.cauce.org To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: CURRENT is freezing again ...
On Fri, Nov 17, 2000 at 05:58:30PM -0800, Kris Kennaway wrote: > On Fri, Nov 17, 2000 at 12:55:28PM +0100, Soren Schmidt wrote: > > > > I thought I was the only one, since my question on the freebsd-current > > > mailing list went unanswered. > > > > You are _not_ alone, there has been numerous complains about this > > on the list, but so far they have not been taken seriously :| > > One of my non-SMP machines reliably wedges whenever I do heavy disk > I/O. I can't break to debugger. > > Nov 4 15:46:41 mollari /boot/kernel/kernel: atapci0: >port 0xffa0-0xffaf at device 7.1 on pci0 > Nov 4 15:46:41 mollari /boot/kernel/kernel: ata0: at 0x1f0 irq 14 on atapci0 > Nov 4 15:46:41 mollari /boot/kernel/kernel: ahc0: >port 0xfc00-0xfcff mem 0xffbeb000-0xffbebfff irq 15 at device 11.0 on pci0 > Nov 4 15:46:41 mollari /boot/kernel/kernel: aic7880: Wide Channel A, SCSI Id=7, >16/255 SCBs Well, adding INVARIANTS, INVARIANTS_SUPPORT, MUTEX_DEBUG and WITNESS didn't give me anything to go off. Interestingly though - I thrashed the disks for about 15 minutes to no avail before kldloading random.ko and firing up ssh, at which point it froze within a few minutes while typing. Obviously one data point isn't much to go off, but it might be somewhere to start looking. Kris PGP signature
Re: CURRENT is freezing again ...
On Thu, Nov 16, 2000 at 12:20:49PM -0500, Steven E. Ames wrote: > It seems to only do it SMP... the same machine built with a non-SMP > kernel (same source code) runs just fine for extended periods. I have a non-SMP machine that is running a 15-nov current kernel, which freezes a few times a day. This morning I found it might coincide with the times that cvsup is running. Disabled that, I'll see if that's where the problem might show up. Freeze means: no keyboard activity possible, machine just does nothing. > > > > On Thu, 16 Nov 2000, Soren Schmidt wrote: > > > > > > > > After last cvsup my machine (Dual PIII, SMP kernel) is freezing > again in > > > > > 10 min after boot... > > > > > > > > You mean "is still freezing" right ? > > > > > > > > Current has been like this for longer than I care to think about, > it > > > > seems those in charge doesn't take these problems seriously > (enough)... > > > > > > I think info about where/how it freezing would be more helpful. > > > > No idea, the system just freezes, no drob to DDB no remote gdb no > > nothing, so its really hard to tell where... > > As to how, just boot current on a fairly fast machine, make a kernel > > and it'll hang in minutes if not less, or just leave it alone and > > it will hang in 10-30 mins... > > > > -Søren > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > > with "unsubscribe freebsd-current" in the body of the message > > > > > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-current" in the body of the message -- Nice testing in little China... To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: missing interrupts (was Re: CURRENT is freezing again ...)
On Fri, 17 Nov 2000, Andrew Gallatin wrote:

> [fxp isa irq pending but never occurs]

> I then wrote a hack which sends an eoi.  If I call my hack from ddb
> and send an eoi for irq10, everything goes back to normal and the
> network interface is back.
>
> So, is it a race in the interrupt code, or is it something about how
> the code is structured?
>
> On the alpha at least, we get the irq, mask the irq and set the
> ithread runnable.  When the (isa) ithread runs, it calls the interrupt
> handler and then sends an eoi.  The interrupt is then unmasked.
>
> I've peeked at the linux code and noticed that they do things
> differently.  They first mask the interrupt, and then send the eoi
> immediately -- before the handler runs.  They then run the handler
> and unmask the interrupt.  They seem to do this both on i386 and
> alpha.

FreeBSD does the same thing on i386's as Linux, except for fast
interrupts it delays the EOI until the handler returns so that the
handler gets called as soon as possible.

> Does anybody have any ideas about this?  Does something bad
> happen if you don't send an eoi in a reasonable amount of time?

Delayed EOIs work normally, but lower priority interrupts (according
to the ICU's priority scheme) are masked until the EOI is sent.  This
is bad mainly because the ICU's priority scheme is different from
FreeBSD's priority scheme.

Possible causes of the problem:

1) isa_handle_intr() claims to send specific EOIs (0x30 | irq) but
   actually sends non-specific ones (0x20 | garbage).  Since interrupts
   may be handled in non-LIFO order, this results in EOIs being sent
   for the wrong interrupts.  I think this just randomizes the
   brokenness caused by delaying sending of EOIs.  I can't see how it
   would result in an EOI being lost -- the right number of EOIs will
   have been sent after all handlers have returned.

2) Insufficient locking for ICU accesses.  Again, I can't see how this
   would affect EOIs.  On i386's, some accesses are locked implicitly
   by sched_lock.

3) Enabling interrupts (and unlocking the ICU) before sending EOI
   seems to just make things more complicated.  It requires the
   specific EOIs in (1).

On alphas, interrupts aren't masked in the ICU while they are handled
(the disable/enable args in the call to alpha_setup_intr() in
isa_setup_intr() are NULL ...).  They are masked by some combination
of the CPU and ICU priorities.

Bruce
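[Editor's sketch] Point (1) above can be tried out on a toy model of a
single ICU's in-service register.  The model assumes the usual fully
nested 8259A behaviour, where level 0 has the highest priority and a
non-specific EOI clears the highest-priority in-service bit.  The
struct and function names are invented for this illustration; none of
this is the FreeBSD code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy in-service register: bit n set means level n is in service. */
struct toy_icu {
	uint8_t isr;
};

/* The CPU acknowledges level 'irq'; the ICU marks it in service. */
static void toy_ack(struct toy_icu *icu, unsigned irq)
{
	assert(irq < 8);
	icu->isr |= (uint8_t)(1u << irq);
}

/*
 * Non-specific EOI: clears the highest-priority in-service bit,
 * which is the lowest-numbered set bit in this model.
 */
static void toy_eoi_nonspecific(struct toy_icu *icu)
{
	icu->isr &= (uint8_t)(icu->isr - 1);
}

/* Specific EOI: clears exactly the named level. */
static void toy_eoi_specific(struct toy_icu *icu, unsigned irq)
{
	assert(irq < 8);
	icu->isr &= (uint8_t)~(1u << irq);
}
```

Suppose level 5 is in service, level 3 then nests on top of it, and
the handler for level 5 finishes first (non-LIFO order).  A
non-specific EOI at that point clears bit 3 instead of bit 5, so the
ICU has the wrong idea of what is being handled until the second EOI
arrives.  After both EOIs the register is empty again, which matches
the remark that the right number of EOIs will have been sent once all
handlers have returned.  With toy_eoi_specific() the register is
correct at every step.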
Re: CURRENT is freezing again ...
On Sat, 18 Nov 2000 11:40:34 -0600 (CST), Jonathan Lemon <[EMAIL PROTECTED]> said: > What version of if_dc.c 1.38 -- Michael D. Harnois, Redeemer Lutheran Church, Washburn, IA [EMAIL PROTECTED] [EMAIL PROTECTED] "It's not what we don't know that hurts us, it's what we know for certain that just ain't so." -- Mark Twain To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: CURRENT is freezing again ...
In article [EMAIL PROTECTED]> you write: >On Fri, 17 Nov 2000 10:30:02 -0800 (PST), John Baldwin <[EMAIL PROTECTED]> said: > >> what the WITNESS code does is perform extra checks on mutex >> enter's and exit's to ensure that we aren't handling mutexes in >> such a way that a deadlock is possible. Thus, it verifies that >> you don't grab mutexes out of order, or that you don't grab >> sleep mutexes with interrupts disabled, etc. > >Is this code meaningful on UP machines? Having been a victim of these >seemingly random freezes since SMPng started, as others have noted, I >decided to compile it in earlier this week. Twice now I've been dumped >into the debugger with this output: > >lock order reversal >1st dc0 last acquired @ ../../pci/if_dc.c:2717 >2nd 0xc0acdb3c dc1 @ ../../pci/if_dc.c: 2717 >3rd 0xc0acab3c dc0 @ ../../pci/if_dc.c: 2929 This is on a UP machine? This looks like you're taking an interrupt on dc1 and then trying to call the dc0 start routine, which shouldn't be possible. (Unless I'm misunderstanding the witness code) What version of if_dc.c are you using? line 2929 doesn't correspond to an instance of "DC_LOCK" in my copy. -- Jonathan this should be released before anything else To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: CURRENT is freezing again ...
On Fri, 17 Nov 2000 10:30:02 -0800 (PST), John Baldwin <[EMAIL PROTECTED]> said:

> what the WITNESS code does is perform extra checks on mutex
> enter's and exit's to ensure that we aren't handling mutexes in
> such a way that a deadlock is possible.  Thus, it verifies that
> you don't grab mutexes out of order, or that you don't grab
> sleep mutexes with interrupts disabled, etc.

Is this code meaningful on UP machines? Having been a victim of these
seemingly random freezes since SMPng started, as others have noted, I
decided to compile it in earlier this week. Twice now I've been dumped
into the debugger with this output:

lock order reversal
 1st dc0 last acquired @ ../../pci/if_dc.c:2717
 2nd 0xc0acdb3c dc1 @ ../../pci/if_dc.c: 2717
 3rd 0xc0acab3c dc0 @ ../../pci/if_dc.c: 2929
Debugger ("witness_enter")
Stopped at      Debugger+0x39:  movb    $0, in.Debugger.639

--
Michael D. Harnois, Redeemer Lutheran Church, Washburn, IA
[EMAIL PROTECTED]  [EMAIL PROTECTED]
 The atheist staring from the attic window is often nearer to God
 than the believer caught up in his own false image of God.
 -- Martin Buber
Re: CURRENT is freezing again ...
> : >You can also short IOCHK to ground to get an NMI which kicks you into > : >the debugger, even in an interrupt context. > : > : Bad news for you warner: On a too large sample of my newer > : motherboards this doesn't work anymore :-( > > There's also a pci signal that you can either pull up or pull down > that's supposed to give you the same results. I've never really > needed to know it. SERR behaviour is programmable and there is no standard for it. 8( -- ... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt] V I C T O R Y N O T V E N G E A N C E To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
missing interrupts (was Re: CURRENT is freezing again ...)
Valentin Chopov writes:
 > Hi,
 >
 > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
 > 10 min after boot...
 >

I've seen one similar problem on an alpha UP1000 that I'd like some
input about.  The UP1000 is essentially an alpha 21264 stuffed into an
AMD Athlon system.  It has an AMD-751 chipset and handles all device
interrupts via an isa interrupt controller.

I've noticed that under "heavy" load (gdb -k kernel.debug /dev/mem on
an NFS filesystem), the network interface goes away, never to reappear.
All I see is "fxp0: device timeout" on console.  This started with
SMPng.

After a little bit of investigation with ddb, I discovered that the
NIC's irq was pending.  Eg:

login: fxp0: device timeout
Stopped at      siointr1+0x17c: br      zero,siointr1+0x32c
db> call isa_irq_pending()
0x410

The fxp interface is at irq 10, so 0x410 means there's an irq 10
pending.

I then wrote a hack which sends an eoi.  If I call my hack from ddb
and send an eoi for irq10, everything goes back to normal and the
network interface is back.

So, is it a race in the interrupt code, or is it something about how
the code is structured?

On the alpha at least, we get the irq, mask the irq and set the
ithread runnable.  When the (isa) ithread runs, it calls the interrupt
handler and then sends an eoi.  The interrupt is then unmasked.

I've peeked at the linux code and noticed that they do things
differently.  They first mask the interrupt, and then send the eoi
immediately -- before the handler runs.  They then run the handler
and unmask the interrupt.  They seem to do this both on i386 and
alpha.

Does anybody have any ideas about this?  Does something bad
happen if you don't send an eoi in a reasonable amount of time?

Drew

--
Andrew Gallatin, Sr Systems Programmer   http://www.cs.duke.edu/~gallatin
Duke University                          Email: [EMAIL PROTECTED]
Department of Computer Science           Phone: (919) 660-6590
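[Editor's sketch] The 0x410 reading above can be decoded mechanically,
assuming the value returned by isa_irq_pending() is a 16-bit mask in
which bit n stands for ISA irq n (that is how the message reads it).
The helper names below are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if 'irq' (0..15) is set in the pending mask, else 0. */
static int irq_is_pending(uint16_t mask, unsigned irq)
{
	assert(irq < 16);
	return (mask >> irq) & 1;
}

/* Counts how many of the 16 ISA irqs are marked pending. */
static int count_pending(uint16_t mask)
{
	int n = 0;

	for (unsigned irq = 0; irq < 16; irq++)
		n += irq_is_pending(mask, irq);
	return n;
}

/* Prints one line per pending irq. */
static void print_pending(uint16_t mask)
{
	for (unsigned irq = 0; irq < 16; irq++)
		if (irq_is_pending(mask, irq))
			printf("irq %u pending\n", irq);
}
```

For 0x410 this reports irq 10, as stated, and also irq 4, since
0x410 is 0x400 | 0x010.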
Re: CURRENT is freezing again ...
Warner Losh writes: > In message <[EMAIL PROTECTED]> Sheldon Hearn writes: > : The problem with a hard lock-up out of which you can't escape into the > : debugger is that it makes meaningful bug reports impossible. My non-SMP > : workstation has exhibited apparently arbitrary lock-ups since the advent > : of SMPng. > > You can also short IOCHK to ground to get an NMI which kicks you into > the debugger, even in an interrupt context. I have a card I built > from an old multi-function card to do this. I think it is A1 and A2, > but I don't have my ISA bus spec handy. Or you can use an alpha; most of which have a halt button that will drop you into the SRM console. ;) Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: CURRENT is freezing again ...
On Fri, Nov 17, 2000 at 12:55:28PM +0100, Soren Schmidt wrote: > > I thought I was the only one, since my question on the freebsd-current > > mailing list went unanswered. > > You are _not_ alone, there has been numerous complains about this > on the list, but so far they have not been taken seriously :| One of my non-SMP machines reliably wedges whenever I do heavy disk I/O. I can't break to debugger. Nov 4 15:46:41 mollari /boot/kernel/kernel: atapci0: port 0xffa0-0xffaf at device 7.1 on pci0 Nov 4 15:46:41 mollari /boot/kernel/kernel: ata0: at 0x1f0 irq 14 on atapci0 Nov 4 15:46:41 mollari /boot/kernel/kernel: ahc0: port 0xfc00-0xfcff mem 0xffbeb000-0xffbebfff irq 15 at device 11.0 on pci0 Nov 4 15:46:41 mollari /boot/kernel/kernel: aic7880: Wide Channel A, SCSI Id=7, 16/255 SCBs Kris PGP signature
Re: CURRENT is freezing again ...
On 17-Nov-00 Sheldon Hearn wrote: > > > On Fri, 17 Nov 2000 10:30:02 PST, John Baldwin wrote: > >> # sysctl -w debug.ktr_verbose=1 ; command_that_makes_my_machine_go_boom > > All very well and good once you've figured out which command makes your > machine go boom. Yes, I know. I didn't say the method was perfect. I find it frustrating as well. :-/ > But as I said, the locks I'm getting appear completely arbitrary. I'm > no hard-core hacker, but I'm not completely clueless when it comes to > isolating problems by way of deductive reasoning, and I'm stumped as to > what's causing these. I have no idea either. :( I can't magically fix them, and if I don't commit any of this stuff which works on my boxes as far as I can tell, then others won't test it and we won't make progress. If a buildworld usually triggers, then try writing a small C program that beats on a file in /tmp or /var, then start up 10 copies of it in a script that you run as your 'command_that_makes_my_machine_go_boom' to see if it works. :-P > Ciao, > Sheldon. -- John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
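[Editor's sketch] One possible shape for the "small C program that
beats on a file" suggested above.  This is only a guess at what such a
load generator might look like: the function name, the 4096-byte block
size and the write/fsync/read-back pattern are all assumptions, not
something taken from the thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Repeatedly create 'path', write 'blocks' 4096-byte blocks, force
 * them to disk, read them back, and close the file.  The file is
 * removed at the end.  Returns 0 on success, -1 on any I/O error.
 */
static int beat_on_file(const char *path, int rounds, int blocks)
{
	char buf[4096];

	assert(rounds > 0 && blocks > 0);
	memset(buf, 0xa5, sizeof(buf));

	for (int r = 0; r < rounds; r++) {
		int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
		int ok = 1;

		if (fd < 0)
			return -1;
		for (int b = 0; ok && b < blocks; b++)
			ok = write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf);
		ok = ok && fsync(fd) == 0 && lseek(fd, 0, SEEK_SET) == 0;
		for (int b = 0; ok && b < blocks; b++)
			ok = read(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf);
		close(fd);
		if (!ok) {
			unlink(path);
			return -1;
		}
	}
	return unlink(path);
}
```

A main() that calls this with a per-process file name under /tmp or
/var and a large round count could then be started ten times from a
script, as the message proposes.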
Re: CURRENT is freezing again ...
On Fri, 17 Nov 2000 10:30:02 PST, John Baldwin wrote: > # sysctl -w debug.ktr_verbose=1 ; command_that_makes_my_machine_go_boom All very well and good once you've figured out which command makes your machine go boom. But as I said, the locks I'm getting appear completely arbitrary. I'm no hard-core hacker, but I'm not completely clueless when it comes to isolating problems by way of deductive reasoning, and I'm stumped as to what's causing these. Ciao, Sheldon. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: CURRENT is freezing again ...
On Fri, Nov 17, 2000 at 11:26:02AM -0700, Warner Losh wrote: > In message <[EMAIL PROTECTED]> Sheldon Hearn writes: > : The problem with a hard lock-up out of which you can't escape into the > : debugger is that it makes meaningful bug reports impossible. My non-SMP > : workstation has exhibited apparently arbitrary lock-ups since the advent > : of SMPng. > > You can also short IOCHK to ground to get an NMI which kicks you into > the debugger, even in an interrupt context. I have a card I built > from an old multi-function card to do this. I think it is A1 and A2, > but I don't have my ISA bus spec handy. Just stick a metal pin (ballpoint works well) into the ISA connector between the pins closest to the back of the machine. That is IOCHKN and GND respectively. Wilko [hardware designer gone bad..] -- Wilko Bulte Arnhem, the Netherlands [EMAIL PROTECTED] http://www.freebsd.org http://www.nlfug.nl To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: CURRENT is freezing again ...
In message <25636.974487067@critter> Poul-Henning Kamp writes: : In message <[EMAIL PROTECTED]>, Warner Losh writes: : >In message <[EMAIL PROTECTED]> Sheldon Hearn writes: : >: The problem with a hard lock-up out of which you can't escape into the : >: debugger is that it makes meaningful bug reports impossible. My non-SMP : >: workstation has exhibited apparently arbitrary lock-ups since the advent : >: of SMPng. : > : >You can also short IOCHK to ground to get an NMI which kicks you into : >the debugger, even in an interrupt context. : : Bad news for you warner: On a too large sample of my newer : motherboards this doesn't work anymore :-( There's also a pci signal that you can either pull up or pull down that's supposed to give you the same results. I've never really needed to know it. Warner To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: CURRENT is freezing again ...
In message <[EMAIL PROTECTED]>, Warner Losh writes: >In message <[EMAIL PROTECTED]> Sheldon Hearn writes: >: The problem with a hard lock-up out of which you can't escape into the >: debugger is that it makes meaningful bug reports impossible. My non-SMP >: workstation has exhibited apparently arbitrary lock-ups since the advent >: of SMPng. > >You can also short IOCHK to ground to get an NMI which kicks you into >the debugger, even in an interrupt context. Bad news for you warner: On a too large sample of my newer motherboards this doesn't work anymore :-( -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: CURRENT is freezing again ...
On 17-Nov-00 Sheldon Hearn wrote:
>
>
> On Thu, 16 Nov 2000 10:42:51 PST, Alfred Perlstein wrote:
>
>> I would try a new kernel, and perhaps some collaboration with John
>> to debug these problems rather than just complaining about the
>> situation.  I see at least two experienced developers in the CC
>> list, there's no reason for these poor bug reports.
>
> The problem with a hard lock-up out of which you can't escape into the
> debugger is that it makes meaningful bug reports impossible.  My non-SMP
> workstation has exhibited apparently arbitrary lock-ups since the advent
> of SMPng.

When I get a hard lock like this I usually try to see if I can reproduce
it in single user mode.  If I can, then I compile KTR into my kernel with
the following options: KTR, KTR_EXTEND, KTR_COMPILE="0x3fff",
KTR_MASK="(KTR_INTR|KTR_PROC)".  Then I boot into single user (so I don't
dirty filesystems), mount any needed fs's as read only if possible, and
run the following command:

# sysctl -w debug.ktr_verbose=1 ; command_that_makes_my_machine_go_boom

And then stare at the tracing output on the screen to see what the
machine was doing when it hung.  I.e., to see if it is still getting
interrupts, and to see what process it died in, etc.

> From my understanding, John's WITNESS code allows us to break into the
> debugger from within interrupt context.  If the lock-ups are happening
> in there, then this may help us provide better bug reports.

Err, not quite.  It's BSD/OS's WITNESS code, and what the WITNESS code
does is perform extra checks on mutex enter's and exit's to ensure that
we aren't handling mutexes in such a way that a deadlock is possible.
Thus, it verifies that you don't grab mutexes out of order, or that you
don't grab sleep mutexes with interrupts disabled, etc.

--

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
Re: CURRENT is freezing again ...
In message <[EMAIL PROTECTED]> Sheldon Hearn writes: : The problem with a hard lock-up out of which you can't escape into the : debugger is that it makes meaningful bug reports impossible. My non-SMP : workstation has exhibited apparently arbitrary lock-ups since the advent : of SMPng. You can also short IOCHK to ground to get an NMI which kicks you into the debugger, even in an interrupt context. I have a card I built from an old multi-function card to do this. I think it is A1 and A2, but I don't have my ISA bus spec handy. Warner To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: CURRENT is freezing again ...
On Fri, 17 Nov 2000 12:55:28 +0100 (CET), Soren Schmidt <[EMAIL PROTECTED]> said: > It doesn't help here at least, the machine(s) just lock up solid > only reset or a powercycle can bring them back... Same here ... as others noted, started with SMPng ... -- Michael D. Harnois, Redeemer Lutheran Church, Washburn, IA [EMAIL PROTECTED] [EMAIL PROTECTED] There are things that are so serious that you can only joke about them. -- Werner Heisenberg To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: CURRENT is freezing again ...
It seems Sheldon Hearn wrote:
> > I would try a new kernel, and perhaps some collaboration with John
> > to debug these problems rather than just complaining about the
> > situation.  I see at least two experienced developers in the CC
> > list, there's no reason for these poor bug reports.
>
> The problem with a hard lock-up out of which you can't escape into the
> debugger is that it makes meaningful bug reports impossible.  My non-SMP
> workstation has exhibited apparently arbitrary lock-ups since the advent
> of SMPng.
>
> I thought I was the only one, since my question on the freebsd-current
> mailing list went unanswered.

You are _not_ alone, there have been numerous complaints about this
on the list, but so far they have not been taken seriously :|

> From my understanding, John's WITNESS code allows us to break into the
> debugger from within interrupt context.  If the lock-ups are happening
> in there, then this may help us provide better bug reports.

It doesn't help here at least, the machine(s) just lock up solid;
only a reset or a powercycle can bring them back...

> Oh, and a couple of deep breaths are probably in order. :-)

Yeah like *sigh*

-Søren
Re: CURRENT is freezing again ...
On Thu, 16 Nov 2000 10:42:51 PST, Alfred Perlstein wrote:

> I would try a new kernel, and perhaps some collaboration with John
> to debug these problems rather than just complaining about the
> situation.  I see at least two experienced developers in the CC
> list, there's no reason for these poor bug reports.

The problem with a hard lock-up out of which you can't escape into the
debugger is that it makes meaningful bug reports impossible.  My non-SMP
workstation has exhibited apparently arbitrary lock-ups since the advent
of SMPng.

I thought I was the only one, since my question on the freebsd-current
mailing list went unanswered.

From my understanding, John's WITNESS code allows us to break into the
debugger from within interrupt context.  If the lock-ups are happening
in there, then this may help us provide better bug reports.

Oh, and a couple of deep breaths are probably in order. :-)

Ciao,
Sheldon.
Re: CURRENT is freezing again ...
It seems Michael C . Wu wrote: > I had those problems too a while ago on a UP p3-650 laptop. Finally I just > newfs'ed the machine and installed the 20001028 snapshot, then cvsupp'ed > to 20001122. The laptop now works well. What I saw was processes > forking and forking again until the machine runs out of memory and > swap. I think it may be some old libraries left over from > upgrades and make world. Hmm, this is a new installed box, so there is no old leftovers... -Søren To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: CURRENT is freezing again ...
On Thu, Nov 16, 2000 at 10:27:39PM +0100, Soren Schmidt scribbled: | It seems John Baldwin wrote: | > | > 1) What revision of sys/kern/kern_synch.c do you have? I fixed several things | > yesterday, and the latest version is 1.108. | | 1.108 | | > 2) If you do have the latest version, have you compiled a kernel with WITNESS, | > INVARIANTS, and INVARIANT_SUPPORT to see how it runs? | | Have those in too... | | It still cant compile a kernel, it hangs itself in ~30 secs, no messages, | no hints, no nothing, the machine just locks up solid as usual.. | | Mind you the same machines run 4.2 and PRE_SMPNG without a hitch... | | > Also, I have noticed that occasionally on my SMP boxes the console seems to | > lose itself. By lose itself, I mean that all output stops, and it doesn't | > process any input. If I hit Ctrl-Alt-Backspace to break into the debugger, it | > suddenly catches up and processes all pending events before dropping into teh | > debugger, but hangs again when I continue from ddb. However, the rest of hte | > machine works fine during this time. I can ssh in, build kernels, reboot, etc. | > without any problem. | | It has been like this almost since the SMPNG stuff vent in, at least on all my | -current machines... I had those problems too a while ago on a UP p3-650 laptop. Finally I just newfs'ed the machine and installed the 20001028 snapshot, then cvsupp'ed to 20001122. The laptop now works well. What I saw was processes forking and forking again until the machine runs out of memory and swap. I think it may be some old libraries left over from upgrades and make world. -- +--+ | [EMAIL PROTECTED] | [EMAIL PROTECTED] | | http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. | +--+ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: CURRENT is freezing again ...
It seems John Baldwin wrote:
>
> 1) What revision of sys/kern/kern_synch.c do you have?  I fixed several things
>    yesterday, and the latest version is 1.108.

1.108

> 2) If you do have the latest version, have you compiled a kernel with WITNESS,
>    INVARIANTS, and INVARIANT_SUPPORT to see how it runs?

Have those in too...

It still can't compile a kernel, it hangs itself in ~30 secs, no messages,
no hints, no nothing, the machine just locks up solid as usual..

Mind you the same machines run 4.2 and PRE_SMPNG without a hitch...

> Also, I have noticed that occasionally on my SMP boxes the console seems to
> lose itself.  By lose itself, I mean that all output stops, and it doesn't
> process any input.  If I hit Ctrl-Alt-Backspace to break into the debugger, it
> suddenly catches up and processes all pending events before dropping into the
> debugger, but hangs again when I continue from ddb.  However, the rest of the
> machine works fine during this time.  I can ssh in, build kernels, reboot, etc.
> without any problem.

It has been like this almost since the SMPNG stuff went in, at least on all my
-current machines...

-Søren
RE: CURRENT is freezing again ...
On 16-Nov-00 Valentin Chopov wrote:
> Hi,
>
> After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> 10 min after boot...
>
> Thanks,
>
> Val

Two questions:

1) What revision of sys/kern/kern_synch.c do you have? I fixed several things
   yesterday, and the latest version is 1.108.

2) If you do have the latest version, have you compiled a kernel with WITNESS,
   INVARIANTS, and INVARIANT_SUPPORT to see how it runs?

Also, I have noticed that occasionally on my SMP boxes the console seems to
lose itself. By lose itself, I mean that all output stops, and it doesn't
process any input. If I hit Ctrl-Alt-Backspace to break into the debugger, it
suddenly catches up and processes all pending events before dropping into the
debugger, but hangs again when I continue from ddb. However, the rest of the
machine works fine during this time. I can ssh in, build kernels, reboot, etc.
without any problem.

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: CURRENT is freezing again ...
* Steven E. Ames <[EMAIL PROTECTED]> [001116 09:27] wrote:
> It seems to only do it with SMP... the same machine built with a non-SMP
> kernel (same source code) runs just fine for extended periods.

John just checked in some code last night that may address your
problems. I would try a new kernel, and perhaps some collaboration with
John to debug these problems rather than just complaining about the
situation.

I see at least two experienced developers in the CC list, there's no
reason for these poor bug reports.

-Alfred
Re: CURRENT is freezing again ...
> It seems Boris Popov wrote:
> > On Thu, 16 Nov 2000, Soren Schmidt wrote:
> >
> > > > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> > > > 10 min after boot...
> > >
> > > You mean "is still freezing" right ?
> > >
> > > Current has been like this for longer than I care to think about, it
> > > seems those in charge don't take these problems seriously (enough)...
> >
> > I think info about where/how it is freezing would be more helpful.
>
> No idea, the system just freezes, no drop to DDB no remote gdb no
> nothing, so it's really hard to tell where...
> As to how, just boot current on a fairly fast machine, make a kernel
> and it'll hang in minutes if not less, or just leave it alone and
> it will hang in 10-30 mins...

I have the same problem on a dual PII 400 MHz. I haven't tried removing
the SMP support, but I don't have much time to cvsup and make anything
else. I'll try to boot GENERIC (damn !%&!& , I always repeat to myself
that it is a good habit to compile GENERIC too after updates... but I
never do... :-( )

--
Regards...
Gianmarco
"Unix expert since yesterday"
http://www.giovannelli.it
Re: CURRENT is freezing again ...
It seems to only do it with SMP... the same machine built with a non-SMP
kernel (same source code) runs just fine for extended periods.

-Steve

----- Original Message -----
From: "Soren Schmidt" <[EMAIL PROTECTED]>
To: "Boris Popov" <[EMAIL PROTECTED]>
Cc: "Valentin Chopov" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, November 16, 2000 12:17 PM
Subject: Re: CURRENT is freezing again ...

> It seems Boris Popov wrote:
> > On Thu, 16 Nov 2000, Soren Schmidt wrote:
> >
> > > > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> > > > 10 min after boot...
> > >
> > > You mean "is still freezing" right ?
> > >
> > > Current has been like this for longer than I care to think about, it
> > > seems those in charge don't take these problems seriously (enough)...
> >
> > I think info about where/how it is freezing would be more helpful.
>
> No idea, the system just freezes, no drop to DDB no remote gdb no
> nothing, so it's really hard to tell where...
> As to how, just boot current on a fairly fast machine, make a kernel
> and it'll hang in minutes if not less, or just leave it alone and
> it will hang in 10-30 mins...
>
> -Søren
Re: CURRENT is freezing again ...
It seems Boris Popov wrote:
> On Thu, 16 Nov 2000, Soren Schmidt wrote:
>
> > > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> > > 10 min after boot...
> >
> > You mean "is still freezing" right ?
> >
> > Current has been like this for longer than I care to think about, it
> > seems those in charge don't take these problems seriously (enough)...
>
> I think info about where/how it is freezing would be more helpful.

No idea, the system just freezes, no drop to DDB no remote gdb no
nothing, so it's really hard to tell where...
As to how, just boot current on a fairly fast machine, make a kernel
and it'll hang in minutes if not less, or just leave it alone and
it will hang in 10-30 mins...

-Søren
Re: CURRENT is freezing again ...
On Thu, 16 Nov 2000, Soren Schmidt wrote:

> > After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> > 10 min after boot...
>
> You mean "is still freezing" right ?
>
> Current has been like this for longer than I care to think about, it
> seems those in charge don't take these problems seriously (enough)...

I think info about where/how it is freezing would be more helpful.

> I've started doing development on -stable instead, it goes nowhere
> on -current

- works fine for me even with my new evil hacks :)

--
Boris Popov
http://www.butya.kz/~bp/
Re: CURRENT is freezing again ...
It seems Valentin Chopov wrote:
> Hi,
>
> After last cvsup my machine (Dual PIII, SMP kernel) is freezing again in
> 10 min after boot...

You mean "is still freezing" right ?

Current has been like this for longer than I care to think about, it
seems those in charge don't take these problems seriously (enough)...

I've started doing development on -stable instead, it goes nowhere
on -current

-Søren