Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
[EMAIL PROTECTED] wrote:
> The enclosed patch implements a virtual NMI pushbutton by programming the IOAPIC to deliver an NMI when sio1 generates an interrupt.

This would be a nice kernel option... :-)

--
Daniel C. Sobral (8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

"There is no spoon." -- Kiki

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
[EMAIL PROTECTED] wrote:
> > Again I'll offer to run any and all code or patches to -current you guys can come up with, but I simply don't have the time to sit down and analyze in detail what you have been doing...
>
> The enclosed patch implements a virtual NMI pushbutton by programming the IOAPIC to deliver an NMI when sio1 generates an interrupt.
>
> DDB should be defined in the kernel config file. getty should not run on ttyd1 when this patch is applied. A serial console on sio0 is recommended.
>
> If you still cannot break into the kernel debugger when the machine locks up, then a rogue device is probably blocking the system (or the debugger is trying to obtain a mutex held by somebody else).
>
> - Tor Egge

Cool. What are the instructions for using this?
Should something have sio1 open?
Can a paperclip be used to generate the interrupt by connecting pins 2 and 3?
etc.

--
Julian Elischer, [EMAIL PROTECTED]
World tour 2000, from Perth, presently in: Budapest
Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
> cool. What are the instructions for using this?
> should something have sio1 open?

I use conserver:

    conserver host
         |  |   null-modem serial cables
    test machine

    label     port
    test      AA - sio0   serial console
    testnmi   BB - sio1   NMI

I start two conserver sessions, one using "test" (for console access) and one using "testnmi" (for NMI). When I need an NMI, I just press return or space in the session using port BB.

This only works when the test machine runs an SMP kernel with DDB and the virtual NMI pushbutton patch. No programs on the test machine should open sio1, since that could cause interrupts (which are now NMIs).

> can a paperclip be used to generate the interrupt by connecting pins 2 and 3?

I haven't tried that.

- Tor Egge
Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
On Wed, Jan 17, 2001 at 07:42:26PM +0100, Soren Schmidt wrote:
> > Basically if you're expecting me or the SMP team to figure out what's going on without more info, you're pretty much out of luck.
>
> See above, not really possible. We have been trying to find some (affordable) HW that could be used to preserve a log over a boot, but so far I haven't been able to find anything that works and is fast enough to not affect the system too much...

I did some research on this and am convinced that at least some video cards would work as memory buffers for KTR logs. Specifically, someone mentioned to me yesterday that their Matrox Millennium II flashes the X desktop during startup from a previous invocation across warm boots. (I pursued some alternatives and found the PCI RAM cards to be prohibitively expensive (more than $700), and sound cards to not have enough RAM except on old SoundBlaster AWE cards.)

For someone with device driver experience, I expect it would be a few hours of effort to make it possible to use a second video card (or even the primary one for that matter) as a DMA region in which KTR logs can be saved, so that there is a way to debug even these spontaneous reboots you're having. Maybe I'll eventually get to implementing this myself, but to be honest, I don't have a driving need for it right now, whereas you do. =)

You're experiencing a stability problem that none of us (SMPng people) can reproduce. We'd love to fix the problem, but without more information, your reports are only slightly more useful than the typical newbie "it's broken" reports, though certainly more frustrating.

Jason
Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
It seems Jason Evans wrote:
> On Wed, Jan 17, 2001 at 07:42:26PM +0100, Soren Schmidt wrote:
> > > Basically if you're expecting me or the SMP team to figure out what's going on without more info, you're pretty much out of luck.
> >
> > See above, not really possible. We have been trying to find some (affordable) HW that could be used to preserve a log over a boot, but so far I haven't been able to find anything that works and is fast enough to not affect the system too much...
>
> I did some research on this and am convinced that at least some video cards would work as memory buffers for KTR logs. Specifically, someone mentioned to me yesterday that their Matrox Millennium II flashes the X desktop during startup from a previous invocation across warm boots. (I pursued some alternatives and found the PCI RAM cards to be prohibitively expensive (more than $700), and sound cards to not have enough RAM except on old SoundBlaster AWE cards.)

Hmm, I've been toying with this, but the el cheapo video cards I have all lose random amounts of their video RAM over a reset, probably because the DRAM refresh is absent for too long...

> For someone with device driver experience, I expect it would be a few hours of effort to make it possible to use a second video card (or even the primary one for that matter) as a DMA region in which KTR logs can be saved, so that there is a way to debug even these spontaneous reboots you're having. Maybe I'll eventually get to implementing this myself, but to be honest, I don't have a driving need for it right now, whereas you do. =)

Do you need DMA?? A simple pointer to the memory should do (and would be much easier to get working)...

> You're experiencing a stability problem that none of us (SMPng people) can reproduce. We'd love to fix the problem, but without more information, your reports are only slightly more useful than the typical newbie "it's broken" reports, though certainly more frustrating.

Well, I'm not alone, that's for sure, and since this has been going on for months I've almost come to the conclusion that something fundamental must be wrong; however, until now I've just been told to go away :) I know these problems are a bitch to find, but we need to approach this at least semi-professionally and find out what's wrong, or 5.0 will be a disaster when it hits the streets. I don't have the time to play around with SMP for the time being, but I do expect the SMPng group to take these problems seriously instead of the "it works here" attitude that's been hollering down the halls lately...

Again I'll offer to run any and all code or patches to -current you guys can come up with, but I simply don't have the time to sit down and analyze in detail what you have been doing...

-Søren
Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
:I did some research on this and am convinced that at least some video cards
:would work as memory buffers for KTR logs. Specifically, someone mentioned
:to me yesterday that their Matrox Millennium II flashes the X desktop
:during startup from a previous invocation across warm boots. (I pursued
:some alternatives and found the PCI RAM cards to be prohibitively expensive
:(more than $700), and sound cards to not have enough RAM except on old
:SoundBlaster AWE cards.)

My Voodoo 3 2000 does the same thing... crash, reboot, bring up X, and the original pre-boot display flashes before X reinitializes the screen.

-Matt
Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
> Again I'll offer to run any and all code or patches to -current you guys can come up with, but I simply don't have the time to sit down and analyze in detail what you have been doing...

The enclosed patch implements a virtual NMI pushbutton by programming the IOAPIC to deliver an NMI when sio1 generates an interrupt.

DDB should be defined in the kernel config file (options DDB). getty should not run on ttyd1 when this patch is applied. A serial console on sio0 is recommended.

If you still cannot break into the kernel debugger when the machine locks up, then a rogue device is probably blocking the system (or the debugger is trying to obtain a mutex held by somebody else).

- Tor Egge

Index: sys/i386/i386/mpapic.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/mpapic.c,v
retrieving revision 1.45
diff -u -r1.45 mpapic.c
--- sys/i386/i386/mpapic.c	2001/01/10 04:43:46	1.45
+++ sys/i386/i386/mpapic.c	2001/01/18 05:44:30
@@ -269,6 +269,41 @@
 	/* return GOOD status */
 	return 0;
 }
+
+
+void
+enable_sio_NMI(int irq)
+{
+	u_char		select;	/* the select register is 8 bits */
+	u_int32_t	flags;	/* the window register is 32 bits */
+	u_int32_t	target;	/* the window register is 32 bits */
+	u_int32_t	vector;	/* the window register is 32 bits */
+	int		apic;
+	int		pin;
+
+	if (irq < 0 || irq > 15) {
+		printf("Could not enable NMI for irq %d\n", irq);
+		return;
+	}
+	apic = int_to_apicintpin[irq].ioapic;
+	pin = int_to_apicintpin[irq].int_pin;
+
+	target = CPU_TO_ID(0) << 24;
+	select = IOAPIC_REDTBL0 + (2 * pin);
+	vector = TPR_FAST_INTS + irq;
+	flags = ((u_int32_t)
+		 (IOART_INTMCLR |
+		  IOART_TRGREDG |
+		  IOART_INTAHI |
+		  IOART_DESTPHY |
+		  IOART_DELNMI));
+
+	io_apic_write(apic, select, flags | vector);
+	io_apic_write(apic, select + 1, target);
+	printf("Enabled NMI for irq %d\n", irq);
+	printf("XXX IOAPIC #%d intpin %d -> irq %d vector 0x%x (Delivery mode NMI)\n",
+	       apic, pin, irq, vector);
+}
 
 #undef DEFAULT_ISA_FLAGS
 #undef DEFAULT_FLAGS
Index: sys/i386/i386/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v
retrieving revision 1.164
diff -u -r1.164 trap.c
--- sys/i386/i386/trap.c	2001/01/10 04:43:46	1.164
+++ sys/i386/i386/trap.c	2001/01/18 05:44:30
@@ -248,7 +248,8 @@
 
 	atomic_add_int(&cnt.v_trap, 1);
 
-	if ((frame.tf_eflags & PSL_I) == 0) {
+	if ((frame.tf_eflags & PSL_I) == 0 &&
+	    frame.tf_trapno != T_NMI) {
 		/*
 		 * Buggy application or kernel code has disabled
 		 * interrupts and then trapped.  Enabling interrupts
@@ -285,8 +286,38 @@
 		enable_intr();
 	}
 
-	mtx_enter(&Giant, MTX_DEF);
+	if (frame.tf_trapno == T_NMI) {
+		/* If we can't get Giant then forward NMI to next CPU */
+		if (mtx_try_enter(&Giant, MTX_DEF) == 0) {
+			u_long icr_lo;
+			u_long icr_hi;
+			int target;
+
+			target = PCPU_GET(cpuid) + 1;
+			if (((1 << target) & PCPU_GET(other_cpus)) == 0)
+				target = 0;
+
+			/* write the destination field for the target AP */
+			icr_hi = (lapic.icr_hi & ~APIC_ID_MASK) |
+			    (cpu_num_to_apic_id[target] << 24);
+			lapic.icr_hi = icr_hi;
+
+			/* write command */
+			icr_lo = (lapic.icr_lo & APIC_RESV2_MASK) |
+			    APIC_DEST_DESTFLD | APIC_DELMODE_NMI | 0xff;
+			lapic.icr_lo = icr_lo;
+
+			/* wait for pending status end */
+			while (lapic.icr_lo & APIC_DELSTAT_MASK)
+				/* spin */ ;
+			__asm __volatile("int $0xff");
+
+			return;
+		}
+	} else
+		mtx_enter(&Giant, MTX_DEF);
+
 
 #if defined(I586_CPU) && !defined(NO_F00F_HACK)
 restart:
 #endif
@@ -388,6 +419,9 @@
 				 */
 				if (ddb_on_nmi) {
 					printf ("NMI ... going to debugger\n");
+					sioEATintr();
+					__asm __volatile("int $0xff");
+					enable_intr();
 					kdb_trap (type, 0, &frame);
 				}
 #endif /* DDB