Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
> cool.
> What are the instructions for using this?
> should something have sio1 open?

I use conserver:

  conserver  conserver       null-modem              test machine
  label      host            serial cables

  test       port AA  ------------------------------  sio0  serial console
  testnmi    port BB  ------------------------------  sio1  NMI

I start two conserver sessions, one using test (for the console
access) and one using testnmi (for NMI). When I need an NMI, I just
press return or space in the session using port BB.

This only works when the test machine runs an SMP kernel with DDB and
the virtual NMI pushbutton patch.

No programs on the test machine should open sio1, since that could
cause interrupts (which are now NMIs).

> can a paperclip be used to generate the interrupt by connecting pins 2 and 3?

I haven't tried that.

- Tor Egge

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
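Without conserver, the same effect can presumably be had from any host cabled to the test machine's sio1: all that matters is that a character arrives on that port, so the receive interrupt fires and the patched IOAPIC turns it into an NMI. The sketch below is a hypothetical helper, not part of Tor's setup; the device path (for example `/dev/cuaa1`, the FreeBSD 4.x-era callout name for the second port) and the 9600 baud setting are assumptions to adapt to your own cabling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Open the host-side serial port that is cabled to the test machine's
 * sio1 (the path is an assumption, e.g. "/dev/cuaa1") in raw mode. */
int open_nmi_port(const char *path)
{
	struct termios t;
	int fd = open(path, O_RDWR | O_NOCTTY);

	if (fd < 0)
		return -1;
	if (tcgetattr(fd, &t) == 0) {
		t.c_lflag &= ~(ICANON | ECHO | ISIG); /* no line editing */
		t.c_iflag &= ~(IXON | ICRNL);         /* no XON/XOFF, no CR map */
		t.c_oflag &= ~OPOST;                  /* send bytes unmodified */
		cfsetispeed(&t, B9600);               /* assumed line speed */
		cfsetospeed(&t, B9600);
		(void)tcsetattr(fd, TCSANOW, &t);
	}
	return fd;
}

/* Send one carriage return, the equivalent of pressing return in the
 * conserver session on port BB.  Returns 0 on success, -1 on error. */
int poke_nmi(int fd)
{
	const char cr = '\r';

	assert(fd >= 0);
	return write(fd, &cr, 1) == 1 ? 0 : -1;
}
```

Usage would be along the lines of `poke_nmi(open_nmi_port("/dev/cuaa1"))`, run on the console host only; as Tor notes, nothing on the test machine itself should open sio1.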
Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
[EMAIL PROTECTED] wrote:
> 
> > Again I'll offer to run any and all code or patches to -current you
> > guys can come up with, but I simply don't have the time to sit down
> > and analyze into details what you have been doing...
> 
> The enclosed patch implements a virtual NMI pushbutton by programming
> the IOAPIC to deliver an NMI when sio1 generates an interrupt.
> 
> DDB should be defined in the kernel config file.
> 
> getty should not run on ttyd1 when this patch is applied.
> 
> A serial console on sio0 is recommended.
> 
> If you still cannot break into the kernel debugger when the machine
> locks up then a rogue device is probably blocking the system
> (or the debugger is trying to obtain a mutex held by somebody else)
> 
> - Tor Egge

cool.
What are the instructions for using this?
should something have sio1 open?
can a paperclip be used to generate the interrupt by connecting pins 2 and 3?
etc.

-- 
      __--_|\  Julian Elischer
     /       \ [EMAIL PROTECTED]
    (   OZ    ) World tour 2000
---> X_.---._/  from Perth, presently in:  Budapest
            v
Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
[EMAIL PROTECTED] wrote:
> 
> The enclosed patch implements a virtual NMI pushbutton by programming
> the IOAPIC to deliver an NMI when sio1 generates an interrupt.

This would be a nice kernel option... :-)

-- 
Daniel C. Sobral			(8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

	"There is no spoon."
	-- Kiki
Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
> Again I'll offer to run any and all code or patches to -current you
> guys can come up with, but I simply don't have the time to sit down
> and analyze into details what you have been doing...

The enclosed patch implements a virtual NMI pushbutton by programming
the IOAPIC to deliver an NMI when sio1 generates an interrupt.

DDB should be defined in the kernel config file.

getty should not run on ttyd1 when this patch is applied.

A serial console on sio0 is recommended.

If you still cannot break into the kernel debugger when the machine
locks up then a rogue device is probably blocking the system
(or the debugger is trying to obtain a mutex held by somebody else)

- Tor Egge

Index: sys/i386/i386/mpapic.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/mpapic.c,v
retrieving revision 1.45
diff -u -r1.45 mpapic.c
--- sys/i386/i386/mpapic.c	2001/01/10 04:43:46	1.45
+++ sys/i386/i386/mpapic.c	2001/01/18 05:44:30
@@ -269,6 +269,41 @@
 	/* return GOOD status */
 	return 0;
 }
+
+
+void
+enable_sio_NMI(int irq)
+{
+	u_char		select;	/* the select register is 8 bits */
+	u_int32_t	flags;	/* the window register is 32 bits */
+	u_int32_t	target;	/* the window register is 32 bits */
+	u_int32_t	vector;	/* the window register is 32 bits */
+	int		apic;
+	int		pin;
+
+	if (irq < 0 || irq > 15) {
+		printf("Could not enable NMI for irq %d\n", irq);
+		return;
+	}
+	apic = int_to_apicintpin[irq].ioapic;
+	pin = int_to_apicintpin[irq].int_pin;
+
+	target = CPU_TO_ID(0) << 24;
+	select = IOAPIC_REDTBL0 + (2 * pin);
+	vector = TPR_FAST_INTS + irq;
+	flags = ((u_int32_t)
+		 (IOART_INTMCLR |
+		  IOART_TRGREDG |
+		  IOART_INTAHI |
+		  IOART_DESTPHY |
+		  IOART_DELNMI));
+
+	io_apic_write(apic, select, flags | vector);
+	io_apic_write(apic, select + 1, target);
+	printf("Enabled NMI for irq %d\n", irq);
+	printf("XXX IOAPIC #%d intpin %d ->irq %d vector 0x%x (Delivery mode NMI)\n",
+	       apic, pin, irq, vector);
+}
 
 #undef DEFAULT_ISA_FLAGS
 #undef DEFAULT_FLAGS
Index: sys/i386/i386/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v
retrieving revision 1.164
diff -u -r1.164 trap.c
--- sys/i386/i386/trap.c	2001/01/10 04:43:46	1.164
+++ sys/i386/i386/trap.c	2001/01/18 05:44:30
@@ -248,7 +248,8 @@
 
 	atomic_add_int(&cnt.v_trap, 1);
 
-	if ((frame.tf_eflags & PSL_I) == 0) {
+	if ((frame.tf_eflags & PSL_I) == 0 &&
+	    frame.tf_trapno != T_NMI) {
 		/*
 		 * Buggy application or kernel code has disabled
 		 * interrupts and then trapped.  Enabling interrupts
@@ -285,8 +286,38 @@
 		enable_intr();
 	}
 
-	mtx_enter(&Giant, MTX_DEF);
+	if (frame.tf_trapno == T_NMI) {
+		/* If we can't get Giant then forward NMI to next CPU */
+		if (mtx_try_enter(&Giant, MTX_DEF) == 0) {
+			u_long icr_lo;
+			u_long icr_hi;
+			int target;
+
+			target = PCPU_GET(cpuid) + 1;
+			if (((1 << target) & PCPU_GET(other_cpus)) == 0)
+				target = 0;
+
+			/* write the destination field for the target AP */
+			icr_hi = (lapic.icr_hi & ~APIC_ID_MASK) |
+			    (cpu_num_to_apic_id[target] << 24);
+			lapic.icr_hi = icr_hi;
+
+			/* write command */
+			icr_lo = (lapic.icr_lo & APIC_RESV2_MASK) |
+			    APIC_DEST_DESTFLD | APIC_DELMODE_NMI | 0xff;
+			lapic.icr_lo = icr_lo;
+
+			/* wait for pending status end */
+			while (lapic.icr_lo & APIC_DELSTAT_MASK)
+				/* spin */ ;
+			__asm __volatile("int $0xff");
+
+			return;
+		}
+	} else
+		mtx_enter(&Giant, MTX_DEF);
+
 
 #if defined(I586_CPU) && !defined(NO_F00F_HACK)
 restart:
 #endif
@@ -388,6 +419,9 @@
 				 */
 				if (ddb_on_nmi) {
 					printf ("NMI ... going to debugger\n");
+					sioEATintr();
+					__asm __volatile("int $0xff");
+					enable_intr();
 					kdb_trap (type, 0, &frame);
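The register arithmetic in the patch can be sanity-checked in userland. The sketch below models only that arithmetic: which IOAPIC registers `enable_sio_NMI()` writes for a given pin, what goes into each half of the 64-bit redirection entry, how the NMI-forwarding path in trap() picks the next CPU, and how it builds the local APIC ICR words. The constant values are assumed to mirror FreeBSD's i386 APIC headers and the Intel 82093AA datasheet rather than copied from this patch, so check them against your tree; the helper names are hypothetical. Per the datasheet, the vector field is ignored in NMI delivery mode and NMI is treated as edge triggered, which fits the patch's use of IOART_TRGREDG.

```c
#include <assert.h>
#include <stdint.h>

/* Constants assumed to mirror FreeBSD's i386 APIC register definitions. */
#define IOAPIC_REDTBL0    0x10u        /* first redirection register     */
#define IOART_INTMCLR     0x00000000u  /* bit 16 clear: not masked       */
#define IOART_TRGREDG     0x00000000u  /* bit 15 clear: edge triggered   */
#define IOART_INTAHI      0x00000000u  /* bit 13 clear: active high      */
#define IOART_DESTPHY     0x00000000u  /* bit 11 clear: physical dest    */
#define IOART_DELNMI      0x00000400u  /* bits 10-8 = 100b: NMI delivery */

#define APIC_ID_MASK      0xff000000u  /* ICR high: destination field    */
#define APIC_RESV2_MASK   0xfff00000u  /* ICR low: reserved bits kept    */
#define APIC_DEST_DESTFLD 0x00000000u  /* no shorthand: use dest field   */
#define APIC_DELMODE_NMI  0x00000400u  /* ICR delivery mode: NMI         */

/* Register index of the low word of a pin's 64-bit redirection entry;
 * the high word (destination) is the next index. */
uint8_t redtbl_select(int pin)
{
	assert(pin >= 0 && pin < 24);  /* the 82093AA has 24 input pins */
	return (uint8_t)(IOAPIC_REDTBL0 + 2 * pin);
}

/* Low word as enable_sio_NMI() writes it: flags | vector.  The IOAPIC
 * ignores the vector bits in NMI delivery mode. */
uint32_t redtbl_lo_nmi(uint32_t vector)
{
	return IOART_INTMCLR | IOART_TRGREDG | IOART_INTAHI |
	    IOART_DESTPHY | IOART_DELNMI | (vector & 0xffu);
}

/* High word: physical APIC ID in bits 31-24, as in CPU_TO_ID(0) << 24. */
uint32_t redtbl_hi(uint32_t apic_id)
{
	return (apic_id & 0xffu) << 24;
}

/* trap() forwarding: try cpuid + 1 and fall back to CPU 0 if that CPU
 * is not in other_cpus.  The target < 32 guard is an addition in this
 * sketch; the patch shifts without it. */
int nmi_forward_target(int cpuid, uint32_t other_cpus)
{
	int target = cpuid + 1;

	if (target >= 32 || ((UINT32_C(1) << target) & other_cpus) == 0)
		target = 0;
	return target;
}

/* ICR high word: keep the reserved low bits, set the target APIC ID. */
uint32_t icr_hi_for(uint32_t icr_hi, uint32_t apic_id)
{
	return (icr_hi & ~APIC_ID_MASK) | ((apic_id & 0xffu) << 24);
}

/* ICR low word: keep the reserved high bits and request NMI delivery
 * to the destination field.  Writing this word sends the IPI. */
uint32_t icr_lo_nmi(uint32_t icr_lo)
{
	return (icr_lo & APIC_RESV2_MASK) | APIC_DEST_DESTFLD |
	    APIC_DELMODE_NMI | 0xffu;
}
```

For example, the serial port on IOAPIC pin 3 would have its entry at register indices 0x16 and 0x17.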
Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
:I did some research on this and am convinced that at least some video cards
:would work as memory buffers for KTR logs.  Specifically, someone mentioned
:to me yesterday that their Matrox Millennium II flashes the X desktop
:during startup from a previous invocation across warm boots.  (I pursued
:some alternatives and found the PCI RAM cards to be prohibitively expensive
:(more than $700), and sound cards to not have enough RAM except on old
:SoundBlaster AWE cards.)

My Voodoo 3 2000 does the same thing... crash, reboot, bring up X, and
the original pre-boot display flashes before X reinitializes the screen.

-Matt
Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
It seems Jason Evans wrote:
> On Wed, Jan 17, 2001 at 07:42:26PM +0100, Soren Schmidt wrote:
> > > Basically if you're expecting me or the SMP team to figure out
> > > what's going on without more info, you're pretty much out of luck.
> > 
> > See above, not really possible, we have been trying to find some
> > (affordable) HW that could be used to preserve a log over a boot,
> > but so far I haven't been able to find anything that works, and
> > is fast enough to not affect the system too much...
> 
> I did some research on this and am convinced that at least some video cards
> would work as memory buffers for KTR logs.  Specifically, someone mentioned
> to me yesterday that their Matrox Millennium II flashes the X desktop
> during startup from a previous invocation across warm boots.  (I pursued
> some alternatives and found the PCI RAM cards to be prohibitively expensive
> (more than $700), and sound cards to not have enough RAM except on old
> SoundBlaster AWE cards.)

Hmm, I've been toying with this, but the el cheapo videocards I have
all lose random amounts of their video RAM over a reset, probably due
to the DRAM refresh being absent for too long...

> For someone with device driver experience, I expect it would be a few hours
> of effort to make it possible to use a second video card (or even the
> primary one for that matter) as a DMA region in which KTR logs can be
> saved, so that there is a way to debug even these spontaneous reboots
> you're having.  Maybe I'll eventually get to implementing this myself, but
> to be honest, I don't have a driving need for it right now, whereas you
> do. =)

Do you need DMA?? A simple ptr to the mem should do (and it is much
easier to get to work)...

> You're experiencing a stability problem that none of us (SMPng people) can
> reproduce.  We'd love to fix the problem, but without more information,
> your reports are only slightly more useful than the typical newbie "it's
> broken" reports, though certainly more frustrating.

Well, I'm not alone, that's for sure, and since this has been so for
months I've almost gotten the impression that something fundamental
must be wrong; however, until now I've just been told to go away :)
I know these problems are a bitch to find, but we need to take this
at least semi-professionally and find out what's wrong, or 5.0 will
be a disaster when it hits the streets.
I don't have the time to play around with SMP for the time being,
but I do expect the SMPng group to take these problems seriously
instead of the "it works here" attitude that's been hollering down
the halls lately...

Again I'll offer to run any and all code or patches to -current you
guys can come up with, but I simply don't have the time to sit down
and analyze into details what you have been doing...

-Søren
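Søren's two points (a plain pointer into the card's memory instead of DMA, and cheap cards losing random parts of their RAM over a reset) suggest giving each log record its own checksum, so that after the reboot surviving records can be told apart from damaged or torn ones. The sketch below is a hypothetical record layout and ring, not the real KTR format or any existing driver; the FNV-1a checksum and the field sizes are arbitrary choices, and a real version would write through a volatile, uncached mapping of the card's memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-card record layout; this is not the real KTR format. */
#define PLOG_MAGIC  0x4b545231u   /* arbitrary tag, "KTR1" in ASCII */
#define PLOG_MSGLEN 48

struct plog_entry {
	uint32_t magic;
	uint32_t seq;                 /* increases by one per record */
	char     msg[PLOG_MSGLEN];
	uint32_t sum;                 /* FNV-1a over the fields above */
};

/* FNV-1a over the bytes that precede 'sum' (4+4+48 bytes, no padding). */
uint32_t plog_sum(const struct plog_entry *e)
{
	const unsigned char *p = (const unsigned char *)e;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < offsetof(struct plog_entry, sum); i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Append a record.  'ring' stands in for a plain pointer into the
 * card's mapped memory (no DMA); here it is any array of nent entries. */
void plog_write(struct plog_entry *ring, size_t nent, uint32_t seq,
    const char *msg)
{
	struct plog_entry e;

	assert(nent > 0);
	memset(&e, 0, sizeof(e));
	e.magic = PLOG_MAGIC;
	e.seq = seq;
	strncpy(e.msg, msg, PLOG_MSGLEN - 1);   /* stays NUL-terminated */
	e.sum = plog_sum(&e);
	ring[seq % nent] = e;
}

/* After the reboot: count records whose magic and checksum survived
 * and report the newest surviving sequence number in *newest. */
size_t plog_recover(const struct plog_entry *ring, size_t nent,
    uint32_t *newest)
{
	size_t valid = 0;

	for (size_t i = 0; i < nent; i++) {
		if (ring[i].magic != PLOG_MAGIC ||
		    ring[i].sum != plog_sum(&ring[i]))
			continue;   /* never written, or damaged over reset */
		if (valid == 0 || ring[i].seq > *newest)
			*newest = ring[i].seq;
		valid++;
	}
	return valid;
}
```

With a ring of four records and six writes, flipping one bit in the slot that holds record 5 leaves three valid records, and the newest surviving one is record 4.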
Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)
On Wed, Jan 17, 2001 at 07:42:26PM +0100, Soren Schmidt wrote:
> > Basically if you're expecting me or the SMP team to figure out
> > what's going on without more info, you're pretty much out of luck.
> 
> See above, not really possible, we have been trying to find some
> (affordable) HW that could be used to preserve a log over a boot,
> but so far I haven't been able to find anything that works, and
> is fast enough to not affect the system too much...

I did some research on this and am convinced that at least some video cards
would work as memory buffers for KTR logs.  Specifically, someone mentioned
to me yesterday that their Matrox Millennium II flashes the X desktop
during startup from a previous invocation across warm boots.  (I pursued
some alternatives and found the PCI RAM cards to be prohibitively expensive
(more than $700), and sound cards to not have enough RAM except on old
SoundBlaster AWE cards.)

For someone with device driver experience, I expect it would be a few hours
of effort to make it possible to use a second video card (or even the
primary one for that matter) as a DMA region in which KTR logs can be
saved, so that there is a way to debug even these spontaneous reboots
you're having.  Maybe I'll eventually get to implementing this myself, but
to be honest, I don't have a driving need for it right now, whereas you
do. =)

You're experiencing a stability problem that none of us (SMPng people) can
reproduce.  We'd love to fix the problem, but without more information,
your reports are only slightly more useful than the typical newbie "it's
broken" reports, though certainly more frustrating.

Jason
Re: smp instability
Patrick Hartling <[EMAIL PROTECTED]> wrote:
} John Baldwin <[EMAIL PROTECTED]> wrote:
} } 
} } On 25-Oct-00 Chuck Robey wrote:
} } > I'm having rather extreme problems with stability on my dual PIII
} } > setup.  I know this is to be expected, but it's gotten so extreme on my
} } > system, I can't spend more than a few minutes before it locks up.
} } >
} } > Is there any chance that I could make things better by using a sysctl to
} } > tell the box it's now a single-cpu system?  I can't read man pages at the
} } > moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
} } > someone knows the exact command to use, I'd appreciate a bit of help.
} } 
} } You can use kernel.old to compile a UP kernel.  I always keep a UP kernel
} } around just in case.  Also, when did your SMP box become unstable?  There
} } was a known problem with SMP boxes when the vm page zero'ing during the idle
} } loop was first turned on that has since been fixed with the latest commit to
} } vm_machdep.c yesterday.  Symptoms were frequent kernel panic 12's with
} } interrupts disabled.
} 
} I am having the same lockup problems as Chuck with SMP kernels built since
} October 21.  The system completely locks up after a short period of time.
} If I'm running X, it does it within 10-15 minutes, but if I don't run X
} and just leave it at the console, it can go for a few hours.  It does
} eventually lock up, though.  I haven't tried building a UP kernel, but I
} will try the latest vm_machdep.c changes.  If that doesn't work, I'll go
} the UP route since I'm tired of being unable to list my processes.  :\

To follow up on this, I rebuilt everything using sources from
approximately 11:00 am CDT yesterday (10/26), and everything is great
again.  Hooray!

 -Patrick

Patrick L. Hartling                     | Research Assistant, VRAC
[EMAIL PROTECTED]                       | 2624 Howe Hall -- (515)294-4916
http://www.137.org/patrick/             | http://www.vrac.iastate.edu/
Re: smp instability
John Baldwin <[EMAIL PROTECTED]> wrote:
} 
} On 25-Oct-00 Chuck Robey wrote:
} > I'm having rather extreme problems with stability on my dual PIII
} > setup.  I know this is to be expected, but it's gotten so extreme on my
} > system, I can't spend more than a few minutes before it locks up.
} >
} > Is there any chance that I could make things better by using a sysctl to
} > tell the box it's now a single-cpu system?  I can't read man pages at the
} > moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
} > someone knows the exact command to use, I'd appreciate a bit of help.
} 
} You can use kernel.old to compile a UP kernel.  I always keep a UP kernel
} around just in case.  Also, when did your SMP box become unstable?  There
} was a known problem with SMP boxes when the vm page zero'ing during the idle
} loop was first turned on that has since been fixed with the latest commit to
} vm_machdep.c yesterday.  Symptoms were frequent kernel panic 12's with
} interrupts disabled.

I am having the same lockup problems as Chuck with SMP kernels built since
October 21.  The system completely locks up after a short period of time.
If I'm running X, it does it within 10-15 minutes, but if I don't run X
and just leave it at the console, it can go for a few hours.  It does
eventually lock up, though.  I haven't tried building a UP kernel, but I
will try the latest vm_machdep.c changes.  If that doesn't work, I'll go
the UP route since I'm tired of being unable to list my processes.  :\

My working world+kernel was built October 4.  Normally, I update my
-current system more frequently than that, but this has been an abnormally
busy month.  Because of that, I can't narrow down exactly when the
instability began.  Right now, I'm running with a world built October 23
and the October 4 kernel, which is rather unpleasant.

 -Patrick

Patrick L. Hartling                     | Research Assistant, VRAC
[EMAIL PROTECTED]                       | 2624 Howe Hall -- (515)294-4916
http://www.137.org/patrick/             | http://www.vrac.iastate.edu/
Re: smp instability
On Tue, 24 Oct 2000, Mike Meyer wrote:

> Chuck Robey writes:
> > I'm having rather extreme problems with stability on my dual PIII
> > setup.  I know this is to be expected, but it's gotten so extreme on my
> > system, I can't spend more than a few minutes before it locks up.
> > 
> > Is there any chance that I could make things better by using a sysctl to
> > tell the box it's now a single-cpu system?  I can't read man pages at the
> > moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
> > someone knows the exact command to use, I'd appreciate a bit of help.
> 
> Try "sysctl -w machdep.smp_active=0". It's not clear how much good
> this will do since you'll still be running an SMP kernel. Please
> let us know how that works.

With less than a full hour's history, I haven't exactly heavily tested
it, but it only lasted 10 minutes last time, and my system is still
kicking currently.

Regarding that control-C needed on booting thing: when I log in, my call
to fortune needs to be interrupted also, so I immediately went and tried
a "ktrace fortune". I didn't need to kdump, because doing that ktrace
seems to have somehow cleared the control-C problem for everything that
triggered it before (not just fortune alone). My system is really
repeatable on that, so if it's not yet fixed, and you have other things
to try on it, I'd be willing to test them (if my system stays up!)

In the meantime, I think that "sysctl -w machdep.smp_active=0" might
actually work for me (I did it in single user so the multiuser startup
would be cleaner).
RE: smp instability
On Tue, 24 Oct 2000, John Baldwin wrote:

> 
> On 25-Oct-00 Chuck Robey wrote:
> > I'm having rather extreme problems with stability on my dual PIII
> > setup.  I know this is to be expected, but it's gotten so extreme on my
> > system, I can't spend more than a few minutes before it locks up.
> > 
> > Is there any chance that I could make things better by using a sysctl to
> > tell the box it's now a single-cpu system?  I can't read man pages at the
> > moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
> > someone knows the exact command to use, I'd appreciate a bit of help.
> 
> You can use kernel.old to compile a UP kernel.  I always keep a UP kernel
> around just in case.  Also, when did your SMP box become unstable?  There
> was a known problem with SMP boxes when the vm page zero'ing during the idle
> loop was first turned on that has since been fixed with the latest commit to
> vm_machdep.c yesterday.  Symptoms were frequent kernel panic 12's with
> interrupts disabled.

No kernel panics, just lockups. I saw the startup problems (having to
hit a lot of control-C's to get booted) and I had two kinds of lockup
problems, one a complete machine freeze (still pings, but that's all)
and also a strange one where an entire mounted filesystem would
disappear.

I can back up to my kernel.gd I keep around, but I have to get me an
older mountd, netstat, ps (and others) before that older kernel is good,
and it was from before the /boot/kernel thing (I hated that idea, and
still do).

I'm going to try the sysctl route first, see if that works. I won't be
able to report reliable results until the morning (if it lasts all
night, it's a huge fix). As it stands now, no way can I do any
compiling.
RE: smp instability
On 25-Oct-00 Chuck Robey wrote:
> I'm having rather extreme problems with stability on my dual PIII
> setup.  I know this is to be expected, but it's gotten so extreme on my
> system, I can't spend more than a few minutes before it locks up.
> 
> Is there any chance that I could make things better by using a sysctl to
> tell the box it's now a single-cpu system?  I can't read man pages at the
> moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
> someone knows the exact command to use, I'd appreciate a bit of help.

You can use kernel.old to compile a UP kernel.  I always keep a UP kernel
around just in case.  Also, when did your SMP box become unstable?  There
was a known problem with SMP boxes when the vm page zero'ing during the idle
loop was first turned on that has since been fixed with the latest commit to
vm_machdep.c yesterday.  Symptoms were frequent kernel panic 12's with
interrupts disabled.

-- 

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
Re: smp instability
Chuck Robey writes:
> I'm having rather extreme problems with stability on my dual PIII
> setup.  I know this is to be expected, but it's gotten so extreme on my
> system, I can't spend more than a few minutes before it locks up.
> 
> Is there any chance that I could make things better by using a sysctl to
> tell the box it's now a single-cpu system?  I can't read man pages at the
> moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
> someone knows the exact command to use, I'd appreciate a bit of help.

Try "sysctl -w machdep.smp_active=0". It's not clear how much good
this will do since you'll still be running an SMP kernel. Please
let us know how that works.
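The same setting can also be changed from a program, for instance a small startup helper. The sketch below is a hypothetical wrapper around FreeBSD's sysctlbyname(3) that writes machdep.smp_active and returns the previous value; it needs root, and it carries the same caveat Mike gives, since the kernel is still an SMP kernel. On systems other than FreeBSD it simply fails with ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#if defined(__FreeBSD__)
#include <sys/sysctl.h>
#endif

/* Programmatic equivalent of "sysctl -w machdep.smp_active=<value>":
 * stores the previous value in *oldval.  Returns 0 on success, -1 on
 * error with errno set. */
int set_smp_active(int value, int *oldval)
{
	assert(oldval != NULL);
#if defined(__FreeBSD__)
	size_t oldlen = sizeof(*oldval);

	return sysctlbyname("machdep.smp_active", oldval, &oldlen,
	    &value, sizeof(value));
#else
	(void)value;
	errno = ENOSYS;  /* machdep.smp_active only exists on FreeBSD */
	return -1;
#endif
}
```

Calling `set_smp_active(0, &old)` and later `set_smp_active(old, &old)` would turn the extra CPUs off and back on again, assuming the running kernel accepts the write.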
smp instability
I'm having rather extreme problems with stability on my dual PIII
setup.  I know this is to be expected, but it's gotten so extreme on my
system, I can't spend more than a few minutes before it locks up.

Is there any chance that I could make things better by using a sysctl to
tell the box it's now a single-cpu system?  I can't read man pages at the
moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
someone knows the exact command to use, I'd appreciate a bit of help.

Otherwise, I'm going to have to go to a lot of trouble to move back to a
pre-SMPNG system, and I sure don't want to do that.

Thanks

Chuck (who doesn't even have his .sig now!)