SMP system hangs on current, not stable
I have a system using a fairly new Supermicro motherboard with two 1 GHz
Pentium IIIs and 512 MB of RAM. Running -stable, it works fine for at least
a day or so with LOTS of activity. Running -current, it hangs with no output
of any kind and apparently with all interrupts disabled, so DDB does me no
good. The hang requires a fair amount of activity (it usually occurs during
make -j3 world with 2 copies of setiathome -nice 19 running). Time to hang
varies from a half-hour to a couple of days; hardly ever longer.

Maybe I need an NMI button (or does that work?)

This does not appear to be the procfs thing that Matt has commented on: it
still occurs after his patch, and it occurs without the use of top or any
other procfs reader that I know of.

Dmesg on both current and stable follows, in case it is useful:

-- Current (with verbose): ---
Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #0: Mon Dec 31 10:47:25 PST 2001
    [EMAIL PROTECTED]:/usr/src/sys/i386/compile/GOONEY
Preloaded elf kernel /boot/kernel/kernel at 0xc040b000.
Preloaded elf module /boot/kernel/acpi.ko at 0xc040b0a8.
Calibrating clock(s) ...
TSC clock: 999455711 Hz, i8254 clock: 1193107 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter i8254 frequency 1193182 Hz
CLK_USE_TSC_CALIBRATION not specified - using old calibration method
CPU: Pentium III/Pentium III Xeon/Celeron (999.52-MHz 686-class CPU)
  Origin = GenuineIntel Id = 0x68a Stepping = 10
  Features=0x383fbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE
real memory = 268369920 (262080K bytes)
Physical memory chunk(s):
0x1000 - 0x0009efff, 647168 bytes (158 pages)
0x00435000 - 0x0ffe7fff, 263925760 bytes (64435 pages)
avail memory = 256704512 (250688K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 - irq 0
SMP: CPU0 apic_initialize():
     lint0: 0x0700 lint1: 0x00010400 TPR: 0x0010 SVR: 0x01ff
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee0
 cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id: 2, version: 0x00178011, at 0xfec0
bios32: Found BIOS32 Service Directory header at 0xc00faf10
bios32: Entry = 0xfb380 (c00fb380) Rev = 0 Len = 1
pcibios: PCI BIOS entry at 0xf+0xb3b0
pnpbios: Found PnP BIOS data at 0xc00fbe00
pnpbios: Entry = f:be30 Rev = 1.0
Other BIOS signatures found:
null: null device, zero device
random: entropy source
mem: memory I/O
Pentium Pro MTRR support enabled
SMP: CPU0 bsp_apic_configure():
     lint0: 0x00010700 lint1: 0x0400 TPR: 0x0010 SVR: 0x01ff
pci_open(1):mode 1 addr port (0x0cf8) is 0x8060
pci_open(1a): mode1res=0x8000 (0x8000)
pci_cfgcheck: device 0 [class=06] [hdr=00] is there (id=30911106)
Using $PIR table, 8 entries at 0xc00fdc20
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: VIA694 AWRDACPI on motherboard
acpi0: power button is handled as a fixed feature programming model.
Timecounter ACPI frequency 3579545 Hz
acpi_timer0: 24-bit timer at 3.579545MHz port 0x4008-0x400b on acpi0
acpi_cpu0: CPU on acpi0
acpi_cpu1: CPU on acpi0
acpi_tz0: thermal zone on acpi0
acpi_button0: Power Button on acpi0
acpi_pcib0: Host-PCI bridge port 0x5000-0x500f,0x4080-0x40ff,0x4000-0x407f,0xcf8-0xcff on acpi0
pci0: physical bus=0
        map[10]: type 3, range 32, base f000, size 26, enabled
found- vendor=0x1106, dev=0x3091, revid=0x01
        bus=0, slot=0, func=0
        class=06-00-00, hdrtype=0x00, mfdev=0
        powerspec 2 supports D0 D3 current D0
found- vendor=0x1106, dev=0xb091, revid=0x00
        bus=0, slot=1, func=0
        class=06-04-00, hdrtype=0x01, mfdev=0
IOAPIC #0 intpin 11 - irq 2
Freeing (NOT implemented) redirected PCI irq 11.
        map[10]: type 4, range 32, base c000, size 3, enabled
        map[14]: type 4, range 32, base c400, size 2, enabled
        map[18]: type 4, range 32, base c800, size 3, enabled
        map[1c]: type 4, range 32, base cc00, size 2, enabled
        map[20]: type 4, range 32, base d000, size 6, enabled
        map[24]: type 1, range 32, base f910, size 17, enabled
found- vendor=0x105a, dev=0x4d30, revid=0x02
        bus=0, slot=12, func=0
        class=01-04-00, hdrtype=0x00, mfdev=0
        intpin=a, irq=2
        powerspec 1 supports D0 D3 current D0
        map[10]: type 1, range 32, base f912, size 12, enabled
        map[14]: type 4, range 32, base d400, size 6, enabled
        map[18]: type 1, range 32, base f900, size 20, enabled
found- vendor=0x8086, dev=0x1229, revid=0x08
        bus=0, slot=13, func=0
        class=02-00-00, hdrtype=0x00, mfdev=0
        intpin=a, irq=5
        powerspec 2 supports D0 D1 D2 D3 current D0
found-
Re: SMP system hangs on current, not stable
Pete Carah [EMAIL PROTECTED] wrote:
> Maybe I need an NMI button (or does that work?)

You can generate NMIs by shorting the first two pins of an ISA slot with a
screwdriver (the two pins close to the back, where the ISA slot covers are).
This can also be done with PCI slots if the board doesn't have an ISA slot
anymore, but I don't know which pins (it's _not_ the first two pins), and
it's a lot more difficult because the PCI pins are much smaller.

Disclaimer: Don't sue me if you toast your board. :-) Do it at your own
risk. Read the docs first. Check the pin assignment. Make your last will
and testament first, etc.

Regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co KG, Oettingenstr. 2, 80538 München
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

All that we see or seem is just a dream within a dream (E. A. Poe)

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message
Re: SMP system hangs on current, not stable
:I have a system using a fairly new Supermicro MB, with 2 P3-1GHZ, and 512mb
:ram. Running stable works fine at least a day or so with LOTS of activity.
:Running current it hangs (with no output of any kind, and apparently all
:interrupts disabled) so DDB does me no good... This requires a fair amount
:of activity (usually will hang in make -j3 world with 2 copies of
:setiathome -nice 19) Time to hang varies from a half-hour to a couple
:of days; hardly ever longer.
:
:Maybe I need an NMI button (or does that work?)

This could be a priority inversion issue. Try running setiathome at
nice -10 (or not running it at all), and see if you can still crash
the box.

:acpi0: VIA694 AWRDACPI on motherboard
:acpi0: power button is handled as a fixed feature programming model.
:Timecounter ACPI frequency 3579545 Hz
:acpi_timer0: 24-bit timer at 3.579545MHz port 0x4008-0x400b on acpi0
:acpi_cpu0: CPU on acpi0
:acpi_cpu1: CPU on acpi0
:acpi_tz0: thermal zone on acpi0
:acpi_button0: Power Button on acpi0
:acpi_pcib0: Host-PCI bridge port
:0x5000-0x500f,0x4080-0x40ff,0x4000-0x407f,0xcf8-0xcff on acpi0

Try turning off ACPI.

                                        -Matt
RE: System hangs with -current ...
On Thu, 1 Mar 2001, John Baldwin wrote:

> On 01-Mar-01 The Hermit Hacker wrote:
> > any comments on this? any way of doing this without a serial console?
> >
> > thanks ...
>
> The data is too much to make a normal console feasible, although you
> could try cranking up the console to the highest res (80x60 or 132x60,
> etc.) you can and let it freeze and then write down those 60 lines and
> maybe that will be enough to figure it out. However, if it's looping this
> won't work. :( I've no idea atm why the serial console isn't working for
> you.

Inability to actually find a NULL modem cable, actually :( Checked two
local shops, and neither of them carry one ... just hijacked one from work
for the weekend, so will hit this tonight and report anything I can come up
with ...
RE: System hangs with -current ...
any comments on this? any way of doing this without a serial console?

thanks ...

On Wed, 28 Feb 2001, The Hermit Hacker wrote:

> Yup, definitely doesn't like me using the console ... just tried it
> again, and it's as if it can't scroll up the screen to send more data or
> something?
>
> I just rebooted, and then ssh'd in from remote ... type'd the two sysctl
> commands, and got:
>
> cpu1 ../../i386/i386/trap.c.181 GOT (spin) sched lock [0xc0320f20] r=0 at ../../i386/i386/trap.c:181
> cpcsocp/../i386/i386/trap.c.217 REL (spin) sched l
>
> on my screen ... type'd exactly as seen ... and that's it ... console is
> now locked again ...

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org
RE: System hangs with -current ...
On 01-Mar-01 The Hermit Hacker wrote:
> any comments on this? any way of doing this without a serial console?
>
> thanks ...

The data is too much to make a normal console feasible, although you could
try cranking up the console to the highest res (80x60 or 132x60, etc.) you
can and let it freeze and then write down those 60 lines and maybe that
will be enough to figure it out. However, if it's looping this won't work.
:( I've no idea atm why the serial console isn't working for you.
RE: System hangs with -current ...
Okay, can't seem to find a 9pin-9pin NULL modem cable in this 'pit of the
earth' town, so figured I'd do the sysctl commands on my console and use an
ssh connection into the machine to run the 'hanging sequence' ...

the console flashed a bunch of 'debugging info' and then hung solid ... I
could still login remotely and whatnot, type commands, just nothing was
happening on the console, couldn't change vty's, nothing ...

is it supposed to do that? *raised eyebrow*

On Thu, 22 Feb 2001, John Baldwin wrote:

> On 23-Feb-01 The Hermit Hacker wrote:
> > On Thu, 22 Feb 2001, John Baldwin wrote:
> > > On 22-Feb-01 The Hermit Hacker wrote:
> > > > Okay, I have to pick up a NULL modem cable tomorrow and dive into
> > > > this ... finally ...
> > > >
> > > > The various KTR_ that you mention below, these are kernel settings
> > > > that I compile into the kernel?
> > >
> > > Yes. You want this:
> > >
> > > options KTR
> > > options KTR_EXTEND
> > > options KTR_COMPILE=0x1208
> >
> > okay, just so that I understand ... I compile my kernel with these
> > options, and then run the two sysctl commands you list below? the
> > KTR_COMPILE arg looks similar to the ktr_mask one below, which is why
> > I'm confirming ...
>
> Yes. KTR_COMPILE controls what KTR tracepoints are actually compiled into
> the kernel. The ktr_mask sysctl controls a runtime mask that lets you
> choose which of the compiled in masks you want to enable. I have manpages
> for this stuff, but they are waiting for doc guys to review them.

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org
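A rough illustration of the compile-time versus run-time masks described in the quoted reply above: as I read it, a trace class is logged only if its bit is set both in KTR_COMPILE and in debug.ktr_mask. The bit values below are assumptions inferred from the script comment in this thread (KTR_INTR, KTR_PROC, and KTR_LOCK adding up to 0x1208) and should be checked against sys/ktr.h in your tree. The helper names or_masks and effective_mask are hypothetical, not part of FreeBSD.

```shell
#!/bin/sh
# Assumed bit values for the three trace classes named in the thread;
# verify them against sys/ktr.h before relying on them.
KTR_LOCK=0x0008
KTR_INTR=0x0200
KTR_PROC=0x1000

# or_masks MASK...: OR all arguments together and print the result in hex.
or_masks() {
    result=0
    for m in "$@"; do
        result=$((result | $m))
    done
    printf '0x%x\n' "$result"
}

# effective_mask COMPILED RUNTIME: print the classes that are both
# compiled into the kernel and enabled at run time.
effective_mask() {
    printf '0x%x\n' "$(( $1 & $2 ))"
}
```

Under these assumptions, or_masks "$KTR_LOCK" "$KTR_INTR" "$KTR_PROC" prints 0x1208, and effective_mask 0x1208 0x0010 prints 0x0, because a bit that was never compiled in cannot be switched on through the sysctl.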
RE: System hangs with -current ...
Yup, definitely doesn't like me using the console ... just tried it again,
and it's as if it can't scroll up the screen to send more data or
something?

I just rebooted, and then ssh'd in from remote ... type'd the two sysctl
commands, and got:

cpu1 ../../i386/i386/trap.c.181 GOT (spin) sched lock [0xc0320f20] r=0 at ../../i386/i386/trap.c:181
cpcsocp/../i386/i386/trap.c.217 REL (spin) sched l

on my screen ... type'd exactly as seen ... and that's it ... console is
now locked again ...

On Tue, 27 Feb 2001, The Hermit Hacker wrote:

> Okay, can't seem to find a 9pin-9pin NULL modem cable in this 'pit of the
> earth' town, so figured I'd do the sysctl commands on my console and use
> an ssh connection into the machine to run the 'hanging sequence' ...
>
> the console flashed a bunch of 'debugging info' and then hung solid ... I
> could still login remotely and whatnot, type commands, just nothing was
> happening on the console, couldn't change vty's, nothing ...
>
> is it supposed to do that? *raised eyebrow*

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org
RE: System hangs with -current ...
Okay, I have to pick up a NULL modem cable tomorrow and dive into this ...
finally ...

The various KTR_ that you mention below, these are kernel settings that I
compile into the kernel?

On Tue, 2 Jan 2001, John Baldwin wrote:

> On 02-Jan-01 The Hermit Hacker wrote:
> > Over the past several months, as others have reported, I've been
> > getting system hangs using 5.0-CURRENT w/ SMP ... I've got DDB enabled,
> > but ctl-alt-esc doesn't break me to the debugger ...
> >
> > I'm not complaining about the hangs, if I was overly concerned, I'd run
> > -STABLE, but I'm wondering how one goes about providing debug
> > information on them other than through DDB?
>
> Not easily. :( If you can make the problem easily repeatable, then you
> can try turning on KTR in your kernel (see NOTES, you will need
> KTR_EXTEND), setting up a serial console that you log the output of,
> create a shell script that runs the following commands:
>
> #!/bin/sh
>
> # Turn on KTR_INTR, KTR_PROC, and KTR_LOCK
> sysctl -w debug.ktr_mask=0x1208
> sysctl -w debug.ktr_verbose=2
>
> run_magic_command_that_hangs_my_machine
>
> and run the script. You probably want to run it over a tty or remote
> login so that the serial console output is just the logging (warning, it
> will be very verbose!). Also, you probably want to use
> http://www.FreeBSD.org/~jhb/patches/mtx_quiet.patch to shut up most of
> the irrelevant and cluttery mutex trace messages. Note that having this
> much logging on will probably slow the machine to a crawl as well, so you
> may have to just start this up and go off and do something else until it
> hangs. :-/ Another alternative is to rig up a NMI debouncer and use it to
> break into the debugger. Then you can start poking around to see who owns
> sched_lock, etc.
>
> > Thanks ...
>
> --
> John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
> PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
> "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org
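The script quoted in the message above could be wrapped in a small function so that the mask, the verbosity, and the workload are passed as arguments. This is only a sketch: the two sysctl names and values come from the thread, while the function name run_with_ktr and the SYSCTL override are hypothetical additions that allow a dry run (SYSCTL=echo) on a machine whose kernel has no KTR support.

```shell
#!/bin/sh
# Sketch only. SYSCTL is a hypothetical override: leave it unset to call the
# real sysctl, or set SYSCTL=echo to print the calls instead of making them.
SYSCTL="${SYSCTL:-sysctl}"

# run_with_ktr MASK VERBOSE COMMAND [ARGS...]
run_with_ktr() {
    mask="$1"
    verbose="$2"
    shift 2
    # Select the trace classes, then turn on verbose logging to the console.
    "$SYSCTL" -w "debug.ktr_mask=${mask}" || return 1
    "$SYSCTL" -w "debug.ktr_verbose=${verbose}" || return 1
    # Run the workload that is expected to trigger the hang.
    "$@"
}

# Example, run as root from a remote login so the console carries only
# the trace output:
#   run_with_ktr 0x1208 2 run_magic_command_that_hangs_my_machine
```

Passing the workload as arguments keeps the tracing setup separate from whatever command reproduces the hang, so the same wrapper can be reused while narrowing the problem down.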
RE: System hangs with -current ...
On 22-Feb-01 The Hermit Hacker wrote:
> Okay, I have to pick up a NULL modem cable tomorrow and dive into this
> ... finally ...
>
> The various KTR_ that you mention below, these are kernel settings that I
> compile into the kernel?

Yes. You want this:

options KTR
options KTR_EXTEND
options KTR_COMPILE=0x1208

The mtx_quiet.patch is old and won't apply to current now I'm afraid.

-- 
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
RE: System hangs with -current ...
On Thu, 22 Feb 2001, John Baldwin wrote:

> On 22-Feb-01 The Hermit Hacker wrote:
>> Okay, I have to pick up a null modem cable tomorrow and dive into this ... finally ...
>>
>> The various KTR_ that you mention below, these are kernel settings that I compile into the kernel?
>
> Yes. You want this:
>
>     options KTR
>     options KTR_EXTEND
>     options KTR_COMPILE=0x1208

Okay, just so that I understand ... I compile my kernel with these options, and then run the two sysctl commands you list below? The KTR_COMPILE arg looks similar to the ktr_mask one below, which is why I'm confirming ...

> The mtx_quiet.patch is old and won't apply to current now I'm afraid.
>
>> On Tue, 2 Jan 2001, John Baldwin wrote:
>>
>>> On 02-Jan-01 The Hermit Hacker wrote:
>>>> Over the past several months, as others have reported, I've been getting system hangs using 5.0-CURRENT w/ SMP ... I've got DDB enabled, but ctl-alt-esc doesn't break me to the debugger ...
>>>>
>>>> I'm not complaining about the hangs (if I was overly concerned, I'd run -STABLE), but I'm wondering how one goes about providing debug information on them other than through DDB?
>>>
>>> Not easily. :( If you can make the problem easily repeatable, then you can try turning on KTR in your kernel (see NOTES, you will need KTR_EXTEND), setting up a serial console whose output you log, and creating a shell script that runs the following commands:
>>>
>>>     #!/bin/sh
>>>     # Turn on KTR_INTR, KTR_PROC, and KTR_LOCK
>>>     sysctl -w debug.ktr_mask=0x1208
>>>     sysctl -w debug.ktr_verbose=2
>>>     run_magic_command_that_hangs_my_machine
>>>
>>> Then run the script. You probably want to run it over a tty or remote login so that the serial console output is just the logging (warning, it will be very verbose!). Also, you probably want to use http://www.FreeBSD.org/~jhb/patches/mtx_quiet.patch to shut up most of the irrelevant and cluttery mutex trace messages. Note that having this much logging on will probably slow the machine to a crawl as well, so you may have to just start this up and go off and do something else until it hangs. :-/
>>>
>>> Another alternative is to rig up an NMI debouncer and use it to break into the debugger. Then you can start poking around to see who owns sched_lock, etc.
>>>
>>>> Thanks ...

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org
RE: System hangs with -current ...
On 23-Feb-01 The Hermit Hacker wrote:
> On Thu, 22 Feb 2001, John Baldwin wrote:
>
>> On 22-Feb-01 The Hermit Hacker wrote:
>>> Okay, I have to pick up a null modem cable tomorrow and dive into this ... finally ...
>>>
>>> The various KTR_ that you mention below, these are kernel settings that I compile into the kernel?
>>
>> Yes. You want this:
>>
>>     options KTR
>>     options KTR_EXTEND
>>     options KTR_COMPILE=0x1208
>
> Okay, just so that I understand ... I compile my kernel with these options, and then run the two sysctl commands you list below? The KTR_COMPILE arg looks similar to the ktr_mask one below, which is why I'm confirming ...

Yes. KTR_COMPILE controls which KTR tracepoints are actually compiled into the kernel. The ktr_mask sysctl controls a runtime mask that lets you choose which of the compiled-in tracepoints you want to enable. I have manpages for this stuff, but they are waiting for the doc guys to review them.

>> The mtx_quiet.patch is old and won't apply to current now I'm afraid.
>>
>>> On Tue, 2 Jan 2001, John Baldwin wrote:
>>>
>>>> On 02-Jan-01 The Hermit Hacker wrote:
>>>>> Over the past several months, as others have reported, I've been getting system hangs using 5.0-CURRENT w/ SMP ... I've got DDB enabled, but ctl-alt-esc doesn't break me to the debugger ...
>>>>>
>>>>> I'm not complaining about the hangs (if I was overly concerned, I'd run -STABLE), but I'm wondering how one goes about providing debug information on them other than through DDB?
>>>>
>>>> Not easily. :( If you can make the problem easily repeatable, then you can try turning on KTR in your kernel (see NOTES, you will need KTR_EXTEND), setting up a serial console whose output you log, and creating a shell script that runs the following commands:
>>>>
>>>>     #!/bin/sh
>>>>     # Turn on KTR_INTR, KTR_PROC, and KTR_LOCK
>>>>     sysctl -w debug.ktr_mask=0x1208
>>>>     sysctl -w debug.ktr_verbose=2
>>>>     run_magic_command_that_hangs_my_machine
>>>>
>>>> Then run the script. You probably want to run it over a tty or remote login so that the serial console output is just the logging (warning, it will be very verbose!). Also, you probably want to use http://www.FreeBSD.org/~jhb/patches/mtx_quiet.patch to shut up most of the irrelevant and cluttery mutex trace messages. Note that having this much logging on will probably slow the machine to a crawl as well, so you may have to just start this up and go off and do something else until it hangs. :-/
>>>>
>>>> Another alternative is to rig up an NMI debouncer and use it to break into the debugger. Then you can start poking around to see who owns sched_lock, etc.
>>>>
>>>>> Thanks ...

--
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
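The two-level masking described in the reply above can be illustrated with plain shell arithmetic. This is only a model, assuming (as the reply describes) that a trace class produces output only when its bit is set in both the compile-time mask and the runtime mask. The function names are hypothetical, and nothing here talks to the kernel.

```sh
#!/bin/sh
# Illustration only: models KTR_COMPILE vs. debug.ktr_mask as two
# bitmasks. effective_mask and list_bits are hypothetical names.

# effective_mask COMPILED RUNTIME
# Prints, in hex, the bits that are set in both masks.
effective_mask() {
    printf '0x%x\n' $(( $1 & $2 ))
}

# list_bits MASK
# Prints each set bit of MASK on its own line, lowest bit first.
list_bits() {
    mask=$(( $1 ))
    bit=1
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -ne 0 ]; then
            printf '0x%04x\n' "$bit"
        fi
        mask=$(( mask >> 1 ))
        bit=$(( bit << 1 ))
    done
}
```

For example, `effective_mask 0x1208 0x0218` prints 0x208: the bit 0x10 requested at runtime is dropped because it was not compiled in, and 0x1000 is dropped because it was not enabled at runtime. `list_bits 0x1208` shows that the mask used in this thread has three bits set (0x0008, 0x0200, 0x1000), matching the three classes named in the script comment; which bit belongs to which class is not asserted here.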
System hangs with -current ...
Over the past several months, as others have reported, I've been getting system hangs using 5.0-CURRENT w/ SMP ... I've got DDB enabled, but ctl-alt-esc doesn't break me to the debugger ...

I'm not complaining about the hangs (if I was overly concerned, I'd run -STABLE), but I'm wondering how one goes about providing debug information on them other than through DDB?

Thanks ...

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org
RE: System hangs with -current ...
On 02-Jan-01 The Hermit Hacker wrote:
> Over the past several months, as others have reported, I've been getting system hangs using 5.0-CURRENT w/ SMP ... I've got DDB enabled, but ctl-alt-esc doesn't break me to the debugger ...
>
> I'm not complaining about the hangs (if I was overly concerned, I'd run -STABLE), but I'm wondering how one goes about providing debug information on them other than through DDB?

Not easily. :( If you can make the problem easily repeatable, then you can try turning on KTR in your kernel (see NOTES, you will need KTR_EXTEND), setting up a serial console whose output you log, and creating a shell script that runs the following commands:

    #!/bin/sh
    # Turn on KTR_INTR, KTR_PROC, and KTR_LOCK
    sysctl -w debug.ktr_mask=0x1208
    sysctl -w debug.ktr_verbose=2
    run_magic_command_that_hangs_my_machine

Then run the script. You probably want to run it over a tty or remote login so that the serial console output is just the logging (warning, it will be very verbose!). Also, you probably want to use http://www.FreeBSD.org/~jhb/patches/mtx_quiet.patch to shut up most of the irrelevant and cluttery mutex trace messages. Note that having this much logging on will probably slow the machine to a crawl as well, so you may have to just start this up and go off and do something else until it hangs. :-/

Another alternative is to rig up an NMI debouncer and use it to break into the debugger. Then you can start poking around to see who owns sched_lock, etc.

> Thanks ...

--
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
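The script outlined in the reply above could be fleshed out along these lines. This is a minimal sketch, not a tested procedure: the two sysctl settings are the ones from the message, while the DRY_RUN variable, the run helper, and the ktr_trace function are hypothetical additions that let you preview the commands before running them for real (as root, on a kernel built with KTR).

```sh
#!/bin/sh
# Sketch of a wrapper around the two sysctl settings quoted above.
# DRY_RUN, run and ktr_trace are hypothetical names, not part of FreeBSD.

# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# run COMMAND [ARGS...]
# Prints the command in dry-run mode, otherwise executes it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# ktr_trace MASK COMMAND [ARGS...]
# Sets the runtime KTR mask, turns on verbose output, then starts the
# workload that is expected to trigger the hang. Stops at the first
# step that fails.
ktr_trace() {
    mask=$1
    shift
    run sysctl -w debug.ktr_mask="$mask" &&
    run sysctl -w debug.ktr_verbose=2 &&
    run "$@"
}
```

In dry-run mode, `ktr_trace 0x1208 ./workload_that_hangs` (the workload name is a placeholder) prints the three commands in order; setting DRY_RUN=0 executes them instead. As the message suggests, start it from a remote login so the serial console carries only the trace output.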