Re: OS bug in taskq
On Sat, 15 Dec 2007, Elliot Finley wrote:

> in the kernel and I'm still unable to obtain a crash dump. Hopefully there is enough info in this email for a hacker to point me in the right direction to debug this.

If you're unable to obtain a crash dump, you should still be able to use interactive console-based debugging with DDB. I find this is easiest to do with a serial console from an adjacent machine, so that I can copy-and-paste the results into an e-mail rather than hand-transcribe. You can also use firewire consoles to the same effect, although I've never done that.

Once the system panics, it will drop into DDB. I usually kick off debugging by doing a backtrace (bt), and then showing the status of the current processor and of all processors (show pcpu, show allpcpu). Depending on the type of bug, I find output from ps, alltrace, show lockedvnods, show alllocks, show uma, and show malloc quite useful.

The below panic is a NULL pointer dereference in the taskqueue code, but it's likely triggered by a bug in a consumer of the task queue service, rather than the task queue code itself. That means we'll need to identify which consumer that is. That information should become visible by looking at the arguments in the stack trace in DDB. If not, we may need to work a little harder to get a dump, or set up serial or firewire kgdb to inspect the live running system with a full debugger.

On the swap / dump / etc thing: in order to capture a saved kernel dump, you need sufficient room for the full dump on whatever partition /var/crash is on, and it must be writable. Because dumps are normally written to swap partitions, running fsck before the dump is captured can lead to portions of the dump being overwritten if fsck uses a lot of memory (and hence overflows into swap).

As many systems have a separate /var and /var is often small, it could well be that you can successfully capture the dump by just booting to single-user, manually fscking /var, mounting /var, and running savecore in the /var/crash directory. You can also configure additional partitions as purely dump partitions, rather than swap partitions. One trick I've used previously is to add a disk temporarily just for the purposes of dumping to, and manually doing a dumpon for a partition on that disk (but not a swapon).

Robert N M Watson
Computer Laboratory
University of Cambridge

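The manual recovery described in the message above (boot to single-user, fsck and mount only /var, then run savecore) and the spare-disk dumpon trick might be sketched as the shell helpers below. This is an illustration under assumptions, not a tested procedure: the device names /dev/da0s1b and /dev/da1s1b are hypothetical placeholders, the `run` helper and the function names are invented here, and by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: device names are hypothetical placeholders, not from the thread.
SWAPDEV="${SWAPDEV:-/dev/da0s1b}"    # assumed swap partition holding the dump
SPAREDEV="${SPAREDEV:-/dev/da1s1b}"  # assumed partition on a temporary spare disk
CRASHDIR="${CRASHDIR:-/var/crash}"

# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# From single-user mode: check and mount only /var, then save the dump.
recover_dump() {
    run fsck -p /var
    run mount /var
    run savecore "$CRASHDIR" "$SWAPDEV"
}

# Point kernel dumps at the spare partition without enabling swap on it.
use_spare_dump_device() {
    run dumpon "$SPAREDEV"
}
```

Calling recover_dump with the defaults prints the three commands in order, which makes it easy to review them before typing them at a single-user prompt. Keeping fsck restricted to /var is the point of the exercise: a small filesystem is less likely to push fsck into swap and overwrite the dump.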
Re: OS bug in taskq
On Sat, Dec 15, 2007 at 03:58:10PM -0800, Jeremy Chadwick wrote:
> On Sat, Dec 15, 2007 at 01:03:14PM -0700, Elliot Finley wrote:
> > I have: dumpdev=AUTO in /etc/rc.conf and: ... in the kernel and I'm still unable to obtain a crash dump. Hopefully there is enough info in this email for a hacker to point me in the right direction to debug this.
>
> I can't help with the panic itself, but the reason for the inability to obtain a crash dump is mentioned in a thread I started in November:
> http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038069.html
>
> The explanation of the problem was documented best by Doug Barton in this thread (over at freebsd-rc@):
> http://lists.freebsd.org/pipermail/freebsd-rc/2007-November/001263.html
>
> Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=118255

Why does it work *sometimes* then? Or was this particular problem introduced more recently than the 6.2 branch?

I have two apparently similarly configured systems running 6.2p8, with identical hardware, identical apps, and identical load, and I have at least attempted to configure them the same way. Both have /var/crash set up and dumpon enabled in rc.conf. Both crashed in the last week. I got a dump on one, which I now need to analyze, but have twice failed to get a dump on the other. (Once this past week, once the previous month.)

-- Clifton

--
Clifton Royston -- [EMAIL PROTECTED] / [EMAIL PROTECTED]
President - I and I Computing * http://www.iandicomputing.com/
Custom programming, network design, systems and network consulting services
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]

OS bug in taskq
Hello,

After turning tls/ssl on in Exim and installing dovecot (with pop3s and imaps) I've been getting a panic:

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 06
fault virtual address = 0x104
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc06730cd
stack pointer = 0x28:0xea1ddc90
frame pointer = 0x28:0xea1ddc9c
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = resume, IOPL = 0
current process = 5 (thread taskq)

It's always the same. Same cpuid, same pointers, etc...

I have:

dumpdev=AUTO

in /etc/rc.conf and:

options KDB
options DDB # debugging kernel

in the kernel and I'm still unable to obtain a crash dump. Hopefully there is enough info in this email for a hacker to point me in the right direction to debug this.

dmesg:

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE-p5 #1: Mon Nov 19 11:16:44 MST 2007
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/DDB-SMP
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2793.20-MHz 686-class CPU)
Origin = GenuineIntel Id = 0xf4a Stepping = 10
Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE
Features2=0x641dSSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,b14
AMD Features=0x2010NX,LM
AMD Features2=0x1LAHF
Logical CPUs per core: 2
real memory = 3220963328 (3071 MB)
avail memory = 3150856192 (3004 MB)
ACPI APIC Table: DELL PE BKC
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
cpu2 (AP): APIC ID: 6
cpu3 (AP): APIC ID: 7
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic1: WARNING: intbase 32 != expected base 24
ioapic2: Changing APIC ID to 10
ioapic2: WARNING: intbase 64 != expected base 56
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 32-55 on motherboard
ioapic2 Version 2.0 irqs 64-87 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: DELL PE BKC on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
cpu0: ACPI CPU on acpi0
cpu1: ACPI CPU on acpi0
cpu2: ACPI CPU on acpi0
cpu3: ACPI CPU on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pcib1: ACPI PCI-PCI bridge at device 2.0 on pci0
pci1: ACPI PCI bus on pcib1
pcib2: ACPI PCI-PCI bridge at device 0.0 on pci1
pci2: ACPI PCI bus on pcib2
amr0: LSILogic MegaRAID 1.53 mem 0xd80f-0xd80f,0xdfdc-0xdfdf irq 46 at device 14.0 on pci2
amr0: delete logical drives supported by controller
amr0: LSILogic PERC 4e/Di Firmware 522A, BIOS H430, 256MB RAM
pcib3: ACPI PCI-PCI bridge at device 0.2 on pci1
pci3: ACPI PCI bus on pcib3
pcib4: ACPI PCI-PCI bridge at device 4.0 on pci0
pci4: ACPI PCI bus on pcib4
pcib5: ACPI PCI-PCI bridge at device 5.0 on pci0
pci5: ACPI PCI bus on pcib5
pcib6: ACPI PCI-PCI bridge at device 0.0 on pci5
pci6: ACPI PCI bus on pcib6
em0: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port 0xecc0-0xecff mem 0xdfae-0xdfaf irq 64 at device 7.0 on pci6
em0: Ethernet address: 00:18:8b:34:70:50
pcib7: ACPI PCI-PCI bridge at device 0.2 on pci5
pci7: ACPI PCI bus on pcib7
em1: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port 0xdcc0-0xdcff mem 0xdf8e-0xdf8f irq 65 at device 8.0 on pci7
em1: Ethernet address: 00:18:8b:34:70:51
pcib8: ACPI PCI-PCI bridge at device 6.0 on pci0
pci8: ACPI PCI bus on pcib8
uhci0: Intel 82801EB (ICH5) USB controller USB-A port 0xbce0-0xbcff irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: Intel 82801EB (ICH5) USB controller USB-A on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: Intel 82801EB (ICH5) USB controller USB-B port 0xbcc0-0xbcdf irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: Intel 82801EB (ICH5) USB controller USB-B on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: Intel 82801EB (ICH5) USB controller USB-C port 0xbca0-0xbcbf irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: Intel 82801EB (ICH5) USB controller USB-C on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ehci0: Intel 82801EB/R (ICH5) USB 2.0 controller mem
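
One way to read the trap message earlier in this post: the fault virtual address, 0x104, lies inside the first page of memory, which is normally left unmapped, so the read most plausibly went through a NULL pointer to a field at a small offset. The sketch below encodes that rule of thumb. The function name and the 4096-byte page size (the usual i386 value) are assumptions made for illustration; nothing here comes from the kernel sources or from the thread.

```shell
#!/bin/sh
PAGE_SIZE=4096  # assumed i386 page size

# Classify a fault address given as decimal or 0x-prefixed hex.
classify_fault_addr() {
    addr=$(( $1 ))   # shell arithmetic accepts 0x-prefixed hex constants
    if [ "$addr" -lt "$PAGE_SIZE" ]; then
        echo "likely NULL pointer dereference at offset $addr"
    else
        echo "address $1 is outside the first page"
    fi
}

classify_fault_addr 0x104
```

The offset by itself does not say which structure was being read; that still has to come from a backtrace or a dump.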
Re: OS bug in taskq
On Sat, Dec 15, 2007 at 01:03:14PM -0700, Elliot Finley wrote:
> I have: dumpdev=AUTO in /etc/rc.conf and: ... in the kernel and I'm still unable to obtain a crash dump. Hopefully there is enough info in this email for a hacker to point me in the right direction to debug this.

I can't help with the panic itself, but the reason for the inability to obtain a crash dump is mentioned in a thread I started in November:

http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038069.html

The explanation of the problem was documented best by Doug Barton in this thread (over at freebsd-rc@):

http://lists.freebsd.org/pipermail/freebsd-rc/2007-November/001263.html

Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=118255

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

Re: OS bug in taskq
On Sat, 15 Dec 2007 15:58:10 -0800, you wrote:
> On Sat, Dec 15, 2007 at 01:03:14PM -0700, Elliot Finley wrote:
> > I have: dumpdev=AUTO in /etc/rc.conf and: ... in the kernel and I'm still unable to obtain a crash dump. Hopefully there is enough info in this email for a hacker to point me in the right direction to debug this.
>
> I can't help with the panic itself, but the reason for the inability to obtain a crash dump is mentioned in a thread I started in November:
> http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038069.html
>
> The explanation of the problem was documented best by Doug Barton in this thread (over at freebsd-rc@):
> http://lists.freebsd.org/pipermail/freebsd-rc/2007-November/001263.html

In this thread it states:

> Short term fix is to disable swapping on the system long enough to get the dump, then reboot with swapping turned back on.

How do I turn swapping off? I don't think I can just not mount it, because then it wouldn't exist for the dump.

> Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=118255
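
One possible reading of "disable swapping", in line with Robert Watson's suggestion elsewhere in this thread to run dumpon without swapon, is to comment out the swap entries in /etc/fstab and name the dump device explicitly in /etc/rc.conf instead of relying on dumpdev=AUTO. The filter below is a minimal sketch of the fstab half only. The function name and the sample device paths are hypothetical, and it merely transforms text on stdin, so nothing on the system is changed.

```shell
#!/bin/sh
# Comment out every active fstab entry whose filesystem type (3rd field) is "swap".
# Lines that are already commented out, and all other entries, pass through unchanged.
comment_out_swap() {
    awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }'
}

# Example with a made-up two-line fstab:
printf '%s\n' \
    '/dev/da0s1b none swap sw 0 0' \
    '/dev/da0s1a / ufs rw 1 1' | comment_out_swap
```

With the swap entry commented out, an rc.conf line such as dumpdev="/dev/da0s1b" (device name again hypothetical) would keep the partition registered for dumps without it being used as swap. Whether this sidesteps the problem described in the PR has not been verified here.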