Re: Interesting backtrace...
On 18 Mar 2001, Dag-Erling Smorgrav wrote:

> Anyway, here's the backtrace:
>
> root@des /var/crash# gdb -k
> ...
> This GDB was configured as "i386-unknown-freebsd".
> (kgdb) source ~des/kgdb

<-- What's in here? I guess it is commands to load the crash dump into the debugger. Could you post it, please? So I can make pretty backtraces too :-)

> (kgdb) kernel 1

Because that command doesn't work for me.

Leif

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: Interesting backtrace...
On Wed, Mar 21, 2001 at 11:16:01PM +, David Malone wrote:

> The graph seems to peak at about 160kB/s, which seems plausible.
>
> The code is at:
>
>     http://www.maths.tcd.ie/~dwmalone/comp/-time.S
>     http://www.maths.tcd.ie/~dwmalone/comp/-time.c

> http://www.maths.tcd.ie/~dwmalone/comp/-time.S
> Error 404
> The file could not be found. Either the file does not exist or is read protected.

Crap - sorry:

    http://www.maths.tcd.ie/~dwmalone/comp/bzero-time.S
    http://www.maths.tcd.ie/~dwmalone/comp/bzero-time.c

David.
Re: Interesting backtrace...
On Wed, 21 Mar 2001, David Malone wrote:

> On Mon, Mar 19, 2001 at 02:47:34PM +1100, Bruce Evans wrote:
> > npx.c already has one "fix" for the overflow problem. The problem may be that clocks don't work early any more.
> >
> > It must be that microtime() doesn't work early any more.

I checked that microtime() doesn't work for more than 10 msec if it uses the i8254. When it doesn't work for that long, the bandwidth test breaks down for bzero() bandwidths smaller than 100 MB/sec. Such bandwidths are normal for Intel i586's. E.g., my P5/133 has a generic_bzero() bandwidth of 87e6 bytes/sec and an i586_bzero() bandwidth of 174e6 bytes/sec. This is in userland with a slightly improved i586_bzero() (39 cycles instead of 41 for the inner loop IIRC) and with slightly improved page coloring, and a buffer size of 1MB (same as in the bandwidth test). So, the test always breaks down for my P5/133 if microtime() uses the i8254. OTOH, my K6-1/233 has bandwidths of 135e6 and 127e6 bytes/sec, respectively, so the test never breaks down for it.

> I did a quick check, and it does seem that i586_bzero can be faster on the k6-2. I found it was about twice as fast for large buffers. This was timed in userland using the TSC, with a slightly simplified version of i586_bzero (I removed all the kernel specific stuff and had it always save the floating point state on the stack). A graph is at:

This is surprising.

>     http://www.maths.tcd.ie/~dwmalone/comp/bzero-band.ps
>
> The graph seems to peak at about 160kB/s, which seems plausible.

160kB/sec is implausible :-). 160MB/sec is plausible. Half that is hard to understand. Why is it slower than my K6-1? Ah, I partly understand. My K6-1 has an L2 cache size of 1MB, so the 1MB buffer size is really too small for it if write allocation is enabled. P5's don't have write allocation, so the buffer size for them is not critical. All K6's have write allocation IIRC. With a buffer size of 2MB, the bandwidths for my K6-1/233 are 84e6 and 80e6 bytes/sec, respectively.
So 80MB/sec is plausible and 160MB/sec is fast (it's equivalent to 320MB/sec without write allocation).

These complications show how hard it is to write a single bandwidth test that works for all i586's. I think the next step (after fixing the i586 functions) should be to reduce the buffer size significantly and not worry about cache effects. Cache effects benefit generic_bzero() in the bandwidth test but they probably benefit it in normal use too.

> The code is at:
>
>     http://www.maths.tcd.ie/~dwmalone/comp/-time.S
>     http://www.maths.tcd.ie/~dwmalone/comp/-time.c
>
> (It's crude, but seemed to produce moderately OK results. You get occasional dips in the bandwidth due to using the TSC for timing. I only tried sizes which were a power of two, as well...)

I wrote not-so-crude read/write/copy/checksum userland benchmarks to test this stuff when I helped implement the i586-optimized routines. Here is the write benchmark. Compile it with 'cc -aout'.

---
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

#include <machine/cpufunc.h>

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

typedef void func_t(void *buf, size_t len);

struct func {
	func_t *fn;
	char *name;
	char *description;
};

static func_t zero0, zero1, zero2, zero3, zero4, zero5, zero6, zero7;
static func_t zero8, zero9, zeroA, zeroB, zeroC, zeroD;
static void usage(void);

static char const *progname;

static struct func funcs[] = {
	zero0, "zero0", "stosl",
	zero1, "zero1", "unroll 16",
	zero2, "zero2", "unroll 16 preallocate",
	zero3, "zero3", "unroll 32",
	zero4, "zero4", "unroll 32 preallocate",
	zero5, "zero5", "unroll 64",
	zero6, "zero6", "unroll 64 preallocate",
	zero7, "zero7", "fstl",
	zero8, "zero8", "movl",
	zero9, "zero9", "unroll 8",
	zeroA, "zeroA", "generic_bzero",
	zeroB, "zeroB", "i486_bzero",
	zeroC, "zeroC", "i586_bzero",
	zeroD, "zeroD", "i686_pagezero",
	bzero, "zeroE", "bzero (stosl)",
};
#define NFUNC (sizeof funcs / sizeof funcs[0])

int
main(int argc, char **argv)
{
	unsigned char *buf;
	int ch;
	int funcn;
	int funcnspecified;
	int i586;
	size_t len;
	size_t max;
	int precache;
	int quiet;
	size_t thrashbufsize;
	unsigned long long tot;

	progname = argv[0];
	funcnspecified = -1;
	i586 = 0;
	len = 4096;
	precache = 0;
	quiet = 0;
	tot = 1;
	while ((ch = getopt(argc, argv, "5f:l:pqt:")) != EOF) {
		switch (ch) {
		case '5':
			i586 = 1;
			break;
		case 'f':
			funcnspecified = strtoul(optarg, (char **) NULL, 0);
			if (funcnspecified < 0 || funcnspecified >= NFUNC)
				usage();
			break;
		case 'l':
			len = strtoul(optarg, (char **) NULL, 0);
			break;
		case 'p':
			precache = 1;
			break;
		case 'q':
Re: Interesting backtrace...
> On 19 Mar 2001, Dag-Erling Smorgrav wrote:
> > Bruce Evans [EMAIL PROTECTED] writes:
> > > K6-2's aren't really i586's and i586_bzero should never be used for them (generic bzero is faster),
> >
> > Wrong. I fixed machdep.c to compute and print the bandwidth correctly:
>
> Wrong yourself. The fpu is too slow to use for copying for everything except original Pentiums. The bandwidth test is just done to avoid hard-configuring this knowledge.

If this is the case, is there much point in keeping the fpu register bcopy and bzero at all?
Re: Interesting backtrace...
On Mon, 19 Mar 2001, Jake Burkholder wrote:

> [bde wrote]
> > Wrong yourself. The fpu is too slow to use for copying for everything except original Pentiums. The bandwidth test is just done to avoid hard-configuring this knowledge.
>
> If this is the case, is there much point in keeping the fpu register bcopy and bzero at all?

Original Pentiums still exist, and copying through the FPU might be faster on future i386's.

Bruce
Re: Interesting backtrace...
On Sun, Mar 18, 2001 at 04:41:03PM +0100, Dag-Erling Smorgrav wrote:

> I finally caught a backtrace from one of those recurring stack smash panics. I've been getting a few of these every day for a couple of weeks now but never caught a dump; I caught this one by typing 'panic' immediately instead of trying to get a trace at the ddb prompt first.

I have a back trace that is exactly like this. I got it by doing a "call dumpsys" at the ddb prompt. It is also at the pmap_zero_page line in vm_fault, then has the corrupted frame and then goes into another fault. Curiously, my machine is a K6-2 too.

I've followed the same gdb steps which you went through, and the panic looks identical. This is the same panic as the one which I posted the ktr trace for a couple of days ago, if that helps.

David.

CPU: AMD-K6(tm) 3D processor (400.91-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x58c  Stepping = 12
  Features=0x8021bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,PGE,MMX>
  AMD Features=0x80000800<SYSCALL,3DNow!>

(kgdb) where
#0  dumpsys () at ../../kern/kern_shutdown.c:478
#1  0xc0133501 in db_fncall (dummy1=0, dummy2=0, dummy3=0, dummy4=0xc86dfc88 "\20053\001")
    at ../../ddb/db_command.c:532
#2  0xc013332d in db_command (last_cmdp=0xc02f7554, cmd_table=0xc02f73b4, aux_cmd_tablep=0xc033c95c)
    at ../../ddb/db_command.c:333
#3  0xc01333f2 in db_command_loop () at ../../ddb/db_command.c:455
#4  0xc01355bb in db_trap (type=12, code=0) at ../../ddb/db_trap.c:71
#5  0xc028932a in kdb_trap (type=12, code=0, regs=0xc86dfdd8)
    at ../../i386/i386/db_interface.c:164
#6  0xc0297fb8 in trap_fatal (frame=0xc86dfdd8, eva=4294906495)
    at ../../i386/i386/trap.c:983
#7  0xc0297d25 in trap_pfault (frame=0xc86dfdd8, usermode=0, eva=4294906495)
    at ../../i386/i386/trap.c:901
#8  0xc0296f7f in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -932172416,
      tf_esi = -932879296, tf_ebp = -932315508, tf_isp = -932315644, tf_ebx = 0,
      tf_edx = -1065345032, tf_ecx = 0, tf_eax = -1071054797, tf_trapno = 12, tf_err = 0,
      tf_eip = -60801, tf_cs = 8, tf_eflags = 66118, tf_esp = -65536, tf_ss = -1})
    at ../../i386/i386/trap.c:448
#9  0xffff127f in ?? ()
#10 0xc0266b53 in vm_fault (map=0xc8702d80, vaddr=135098368, fault_type=2 '\002', fault_flags=8)
    at ../../vm/vm_page.h:493
#11 0xc0297b65 in trap_pfault (frame=0xc86dffa8, usermode=1, eva=135098368)
    at ../../i386/i386/trap.c:876
#12 0xc0296bab in trap (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 135098368,
      tf_esi = 13, tf_ebp = -1077937844, tf_isp = -932315180, tf_ebx = 4096, tf_edx = 0,
      tf_ecx = 1024, tf_eax = -791621424, tf_trapno = 12, tf_err = 6, tf_eip = 134885503,
      tf_cs = 31, tf_eflags = 66070, tf_esp = -1077937900, tf_ss = 47})
    at ../../i386/i386/trap.c:335
(kgdb) up 8
#8  0xc0296f7f in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -932172416,
      tf_esi = -932879296, tf_ebp = -932315508, tf_isp = -932315644, tf_ebx = 0,
      tf_edx = -1065345032, tf_ecx = 0, tf_eax = -1071054797, tf_trapno = 12, tf_err = 0,
      tf_eip = -60801, tf_cs = 8, tf_eflags = 66118, tf_esp = -65536, tf_ss = -1})
    at ../../i386/i386/trap.c:448
448             (void) trap_pfault(frame, FALSE, eva);
(kgdb) p/x frame
$1 = {tf_fs = 0x18, tf_es = 0x10, tf_ds = 0x10, tf_edi = 0xc8702d80, tf_esi = 0xc8656440,
  tf_ebp = 0xc86dfe8c, tf_isp = 0xc86dfe04, tf_ebx = 0x0, tf_edx = 0xc0801ff8, tf_ecx = 0x0,
  tf_eax = 0xc0290033, tf_trapno = 0xc, tf_err = 0x0, tf_eip = 0xffff127f, tf_cs = 0x8,
  tf_eflags = 0x10246, tf_esp = 0xffff0000, tf_ss = 0xffffffff}
(kgdb) up 4
#12 0xc0296bab in trap (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 135098368,
      tf_esi = 13, tf_ebp = -1077937844, tf_isp = -932315180, tf_ebx = 4096, tf_edx = 0,
      tf_ecx = 1024, tf_eax = -791621424, tf_trapno = 12, tf_err = 6, tf_eip = 134885503,
      tf_cs = 31, tf_eflags = 66070, tf_esp = -1077937900, tf_ss = 47})
    at ../../i386/i386/trap.c:335
335             i = trap_pfault(frame, TRUE, eva);
(kgdb) p/x frame
$2 = {tf_fs = 0x2f, tf_es = 0x2f, tf_ds = 0x2f, tf_edi = 0x80d7000, tf_esi = 0xd,
  tf_ebp = 0xbfbff94c, tf_isp = 0xc86dffd4, tf_ebx = 0x1000, tf_edx = 0x0, tf_ecx = 0x400,
  tf_eax = 0xd0d0d0d0, tf_trapno = 0xc, tf_err = 0x6, tf_eip = 0x80a307f, tf_cs = 0x1f,
  tf_eflags = 0x10216, tf_esp = 0xbfbff914, tf_ss = 0x2f}
(kgdb) p/x CADDR2
$3 = 0xc0801000
(kgdb) p/x CMAP2
$4 = 0xbff02004
Re: Interesting backtrace...
Sun, Mar 18, 2001 at 16:41:03, des (Dag-Erling Smorgrav) wrote about "Interesting backtrace...":

> I finally caught a backtrace from one of those recurring stack smash panics. I've been getting a few of these every day for a couple of weeks now but never caught a dump; I caught this one by typing 'panic' immediately instead of trying to get a trace at the ddb prompt first.

[...]

> #11 0x037f in ?? ()
> #12 0xc023c8bb in vm_fault (map=0xd0768a00, vaddr=138502144, fault_type=2 '\002', fault_flags=8)
>     at ../../vm/vm_page.h:493

I have seen a bunch of identical panics on my home system (5.0-current of ~2001.02.27.22.10.00 UTC). I have not reported them yet because I did not understand what was happening: the backtrace shows a pmap_zero_page() call in vm_fault(), but there is no such call in the source code ;|

> Looks to me like there was a page fault, and the stack got corrupted while handling that fault (possibly somewhere in pmap_zero_page(), called from vm_page_zero_fill() which is inlined in vm_fault()).
>
> (BTW, this is a K6-2, which as far as I can tell is a 586-class CPU)

The same, K6-2:

CPU: AMD-K6(tm) 3D processor (298.96-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x58c  Stepping = 12
  Features=0x8021bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,PGE,MMX>
  AMD Features=0x80000800<SYSCALL,3DNow!>

/netch
Re: Interesting backtrace...
Valentin Nechayev [EMAIL PROTECTED] writes:

> I have not reported them yet because I did not understand what was happening: the backtrace shows a pmap_zero_page() call in vm_fault(), but there is no such call in the source code ;|

It's called by vm_page_zero_fill() which is inlined and therefore doesn't show up in the backtrace.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: Interesting backtrace...
Verbose boot log as requested.

Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #63: Sun Mar 18 22:21:49 CET 2001
    [EMAIL PROTECTED]:/usr/src/sys/compile/DES
Calibrating clock(s) ... TSC clock: 350796186 Hz, i8254 clock: 1193186 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter "i8254" frequency 1193182 Hz
CLK_USE_TSC_CALIBRATION not specified - using old calibration method
CPU: AMD-K6(tm) 3D processor (350.80-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x58c  Stepping = 12
  Features=0x8021bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,PGE,MMX>
  AMD Features=0x80000800<SYSCALL,3DNow!>
Data TLB: 128 entries, 2-way associative
Instruction TLB: 64 entries, 1-way associative
L1 data cache: 32 kbytes, 32 bytes/line, 2 lines/tag, 2-way associative
L1 instruction cache: 32 kbytes, 32 bytes/line, 2 lines/tag, 2-way associative
Write Allocate Enable Limit: 192M bytes
Write Allocate 15-16M bytes: Enable
real memory = 201310208 (196592K bytes)
Physical memory chunk(s):
0x1000 - 0x0009, 651264 bytes (159 pages)
0x003e7000 - 0x0bfbbfff, 196956160 bytes (48085 pages)
avail memory = 191610880 (187120K bytes)
bios32: Found BIOS32 Service Directory header at 0xc00f9b80
bios32: Entry = 0xf0530 (c00f0530)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xf+0x560
pnpbios: Found PnP BIOS data at 0xc00fcfb0
pnpbios: Entry = f:cfe0  Rev = 1.0
pnpbios: OEM ID cd041
Other BIOS signatures found:
Preloaded elf kernel "kernel" at 0xc03c1000.
null: null device, zero device
random: entropy source
mem: memory I/O
K6-family MTRR support enabled (2 registers)
VESA: information block
56 45 53 41 00 02 a5 72 00 c0 01 00 00 00 22 00
00 01 80 00 03 01 ba 72 00 c0 c1 72 00 c0 ca 72
00 c0 00 01 01 01 02 01 03 01 05 01 07 01 08 01
09 01 0a 01 0b 01 0c 01 10 01 11 01 12 01 13 01
VESA: 25 mode(s) found
VESA: v2.0, 8192k memory, flags:0x1, mode table:0xc033fd82 (122)
VESA: Matrox Graphics Inc.
VESA: Matrox MGA-G200 00
Using $PIR table, 8 entries at 0xc00f0b40
apm0: APM BIOS on motherboard
apm0: found APM BIOS v1.2, connected at v1.2
npx0: math processor on motherboard
npx0: INT 16 interface
i586_bzero() bandwidth = -1980152482 bytes/sec
bzero() bandwidth = 129299198 bytes/sec
pcib0: AcerLabs M1541 (Aladdin-V) PCI host bridge at pcibus 0 on motherboard
pci0: physical bus=0
    map[10]: type 1, range 32, base e000, size 26, enabled
found-> vendor=0x10b9, dev=0x1541, revid=0x04
    bus=0, slot=0, func=0
    class=06-00-00, hdrtype=0x00, mfdev=0
found-> vendor=0x10b9, dev=0x5243, revid=0x04
    bus=0, slot=1, func=0
    class=06-04-00, hdrtype=0x01, mfdev=0
found-> vendor=0x10b9, dev=0x7101, revid=0x00
    bus=0, slot=3, func=0
    class=06-80-00, hdrtype=0x00, mfdev=0
found-> vendor=0x10b9, dev=0x1533, revid=0xc3
    bus=0, slot=7, func=0
    class=06-01-00, hdrtype=0x00, mfdev=0
    map[10]: type 4, range 32, base d800, size 6, enabled
found-> vendor=0x10b7, dev=0x9001, revid=0x00
    bus=0, slot=11, func=0
    class=02-00-00, hdrtype=0x00, mfdev=0
    intpin=a, irq=12
    map[20]: type 4, range 32, base d400, size 4, enabled
found-> vendor=0x10b9, dev=0x5229, revid=0xc1
    bus=0, slot=15, func=0
    class=01-01-8a, hdrtype=0x00, mfdev=0
    intpin=a, irq=0
pci0: PCI bus on pcib0
agp0: Ali M1541 host to AGP bridge mem 0xe000-0xe3ff at device 0.0 on pci0
agp0: allocating GATT for aperture of size 64M
pcib1: PCI-PCI bridge at device 1.0 on pci0
pcib1: secondary bus 1
pcib1: subordinate bus 1
pcib1: I/O decode 0xe000-0xdfff
pcib1: memory decode 0xdf00-0xdfff
pcib1: prefetched decode 0xe6f0-0xe7ff
pci1: physical bus=1
    map[10]: type 3, range 32, base e700, size 24, enabled
    map[14]: type 1, range 32, base df80, size 14, enabled
    map[18]: type 1, range 32, base df00, size 23, enabled
found-> vendor=0x102b, dev=0x0521, revid=0x01
    bus=1, slot=0, func=0
    class=03-00-00, hdrtype=0x00, mfdev=0
    intpin=a, irq=11
    powerspec 1 supports D0 D3 current D0
pci1: PCI bus on pcib1
pci1: display, VGA at 0.0 (no driver attached)
pci0: bridge, PCI-unknown at 3.0 (no driver attached)
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
xl0: 3Com 3c900-COMBO Etherlink XL port 0xd800-0xd83f irq 12 at device 11.0 on pci0
xl0: Ethernet address: 00:60:08:cf:a8:e4
xl0: media options word: e138
xl0: found 10baseT
xl0: found AUI
xl0: found BNC
xl0: selecting 10baseT transceiver, half duplex
bpf: xl0 attached
atapci0: AcerLabs Aladdin ATA33 controller port 0xd400-0xd40f irq 0 at device 15.0 on pci0
ata0: iobase=0x01f0 altiobase=0x03f6 bmaddr=0xd400
ata0: mask=03 ostat0=50 ostat2=00
ata0-master: ATAPI probe 00 00
ata0-slave: ATAPI probe 00 00
ata0: mask=03 stat0=50
Re: Interesting backtrace...
On Sun, Mar 18, 2001 at 04:41:03PM +0100, Dag-Erling Smorgrav wrote:

> I finally caught a backtrace from one of those recurring stack smash panics. I've been getting a few of these every day for a couple of weeks now but never caught a dump; I caught this one by typing 'panic' immediately instead of trying to get a trace at the ddb prompt first.

I have a suggestion for what is happening, but I'm not sure exactly what could cause it. From looking at the stack from the core I have found where the esp and eip in the second trap frame are coming from:

    0x, 0x,
    ... another 100 bytes
    0xc0293748,    return address in pmap_zero_page (calling bzero)
    0xc0801000,    CADDR2
    0x1000,        PAGESIZE
    0xc86dff38,    pushed ebp
    0xc0266b53     return address in vm_fault (calling pmap_zero_page)

Now, 108 bytes is exactly the amount that i586_bzero shifts the stack by if it uses the floating point registers and it needs to preserve them. At the end it checks "PCPU(NPXPROC)" again and if it is zero it doesn't bother popping the 108 bytes off the stack.

Presumably what is happening is that i586_bzero begins and finds that PCPU(NPXPROC) is not zero, so it decides to preserve the fpu registers. Then something interrupts it, but doesn't restore PCPU(NPXPROC). When i586_bzero returns it uses the first 8 bytes of the fpu registers for its instruction pointer and stack pointer and goes boom.

I haven't tried to find out what is supposed to save and restore PCPU(NPXPROC). In my trace the process in question had recently been interrupted by the ata interrupt.

David.
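The failure mode described above can be modelled in userland. What follows is a minimal sketch, not the kernel's assembly: it tracks only how many bytes the routine leaves on the stack when the owner check at entry and the owner check at exit disagree. The names npxproc_sketch, bzero_stack_imbalance_sketch, clear_owner_sketch and run_imbalance_demo_sketch are hypothetical stand-ins invented for illustration; the 108-byte figure is the one cited in the message.

```c
#include <assert.h>
#include <stddef.h>

#define FPU_SAVE_BYTES 108	/* size of the FPU save area cited above */

/* Hypothetical stand-in for PCPU(NPXPROC): non-NULL while a process owns the FPU. */
static const void *npxproc_sketch;

/* Hypothetical hook standing in for whatever interrupts the routine. */
typedef void interrupt_hook(void);

/*
 * Models only the stack bookkeeping, not the zeroing itself.
 * Returns the number of bytes still on the stack at return time;
 * any non-zero value means the return address is read from the wrong slot.
 */
static int
bzero_stack_imbalance_sketch(interrupt_hook *hook)
{
	int pushed = 0;

	if (npxproc_sketch != NULL)	/* entry: FPU is owned, so save its state */
		pushed += FPU_SAVE_BYTES;

	if (hook != NULL)		/* an interrupt arrives mid-routine */
		hook();

	if (npxproc_sketch != NULL)	/* exit: re-tests the owner instead of remembering */
		pushed -= FPU_SAVE_BYTES;

	return pushed;
}

/* Simulated interrupt that clears the owner and never restores it. */
static void
clear_owner_sketch(void)
{
	npxproc_sketch = NULL;
}

/* Usage example: an undisturbed call balances, an interrupted one does not. */
static void
run_imbalance_demo_sketch(void)
{
	static const int owner = 0;

	npxproc_sketch = &owner;
	assert(bzero_stack_imbalance_sketch(NULL) == 0);

	npxproc_sketch = &owner;
	assert(bzero_stack_imbalance_sketch(clear_owner_sketch) == FPU_SAVE_BYTES);
}
```

In this model, remembering the entry decision in a local variable, or preventing the owner from changing while the routine runs, would keep the two checks consistent. Whether either matches the eventual kernel fix is not stated in the thread.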
Re: Interesting backtrace...
On Sun, 18 Mar 2001, David Malone wrote:

> Presumably what is happening is i586_bzero begins and finds that PCPU(NPXPROC) is not zero, so it decides to preserve the fpu registers. Then something interrupts it, but doesn't restore PCPU(NPXPROC). When i586_bzero returns it uses the first 8 bytes of the fpu registers for its instruction pointer and stack pointer and goes boom.

i586_bzero is missing the hack of disabling interrupts to prevent problems with npxproc getting switched, and it uses its own funky locking which hasn't been regraded for SMPng, so it is quite likely to be buggy.

K6-2's aren't really i586's and i586_bzero should never be used for them (generic bzero is faster), but there is apparently another bug that may cause them to be used. From des's dmesg output:

i586_bzero() bandwidth = -1980152482 bytes/sec
                         ^
bzero() bandwidth = 129299198 bytes/sec

i586_bzero gets used because negative bandwidths are significantly smaller than positive ones, so plain bzero is faster according to this message, but whatever causes the overflow apparently causes other bad things. npx.c already has one "fix" for the overflow problem. The problem may be that clocks don't work early any more.

Similarly for the i586-optimized bcopy and copyin/out (the bandwidth test for bzero controls them all).

Bruce
RE: Interesting backtrace...
On 18-Mar-01 Dag-Erling Smorgrav wrote:

> I finally caught a backtrace from one of those recurring stack smash panics. I've been getting a few of these every day for a couple of weeks now but never caught a dump; I caught this one by typing 'panic' immediately instead of trying to get a trace at the ddb prompt first.
>
> These panics invariably start like this (always the same eip):
>
> kernel: type 12 trap, code=0
> Stopped at -0xfc81:kernel: type 12 trap, code=0
> db>
>
> Anyway, here's the backtrace:
>
> #12 0xc023c8bb in vm_fault (map=0xd0768a00, vaddr=138502144, fault_type=2 '\002', fault_flags=8)
>     at ../../vm/vm_page.h:493

    pmap_zero_page(VM_PAGE_TO_PHYS(m));

Can you throw some extra tests in there to make sure m isn't NULL? Also, you might want to check VM_PAGE_TO_PHYS(m) for any weird values.

-- 
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
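The two checks suggested above could look roughly like this in a userland mock. This is only a sketch under stated assumptions: struct vm_page_sketch is a hypothetical, cut-down stand-in for the kernel's vm_page (not its real layout), page_looks_sane_sketch and run_sanity_demo_sketch are invented helpers, and the 4096-byte page size is taken from the 0x1000 value seen elsewhere in the thread. What counts as a "weird" physical address is also an assumption here: unaligned, or beyond the end of real memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u	/* 0x1000, the page size seen in the traces */

/* Hypothetical, cut-down stand-in for the kernel's vm_page; not the real layout. */
struct vm_page_sketch {
	uint32_t phys_addr;
};

/* Returns 1 if the page pointer and its physical address look sane, else 0. */
static int
page_looks_sane_sketch(const struct vm_page_sketch *m, uint32_t real_memory)
{
	if (m == NULL)
		return 0;
	if (real_memory < PAGE_SIZE_SKETCH)		/* no room for even one page */
		return 0;
	if (m->phys_addr % PAGE_SIZE_SKETCH != 0)	/* must be page aligned */
		return 0;
	if (m->phys_addr > real_memory - PAGE_SIZE_SKETCH)	/* page must fit in RAM */
		return 0;
	return 1;
}

/* Usage example with the "real memory" figure from the posted boot log. */
static void
run_sanity_demo_sketch(void)
{
	const uint32_t real_memory = 201310208u;
	struct vm_page_sketch good = { 0x003e7000u };
	struct vm_page_sketch unaligned = { 0x0000127fu };
	struct vm_page_sketch too_high = { 0xfffff000u };

	assert(page_looks_sane_sketch(&good, real_memory) == 1);
	assert(page_looks_sane_sketch(NULL, real_memory) == 0);
	assert(page_looks_sane_sketch(&unaligned, real_memory) == 0);
	assert(page_looks_sane_sketch(&too_high, real_memory) == 0);
}
```

In a kernel, checks of this kind would normally be assertions placed just before the pmap_zero_page() call rather than a separate helper.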
Re: Interesting backtrace...
Bruce Evans [EMAIL PROTECTED] writes:

> K6-2's aren't really i586's and i586_bzero should never be used for them (generic bzero is faster),

Wrong. I fixed machdep.c to compute and print the bandwidth correctly:

des@des ~% egrep '(CPU|bzero)' /var/run/dmesg.boot
CPU: AMD-K6(tm) 3D processor (350.80-MHz 586-class CPU)
i586_bzero() bandwidth = 1056759 kBps
bzero() bandwidth = 124211 kBps

> i586_bzero gets used because negative bandwidths are significantly smaller than positive ones,

Uh, Bruce, we pick the method that gives the *highest* bandwidth, not the lowest.

> so plain bzero is faster according to this message,

There you go contradicting yourself...

Anyway, the bug is not K6-specific - I guess the reason why we're only seeing it on K6's is that they're the only 586-class CPUs that are fast enough to still be in widespread use. Except I just remembered I have a dual Pentium box I use for SMP work, but haven't booted in several weeks... because it keeps crashing... with a smashed stack.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: Interesting backtrace...
John Baldwin [EMAIL PROTECTED] writes:

> Can you throw some extra tests in there to make sure m isn't NULL? Also, you might want to check VM_PAGE_TO_PHYS(m) for any weird values.

No need - David and Jake already tracked it down to evilness in i586_bzero().

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: Interesting backtrace...
Dag-Erling Smorgrav [EMAIL PROTECTED] writes:

> Wrong. I fixed machdep.c to compute and print the bandwidth correctly:

I mean npx.c. I'll commit the fix in a second.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: Interesting backtrace...
On Mon, 19 Mar 2001, Bruce Evans wrote:

> K6-2's aren't really i586's and i586_bzero should never be used for them (generic bzero is faster), but there is apparently another bug that may cause them to be used. From des's dmesg output:
>
> i586_bzero() bandwidth = -1980152482 bytes/sec
>                          ^
> bzero() bandwidth = 129299198 bytes/sec
>
> i586_bzero gets used because negative bandwidths are significantly
             ^ oops, I meant "should not get used"
> smaller than positive ones, so plain bzero is faster according to this message, but whatever causes the overflow apparently causes other bad things.

The overflow is actually only in the error message. It is caused by a preposterous value for `usec'.

> npx.c already has one "fix" for the overflow problem. The problem may be that clocks don't work early any more.

It must be that microtime() doesn't work early any more.

Bruce
Re: Interesting backtrace...
On 19 Mar 2001, Dag-Erling Smorgrav wrote:

> Bruce Evans [EMAIL PROTECTED] writes:
> > K6-2's aren't really i586's and i586_bzero should never be used for them (generic bzero is faster),
>
> Wrong. I fixed machdep.c to compute and print the bandwidth correctly:

Wrong yourself. The fpu is too slow to use for copying for everything except original Pentiums. The bandwidth test is just done to avoid hard-configuring this knowledge.

> des@des ~% egrep '(CPU|bzero)' /var/run/dmesg.boot
> CPU: AMD-K6(tm) 3D processor (350.80-MHz 586-class CPU)
> i586_bzero() bandwidth = 1056759 kBps
> bzero() bandwidth = 124211 kBps

I don't believe a bandwidth of 1 GB/sec. It may be possible if the buffer fits in an L1 or on-chip L2 cache (the test buffer is a bit small for today's L2 cache sizes), but then plain bzero() would also benefit from the cache.

> > i586_bzero gets used because negative bandwidths are significantly smaller than positive ones,
>
> Uh, Bruce, we pick the method that gives the *highest* bandwidth, not the lowest.

Sorry, I meant "should not get used".

> > so plain bzero is faster according to this message,
>
> There you go contradicting yourself...

This part is correct.

> Anyway, the bug is not K6-specific - I guess the reason why we're only seeing it on K6's is that they're the only 586-class CPUs that are fast enough to still be in widespread use.

The bug in i586_bzero() affects anything that gets that far, but only original Pentiums should get that far. Apparently not many people run -current on those. I actually turned on my P5/133 a week ago, but I didn't notice the bug.

The "bug" in npx.c is not really a bug. It's just that the printf was written before %lld was supported in the kernel, so it truncates to long and uses %ld. This shouldn't be a problem until the bandwidth of main memory exceeds 2GB/sec, which won't happen soon (neither will your apparent 1GB/sec bandwidth). However, bugs in microtime() sometimes cause the bandwidth to apparently exceed 2GB/sec. It is a feature that huge bandwidths sometimes get printed as negative values -- negative values are more obviously wrong. This is why %ld is used instead of %lu.

Bruce
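The truncation described above can be reproduced with plain integer arithmetic. The sketch below makes several assumptions: it is not the npx.c code; bandwidth_sketch, truncate_to_long32_sketch and run_truncation_demo_sketch are hypothetical helpers; and the inputs (a 1,000,000-byte buffer zeroed in 432 usec) are illustrative values picked only because they reproduce the -1980152482 figure printed in the boot log. The thread does not state the actual measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth in bytes/sec from a byte count and an elapsed time in microseconds. */
static int64_t
bandwidth_sketch(int64_t bytes, int64_t usec)
{
	if (usec <= 0)
		usec = 1;	/* avoid dividing by zero if the clock did not advance */
	return bytes * 1000000 / usec;
}

/* Keep the low 32 bits of v and read them as signed, as a 32-bit long would. */
static int32_t
truncate_to_long32_sketch(int64_t v)
{
	uint32_t low = (uint32_t)v;	/* well defined: reduction modulo 2^32 */

	if (low >= UINT32_C(0x80000000))
		return (int32_t)((int64_t)low - INT64_C(0x100000000));
	return (int32_t)low;
}

/* Usage example: a bandwidth above 2^31 bytes/sec wraps to a negative value. */
static void
run_truncation_demo_sketch(void)
{
	int64_t bw = bandwidth_sketch(1000000, 432);

	assert(bw == INT64_C(2314814814));
	assert(truncate_to_long32_sketch(bw) == -1980152482);
	/* A value below 2^31, such as the plain bzero() figure, is unchanged. */
	assert(truncate_to_long32_sketch(129299198) == 129299198);
}
```

Printing the truncated value as signed makes an impossible measurement stand out as negative, which is the behaviour the message calls a feature. Printing it as unsigned would show 2314814814 instead, which looks plausible at a glance.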
Re: Interesting backtrace...
On 19 Mar 2001, Dag-Erling Smorgrav wrote:

> Dag-Erling Smorgrav [EMAIL PROTECTED] writes:
> > Wrong. I fixed machdep.c to compute and print the bandwidth correctly:
>
> I mean npx.c. I'll commit the fix in a second.

Please send it to the maintainer for review.

Bruce
Re: Interesting backtrace...
Bruce Evans [EMAIL PROTECTED] writes:

> On 19 Mar 2001, Dag-Erling Smorgrav wrote:
> > Dag-Erling Smorgrav [EMAIL PROTECTED] writes:
> > > Wrong. I fixed machdep.c to compute and print the bandwidth correctly:
> >
> > I mean npx.c. I'll commit the fix in a second.
>
> Please send it to the maintainer for review.

I'm not aware of npx.c having a maintainer. The change was OK'ed by Jake Burkholder and/or John Baldwin on IRC.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: Interesting backtrace...
On 19 Mar 2001, Dag-Erling Smorgrav wrote:

> Anyway, the bug is not K6-specific - I guess the reason why we're only seeing it on K6's is that they're the only 586-class CPUs that are fast enough to still be in widespread use.

I have the same panics on one of my Pentium 166 MMX boxes. Even some of the addresses are the same as in your dump. I posted a message about this bug a week ago (with the subject "double panic" or something like that).