Re: Interesting backtrace...

2001-03-31 Thread Leif Neland



On 18 Mar 2001, Dag-Erling Smorgrav wrote:

 Anyway, here's the backtrace:

 root@des /var/crash# gdb -k
...
 This GDB was configured as "i386-unknown-freebsd".
 (kgdb) source ~des/kgdb  -- What's in here?
I guess it is commands to load the crash dump into the debugger.
Could you post it, please?
So I can make pretty backtraces too :-)


 (kgdb) kernel 1

Because that command doesn't work for me..

Leif


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Interesting backtrace...

2001-03-22 Thread David Malone

 On Wed, Mar 21, 2001 at 11:16:01PM +, David Malone wrote:
  The graph seems to peak at about 160kB/s, which seems plausible.
  The code is at:
  
  http://www.maths.tcd.ie/~dwmalone/comp/-time.S
  http://www.maths.tcd.ie/~dwmalone/comp/-time.c

 http://www.maths.tcd.ie/~dwmalone/comp/-time.S
 Error 404
 The file could not be found.
 Either the file does not exist or is read protected.

Crap - sorry:

http://www.maths.tcd.ie/~dwmalone/comp/bzero-time.S
http://www.maths.tcd.ie/~dwmalone/comp/bzero-time.c

David.




Re: Interesting backtrace...

2001-03-21 Thread Bruce Evans

On Wed, 21 Mar 2001, David Malone wrote:

 On Mon, Mar 19, 2001 at 02:47:34PM +1100, Bruce Evans wrote:
   npx.c already has one "fix" for the overflow problem.  The problem
   may be that clocks don't work early any more.
  
  It must be that microtime() doesn't work early any more.

I checked that microtime() doesn't work for more than 10 msec if it
uses the i8254.  When it doesn't work for that long, the bandwidth
test breaks down for bzero() bandwidths smaller than 100 MB/sec.  Such
bandwidths are normal for Intel i586's.  E.g., my P5/133 has a
generic_bzero() bandwidth of 87e6 bytes/sec and an i586_bzero()
bandwidth of 174e6 bytes/sec.  This is in userland with a slightly
improved i586_bzero() (39 cycles instead of 41 for the inner loop
IIRC) and with slightly improved page coloring, and a buffer size of
1MB (same as in the bandwidth test).  So, the test always breaks down
for my P5/133 if microtime() uses the i8254.  OTOH, my K6-1/233 has
bandwidths of 135e6 and 127e6 bytes/sec, respectively, so the test
never breaks down for it.
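The 100 MB/sec figure is just the 1MB test buffer divided by the roughly 10 msec that an i8254-based microtime() can span. A minimal C sketch of that arithmetic follows; the helper names are hypothetical (not from npx.c), and the buffer size and timer window are taken from the paragraph above and passed in as parameters.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the thread: a 1MB test buffer and a timer window of
 * about 10 msec for i8254-based microtime(). */
#define TEST_BUF_BYTES  1048576u
#define TIMER_WINDOW_US 10000u

/* Slowest bandwidth (bytes/sec) that can clear buf_bytes within window_us.
 * Anything slower takes longer than the window, so its timing is unreliable. */
uint64_t min_measurable_bandwidth(uint64_t buf_bytes, uint64_t window_us)
{
    return buf_bytes * 1000000u / window_us;
}

/* Nonzero if a routine with true bandwidth true_bw would overrun the window. */
int test_breaks_down(uint64_t true_bw, uint64_t buf_bytes, uint64_t window_us)
{
    return true_bw < min_measurable_bandwidth(buf_bytes, window_us);
}
```

With these numbers the cutoff is 104857600 bytes/sec, so the P5/133's 87e6 bytes/sec generic_bzero() falls below it, while both K6-1/233 figures stay above it.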

 I did a quick check, and it does seem that i586_bzero can be faster
 on the k6-2. I found it was about twice as fast for large buffers.
 This was timed in userland using the TSC. With a slightly simplified
 version of i586_bzero (I removed all the kernel specific stuff and
 had it always save the floating point state on the stack). A graph
 is at:

This is surprising.

   http://www.maths.tcd.ie/~dwmalone/comp/bzero-band.ps
 
 The graph seems to peak at about 160kB/s, which seems plausible.

160kB/sec is implausible :-).  160MB/sec is plausible.  Half that
is hard to understand.  Why is it slower than my K6-1?  Ah, I
partly understand.  My K6-1 has an L2 cache size of 1MB, so the
1MB buffer size is really too small for it if write allocation
is enabled.  P5's don't have write allocation, so the buffer size
for them is not critical.  All K6's have write allocation IIRC.
With a buffer size of 2MB, the bandwidths for my K6-1/233 are
84e6 and 80e6 bytes/sec, respectively.  So 80MB/sec is plausible
and 160MB/sec is fast (it's equivalent to 320MB/sec without
write allocation).
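The doubling in the last sentence, written out under the assumption that write allocation reads each cache line from memory before it is overwritten, so every byte zeroed costs one read and one write on the memory bus:

```latex
B_{\text{bus}} = B_{\text{read}} + B_{\text{write}} = 2\,B_{\text{bzero}},
\qquad
B_{\text{bzero}} = 160\ \text{MB/s} \;\Longrightarrow\; B_{\text{bus}} = 320\ \text{MB/s}
```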

These complications show how hard it is to write a single bandwidth
test that works for all i586's.  I think the next step (after fixing
the i586 functions) should be to reduce the buffer size significantly
and not worry about cache effects.  Cache effects benefit generic_bzero()
in the bandwidth test but they probably benefit it in normal use too.

 The code is at:
 
   http://www.maths.tcd.ie/~dwmalone/comp/-time.S
   http://www.maths.tcd.ie/~dwmalone/comp/-time.c
 
 (It's crude, but seemed to produce moderately OK results. You get
 occasional dips in the bandwidth due to using the TSC for timing.
 I only tried sizes which were a power of two, as well...)
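David's approach (time a zeroing routine over a range of buffer sizes and convert the result to bytes/sec) can be sketched portably. This is a hedged sketch, not his code: it swaps the TSC for POSIX clock_gettime(CLOCK_MONOTONIC) and uses memset() in place of the kernel's bzero variants, so the names and structure here are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Calling memset through a volatile function pointer stops the compiler
 * from proving the stores dead and deleting them. memset stands in for
 * the bzero variants being compared. */
static void *(*volatile zero_fn)(void *, int, size_t) = memset;

static double elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e9 +
        (double)(b->tv_nsec - a->tv_nsec);
}

/* Bytes/sec for zeroing a len-byte buffer reps times, or -1.0 on error. */
double zero_bandwidth(size_t len, int reps)
{
    struct timespec t0, t1;
    unsigned char *buf;
    double ns;

    if (len == 0 || reps <= 0)
        return -1.0;
    buf = malloc(len);
    if (buf == NULL)
        return -1.0;
    zero_fn(buf, 1, len);                /* fault the pages in before timing */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++)
        zero_fn(buf, 0, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);
    ns = elapsed_ns(&t0, &t1);
    if (ns < 1.0)
        ns = 1.0;                        /* clamp a zero-length interval */
    return (double)len * reps * 1e9 / ns;
}
```

Calling it for len = 4096, 8192, 16384, ... reproduces a power-of-two sweep; as with the TSC version, very short intervals are noisy, so a larger reps count helps.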

I wrote not-so-crude read/write/copy/checksum userland benchmarks to
test this stuff when I helped implement the i586-optimized routines.
Here is the write benchmark.  Compile it with 'cc -aout'.

---
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

#include <machine/cpufunc.h>

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

typedef void func_t(void *buf, size_t len);

struct func
{
func_t *fn;
char *name;
char *description;
};

static func_t zero0, zero1, zero2, zero3, zero4, zero5, zero6, zero7;
static func_t zero8, zero9, zeroA, zeroB, zeroC, zeroD;
static void usage(void);

static char const *progname;

static struct func funcs[] =
{
zero0, "zero0", "stosl",
zero1, "zero1", "unroll 16",
zero2, "zero2", "unroll 16 preallocate",
zero3, "zero3", "unroll 32",
zero4, "zero4", "unroll 32 preallocate",
zero5, "zero5", "unroll 64",
zero6, "zero6", "unroll 64 preallocate",
zero7, "zero7", "fstl",
zero8, "zero8", "movl",
zero9, "zero9", "unroll 8",
zeroA, "zeroA", "generic_bzero",
zeroB, "zeroB", "i486_bzero",
zeroC, "zeroC", "i586_bzero",
zeroD, "zeroD", "i686_pagezero",
bzero, "zeroE", "bzero (stosl)",
};
#define NFUNC   (sizeof funcs / sizeof funcs[0])

int main(int argc, char **argv)
{
unsigned char *buf;
int ch;
int funcn;
int funcnspecified;
int i586;
size_t len;
size_t max;
int precache;
int quiet;
size_t thrashbufsize;
unsigned long long tot;

progname = argv[0];
funcnspecified = -1;
i586 = 0;
len = 4096;
precache = 0;
quiet = 0;
tot = 1;
while ((ch = getopt(argc, argv, "5f:l:pqt:")) != EOF)
{
switch (ch)
{
case '5':
i586 = 1;
break;
case 'f':
funcnspecified = strtoul(optarg, (char **) NULL, 0);
if (funcnspecified < 0 || funcnspecified >= NFUNC)
usage();
break;
case 'l':
len = strtoul(optarg, (char **) NULL, 0);
break;
case 'p':
precache = 1;
break;
case 'q':
   

Re: Interesting backtrace...

2001-03-19 Thread Jake Burkholder

 On 19 Mar 2001, Dag-Erling Smorgrav wrote:
 
  Bruce Evans [EMAIL PROTECTED] writes:
   K6-2's aren't really i586's and i586_bzero should never be used for
   them (generic bzero is faster),
  
  Wrong. I fixed machdep.c to compute and print the bandwidth correctly:
 
 Wrong yourself.  The fpu is too slow to use for copying for everything
 except original Pentiums.  The bandwidth test is just done to avoid hard-
 configuring this knowledge.
 

If this is the case, is there much point in keeping the fpu register
bcopy and bzero at all?





Re: Interesting backtrace...

2001-03-19 Thread Bruce Evans

On Mon, 19 Mar 2001, Jake Burkholder wrote:

[bde wrote]
  Wrong yourself.  The fpu is too slow to use for copying for everything
  except original Pentiums.  The bandwidth test is just done to avoid hard-
  configuring this knowledge.
 
 If this is the case, is there much point in keeping the fpu register
 bcopy and bzero at all?

Original Pentiums still exist, and copying through the FPU might be faster
on future i386's.

Bruce





Re: Interesting backtrace...

2001-03-18 Thread David Malone

On Sun, Mar 18, 2001 at 04:41:03PM +0100, Dag-Erling Smorgrav wrote:
 I finally caught a backtrace from one of those recurring stack smash
 panics. I've been getting a few of these every day for a couple of
 weeks now but never caught a dump; I caught this one by typing 'panic'
 immediately instead of trying to get a trace at the ddb prompt first.

I have a back trace that is exactly like this. I got it by doing
a "call dumpsys" at the ddb prompt. It is also at the pmap_zero_page
line in vm_fault, then has the corrupted frame and then goes into another
fault. Curiously, my machine is a K6-2 too.

I've followed the same gdb steps which you went through, and the panic
looks identical. This is the same panic as the one which I posted the
ktr trace for a couple of days ago, if that helps.

David.

CPU: AMD-K6(tm) 3D processor (400.91-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x58c  Stepping = 12
  Features=0x8021bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,PGE,MMX>
  AMD Features=0x8800<SYSCALL,3DNow!>

(kgdb) where
#0  dumpsys () at ../../kern/kern_shutdown.c:478
#1  0xc0133501 in db_fncall (dummy1=0, dummy2=0, dummy3=0, 
dummy4=0xc86dfc88 "\20053\001") at ../../ddb/db_command.c:532
#2  0xc013332d in db_command (last_cmdp=0xc02f7554, cmd_table=0xc02f73b4, 
aux_cmd_tablep=0xc033c95c) at ../../ddb/db_command.c:333
#3  0xc01333f2 in db_command_loop () at ../../ddb/db_command.c:455
#4  0xc01355bb in db_trap (type=12, code=0) at ../../ddb/db_trap.c:71
#5  0xc028932a in kdb_trap (type=12, code=0, regs=0xc86dfdd8)
at ../../i386/i386/db_interface.c:164
#6  0xc0297fb8 in trap_fatal (frame=0xc86dfdd8, eva=4294906495)
at ../../i386/i386/trap.c:983
#7  0xc0297d25 in trap_pfault (frame=0xc86dfdd8, usermode=0, eva=4294906495)
at ../../i386/i386/trap.c:901
#8  0xc0296f7f in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16, 
  tf_edi = -932172416, tf_esi = -932879296, tf_ebp = -932315508, 
  tf_isp = -932315644, tf_ebx = 0, tf_edx = -1065345032, tf_ecx = 0, 
  tf_eax = -1071054797, tf_trapno = 12, tf_err = 0, tf_eip = -60801, 
  tf_cs = 8, tf_eflags = 66118, tf_esp = -65536, tf_ss = -1})
at ../../i386/i386/trap.c:448
#9  0x127f in ?? ()
#10 0xc0266b53 in vm_fault (map=0xc8702d80, vaddr=135098368, 
fault_type=2 '\002', fault_flags=8) at ../../vm/vm_page.h:493
#11 0xc0297b65 in trap_pfault (frame=0xc86dffa8, usermode=1, eva=135098368)
at ../../i386/i386/trap.c:876
#12 0xc0296bab in trap (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
  tf_edi = 135098368, tf_esi = 13, tf_ebp = -1077937844, 
  tf_isp = -932315180, tf_ebx = 4096, tf_edx = 0, tf_ecx = 1024, 
  tf_eax = -791621424, tf_trapno = 12, tf_err = 6, tf_eip = 134885503, 
  tf_cs = 31, tf_eflags = 66070, tf_esp = -1077937900, tf_ss = 47})
at ../../i386/i386/trap.c:335
(kgdb) up 8
#8  0xc0296f7f in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16, 
  tf_edi = -932172416, tf_esi = -932879296, tf_ebp = -932315508, 
  tf_isp = -932315644, tf_ebx = 0, tf_edx = -1065345032, tf_ecx = 0, 
  tf_eax = -1071054797, tf_trapno = 12, tf_err = 0, tf_eip = -60801, 
  tf_cs = 8, tf_eflags = 66118, tf_esp = -65536, tf_ss = -1})
at ../../i386/i386/trap.c:448
448 (void) trap_pfault(frame, FALSE, eva);
(kgdb) p/x frame
$1 = {tf_fs = 0x18, tf_es = 0x10, tf_ds = 0x10, tf_edi = 0xc8702d80, 
  tf_esi = 0xc8656440, tf_ebp = 0xc86dfe8c, tf_isp = 0xc86dfe04, tf_ebx = 0x0, 
  tf_edx = 0xc0801ff8, tf_ecx = 0x0, tf_eax = 0xc0290033, tf_trapno = 0xc, 
  tf_err = 0x0, tf_eip = 0xffff127f, tf_cs = 0x8, tf_eflags = 0x10246, 
  tf_esp = 0xffff0000, tf_ss = 0xffffffff}
(kgdb) up 4
#12 0xc0296bab in trap (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
  tf_edi = 135098368, tf_esi = 13, tf_ebp = -1077937844, 
  tf_isp = -932315180, tf_ebx = 4096, tf_edx = 0, tf_ecx = 1024, 
  tf_eax = -791621424, tf_trapno = 12, tf_err = 6, tf_eip = 134885503, 
  tf_cs = 31, tf_eflags = 66070, tf_esp = -1077937900, tf_ss = 47})
at ../../i386/i386/trap.c:335
335 i = trap_pfault(frame, TRUE, eva);
(kgdb) p/x frame
$2 = {tf_fs = 0x2f, tf_es = 0x2f, tf_ds = 0x2f, tf_edi = 0x80d7000, 
  tf_esi = 0xd, tf_ebp = 0xbfbff94c, tf_isp = 0xc86dffd4, tf_ebx = 0x1000, 
  tf_edx = 0x0, tf_ecx = 0x400, tf_eax = 0xd0d0d0d0, tf_trapno = 0xc, 
  tf_err = 0x6, tf_eip = 0x80a307f, tf_cs = 0x1f, tf_eflags = 0x10216, 
  tf_esp = 0xbfbff914, tf_ss = 0x2f}
(kgdb) p/x CADDR2
$3 = 0xc0801000
(kgdb) p/x CMAP2
$4 = 0xbff02004




Re: Interesting backtrace...

2001-03-18 Thread Valentin Nechayev

 Sun, Mar 18, 2001 at 16:41:03, des (Dag-Erling Smorgrav) wrote about "Interesting 
backtrace...": 

 I finally caught a backtrace from one of those recurring stack smash
 panics. I've been getting a few of these every day for a couple of
 weeks now but never caught a dump; I caught this one by typing 'panic'
 immediately instead of trying to get a trace at the ddb prompt first.

[...]

 #11 0x037f in ?? ()
 #12 0xc023c8bb in vm_fault (map=0xd0768a00, vaddr=138502144,
 fault_type=2 '\002', fault_flags=8) at ../../vm/vm_page.h:493

I've seen a bunch of identical panics on my home system (5.0-CURRENT
of ~2001.02.27.22.10.00 UTC).
I haven't reported them yet because I didn't understand what was
happening: pmap_zero_page() shows up as called from vm_fault() even
though vm_fault() contains no such call in the source code ;|

 Looks to me like there was a page fault, and the stack got corrupted
 while handling that fault (possibly somewhere in pmap_zero_page(),
 called from vm_page_zero_fill() which is inlined in vm_fault()).
 (BTW, this is a K6-2, which as far as I can tell is a 586-class CPU)

The same, K6-2:

CPU: AMD-K6(tm) 3D processor (298.96-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x58c  Stepping = 12
  Features=0x8021bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,PGE,MMX>
  AMD Features=0x8800<SYSCALL,3DNow!>


/netch




Re: Interesting backtrace...

2001-03-18 Thread Dag-Erling Smorgrav

Valentin Nechayev [EMAIL PROTECTED] writes:
 I haven't reported them yet because I didn't understand what was
 happening: pmap_zero_page() shows up as called from vm_fault() even
 though vm_fault() contains no such call in the source code ;|

It's called by vm_page_zero_fill() which is inlined and therefore
doesn't show up in the backtrace.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]




Re: Interesting backtrace...

2001-03-18 Thread Dag-Erling Smorgrav

Verbose boot log as requested.

Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #63: Sun Mar 18 22:21:49 CET 2001
[EMAIL PROTECTED]:/usr/src/sys/compile/DES
Calibrating clock(s) ... TSC clock: 350796186 Hz, i8254 clock: 1193186 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter "i8254"  frequency 1193182 Hz
CLK_USE_TSC_CALIBRATION not specified - using old calibration method
CPU: AMD-K6(tm) 3D processor (350.80-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x58c  Stepping = 12
  Features=0x8021bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,PGE,MMX>
  AMD Features=0x8800<SYSCALL,3DNow!>
Data TLB: 128 entries, 2-way associative
Instruction TLB: 64 entries, 1-way associative
L1 data cache: 32 kbytes, 32 bytes/line, 2 lines/tag, 2-way associative
L1 instruction cache: 32 kbytes, 32 bytes/line, 2 lines/tag, 2-way associative
Write Allocate Enable Limit: 192M bytes
Write Allocate 15-16M bytes: Enable
real memory  = 201310208 (196592K bytes)
Physical memory chunk(s):
0x00001000 - 0x0009ffff, 651264 bytes (159 pages)
0x003e7000 - 0x0bfbbfff, 196956160 bytes (48085 pages)
avail memory = 191610880 (187120K bytes)
bios32: Found BIOS32 Service Directory header at 0xc00f9b80
bios32: Entry = 0xf0530 (c00f0530)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xf+0x560
pnpbios: Found PnP BIOS data at 0xc00fcfb0
pnpbios: Entry = f:cfe0  Rev = 1.0
pnpbios: OEM ID cd041
Other BIOS signatures found:
Preloaded elf kernel "kernel" at 0xc03c1000.
null: null device, zero device
random: entropy source
mem: memory  I/O
K6-family MTRR support enabled (2 registers)
VESA: information block
56 45 53 41 00 02 a5 72 00 c0 01 00 00 00 22 00 
00 01 80 00 03 01 ba 72 00 c0 c1 72 00 c0 ca 72 
00 c0 00 01 01 01 02 01 03 01 05 01 07 01 08 01 
09 01 0a 01 0b 01 0c 01 10 01 11 01 12 01 13 01 
VESA: 25 mode(s) found
VESA: v2.0, 8192k memory, flags:0x1, mode table:0xc033fd82 (122)
VESA: Matrox Graphics Inc.
VESA: Matrox MGA-G200 00
Using $PIR table, 8 entries at 0xc00f0b40
apm0: APM BIOS on motherboard
apm0: found APM BIOS v1.2, connected at v1.2
npx0: math processor on motherboard
npx0: INT 16 interface
i586_bzero() bandwidth = -1980152482 bytes/sec
bzero() bandwidth = 129299198 bytes/sec
pcib0: AcerLabs M1541 (Aladdin-V) PCI host bridge at pcibus 0 on motherboard
pci0: physical bus=0
map[10]: type 1, range 32, base e000, size 26, enabled
found- vendor=0x10b9, dev=0x1541, revid=0x04
bus=0, slot=0, func=0
class=06-00-00, hdrtype=0x00, mfdev=0
found- vendor=0x10b9, dev=0x5243, revid=0x04
bus=0, slot=1, func=0
class=06-04-00, hdrtype=0x01, mfdev=0
found- vendor=0x10b9, dev=0x7101, revid=0x00
bus=0, slot=3, func=0
class=06-80-00, hdrtype=0x00, mfdev=0
found- vendor=0x10b9, dev=0x1533, revid=0xc3
bus=0, slot=7, func=0
class=06-01-00, hdrtype=0x00, mfdev=0
map[10]: type 4, range 32, base d800, size  6, enabled
found- vendor=0x10b7, dev=0x9001, revid=0x00
bus=0, slot=11, func=0
class=02-00-00, hdrtype=0x00, mfdev=0
intpin=a, irq=12
map[20]: type 4, range 32, base d400, size  4, enabled
found- vendor=0x10b9, dev=0x5229, revid=0xc1
bus=0, slot=15, func=0
class=01-01-8a, hdrtype=0x00, mfdev=0
intpin=a, irq=0
pci0: PCI bus on pcib0
agp0: Ali M1541 host to AGP bridge mem 0xe000-0xe3ff at device 0.0 on pci0
agp0: allocating GATT for aperture of size 64M
pcib1: PCI-PCI bridge at device 1.0 on pci0
pcib1:   secondary bus 1
pcib1:   subordinate bus   1
pcib1:   I/O decode0xe000-0xdfff
pcib1:   memory decode 0xdf00-0xdfff
pcib1:   prefetched decode 0xe6f0-0xe7ff
pci1: physical bus=1
map[10]: type 3, range 32, base e700, size 24, enabled
map[14]: type 1, range 32, base df80, size 14, enabled
map[18]: type 1, range 32, base df00, size 23, enabled
found- vendor=0x102b, dev=0x0521, revid=0x01
bus=1, slot=0, func=0
class=03-00-00, hdrtype=0x00, mfdev=0
intpin=a, irq=11
powerspec 1  supports D0 D3  current D0
pci1: PCI bus on pcib1
pci1: display, VGA at 0.0 (no driver attached)
pci0: bridge, PCI-unknown at 3.0 (no driver attached)
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
xl0: 3Com 3c900-COMBO Etherlink XL port 0xd800-0xd83f irq 12 at device 11.0 on pci0
xl0: Ethernet address: 00:60:08:cf:a8:e4
xl0: media options word: e138
xl0: found 10baseT
xl0: found AUI
xl0: found BNC
xl0: selecting 10baseT transceiver, half duplex
bpf: xl0 attached
atapci0: AcerLabs Aladdin ATA33 controller port 0xd400-0xd40f irq 0 at device 15.0 
on pci0
ata0: iobase=0x01f0 altiobase=0x03f6 bmaddr=0xd400
ata0: mask=03 ostat0=50 ostat2=00
ata0-master: ATAPI probe 00 00
ata0-slave: ATAPI probe 00 00
ata0: mask=03 stat0=50 

Re: Interesting backtrace...

2001-03-18 Thread David Malone

On Sun, Mar 18, 2001 at 04:41:03PM +0100, Dag-Erling Smorgrav wrote:
 I finally caught a backtrace from one of those recurring stack smash
 panics. I've been getting a few of these every day for a couple of
 weeks now but never caught a dump; I caught this one by typing 'panic'
 immediately instead of trying to get a trace at the ddb prompt first.

I have a suggestion for what is happening, but I'm not sure exactly
what could cause it.

From looking at the stack in the core, I have found where the esp
and eip in the second trap frame are coming from:

  0x,
  0x,
  ... another 100 bytes
  0xc0293748,   return address in pmap_zero_page (calling bzero)
  0xc0801000,   CADDR2
  0x1000,   PAGESIZE
  0xc86dff38,   pushed ebp
  0xc0266b53    return address in vm_fault (calling pmap_zero_page)

Now, 108 bytes is exactly the amount that the i586_bzero shifts
the stack by if it uses the floating point registers and it needs
to preserve them. At the end it checks "PCPU(NPXPROC)" again and
if it is zero it doesn't bother popping the 108 bytes off the stack.

Presumably what is happening is i586_bzero begins and finds that
PCPU(NPXPROC) is not zero, so it decides to preserve the fpu
registers. Then something interrupts it, but doesn't restore
PCPU(NPXPROC). When i586_bzero returns it uses the first 8 bytes
of the fpu registers for its instruction pointer and stack pointer
and goes boom.
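The imbalance can be shown with a toy model. This is a hedged sketch in C, not the real i586_bzero assembly: a byte counter stands in for the stack pointer and a boolean stands in for PCPU(NPXPROC) != 0. The 108 bytes is the size of the image FNSAVE stores in 32-bit protected mode. The second function is a hypothetical variant that remembers the entry decision in a local instead of re-reading the shared flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Size of the FPU image FNSAVE stores in 32-bit protected mode. */
#define FNSAVE_BYTES 108

/* Toy model: depth counts bytes pushed by the routine, npx_owned
 * stands in for PCPU(NPXPROC) being nonzero. */
struct cpu_model {
    size_t depth;
    bool npx_owned;
};

/* Pattern described above: test the shared flag on entry to decide
 * whether to push, and test it again on exit to decide whether to pop.
 * interrupted simulates something changing npxproc in between.
 * Returns the net stack imbalance in bytes (0 means balanced). */
size_t recheck_on_exit(struct cpu_model *m, bool interrupted)
{
    size_t start = m->depth;

    if (m->npx_owned)
        m->depth += FNSAVE_BYTES;   /* save FPU state on the stack */
    if (interrupted)
        m->npx_owned = false;       /* npxproc cleared behind our back */
    if (m->npx_owned)
        m->depth -= FNSAVE_BYTES;   /* pop only if the flag still says so */
    return m->depth - start;
}

/* Hypothetical variant: the pop always matches the push. This only
 * balances the stack; it does not by itself make the FPU-state
 * handling correct. */
size_t remember_decision(struct cpu_model *m, bool interrupted)
{
    size_t start = m->depth;
    bool saved = m->npx_owned;

    if (saved)
        m->depth += FNSAVE_BYTES;
    if (interrupted)
        m->npx_owned = false;
    if (saved)
        m->depth -= FNSAVE_BYTES;
    return m->depth - start;
}
```

When the flag changes mid-routine, the first pattern leaves 108 bytes on the stack, so the return pops a word from the saved FPU image instead of the real return address. Bruce's reply points at the other approach, disabling interrupts so npxproc cannot change between the two checks.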

I haven't tried to find out what is supposed to save and restore
PCPU(NPXPROC). In my trace the process in question had recently
been interrupted by the ata interrupt.

David.




Re: Interesting backtrace...

2001-03-18 Thread Bruce Evans

On Sun, 18 Mar 2001, David Malone wrote:

 Presumably what is happening is i586_bzero begins and finds that
 PCPU(NPXPROC) is not zero, so it decides to preserve the fpu
 registers. Then something interrupts it, but doesn't restore
 PCPU(NPXPROC). When i586_bzero returns it uses the first 8 bytes
 of the fpu registers for its instruction pointer and stack pointer
 and goes boom.

i586_bzero is missing the hack of disabling interrupts to prevent
problems with npxproc getting switched, and it uses its own funky
locking which hasn't been upgraded for SMPng, so it is quite likely
to be buggy.

K6-2's aren't really i586's and i586_bzero should never be used for
them (generic bzero is faster), but there is apparently another
bug that may cause them to be used.  From des's dmesg output:

 i586_bzero() bandwidth = -1980152482 bytes/sec
   ^
 bzero() bandwidth = 129299198 bytes/sec

i586_bzero gets used because negative bandwidths are significantly
smaller than positive ones, so plain bzero is faster according to this
message, but whatever causes the overflow apparently causes other bad things.
npx.c already has one "fix" for the overflow problem.  The problem
may be that clocks don't work early any more.

Similarly for the i586-optimized bcopy and copyin/out (the bandwidth
test for bzero controls them all).

Bruce





RE: Interesting backtrace...

2001-03-18 Thread John Baldwin


On 18-Mar-01 Dag-Erling Smorgrav wrote:
 I finally caught a backtrace from one of those recurring stack smash
 panics. I've been getting a few of these every day for a couple of
 weeks now but never caught a dump; I caught this one by typing 'panic'
 immediately instead of trying to get a trace at the ddb prompt first.
 
 These panics invariably start like this (always the same eip):
 
 kernel: type 12 trap, code=0
 Stopped at  -0xfc81:kernel: type 12 trap, code=0
 db
 
 Anyway, here's the backtrace:


#12 0xc023c8bb in vm_fault (map=0xd0768a00, vaddr=138502144,
 fault_type=2 '\002', fault_flags=8) at ../../vm/vm_page.h:493

pmap_zero_page(VM_PAGE_TO_PHYS(m));

Can you throw some extra tests in there to make sure m isn't NULL?  Also, you
might want to check VM_PAGE_TO_PHYS(m) for any weird values.

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




Re: Interesting backtrace...

2001-03-18 Thread Dag-Erling Smorgrav

Bruce Evans [EMAIL PROTECTED] writes:
 K6-2's aren't really i586's and i586_bzero should never be used for
 them (generic bzero is faster),

Wrong. I fixed machdep.c to compute and print the bandwidth correctly:

des@des ~% egrep '(CPU|bzero)' /var/run/dmesg.boot
CPU: AMD-K6(tm) 3D processor (350.80-MHz 586-class CPU)
i586_bzero() bandwidth = 1056759 kBps
bzero() bandwidth = 124211 kBps

 i586_bzero gets used because negative bandwidths are significantly
 smaller than positive ones,

Uh, Bruce, we pick the method that gives the *highest* bandwidth, not
the lowest.

 so plain bzero is faster according to this
 message,

There you go contradicting yourself...

Anyway, the bug is not K6-specific - I guess the reason why we're only
seeing it on K6's is that they're the only 586-class CPUs that are
fast enough to still be in widespread use.

Except I just remembered I have a dual Pentium box I use for SMP work,
but haven't booted in several weeks... because it keeps crashing...
with a smashed stack.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]




Re: Interesting backtrace...

2001-03-18 Thread Dag-Erling Smorgrav

John Baldwin [EMAIL PROTECTED] writes:
 Can you throw some extra tests in there to make sure m isn't NULL?  Also, you
 might want to check VM_PAGE_TO_PHYS(m) for any weird values.

No need - David and Jake already tracked it down to evilness in
i586_bzero().

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]




Re: Interesting backtrace...

2001-03-18 Thread Dag-Erling Smorgrav

Dag-Erling Smorgrav [EMAIL PROTECTED] writes:
 Wrong. I fixed machdep.c to compute and print the bandwidth correctly:

I mean npx.c. I'll commit the fix in a second.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]




Re: Interesting backtrace...

2001-03-18 Thread Bruce Evans

On Mon, 19 Mar 2001, Bruce Evans wrote:

 K6-2's aren't really i586's and i586_bzero should never be used for
 them (generic bzero is faster), but there is apparently another
 bug that may cause them to be used.  From des's dmesg output:
 
  i586_bzero() bandwidth = -1980152482 bytes/sec
^
  bzero() bandwidth = 129299198 bytes/sec
 
 i586_bzero gets used because negative bandwidths are significantly
 ^ oops, I meant "should not get used"
 smaller than positive ones, so plain bzero is faster according to this
 message, but whatever causes the overflow apparently causes other bad things.

The overflow is actually only in the error message.  It is caused by
a preposterous value for `usec'.

 npx.c already has one "fix" for the overflow problem.  The problem
 may be that clocks don't work early any more.

It must be that microtime() doesn't work early any more.
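If the choice between the two routines is made by comparing elapsed times, the printf overflow cannot affect it, but a preposterous usec from a broken microtime() can. A hypothetical C sketch of that decision; the function name, the enum and the exact comparison are assumptions, not the npx.c code.

```c
#include <assert.h>

enum bzero_impl { GENERIC_BZERO, I586_BZERO };

/* Pick whichever routine needed fewer microseconds to clear the same
 * buffer. Raw times are compared, not the printed bandwidth, so a
 * wrapped printf value cannot change the result, but a bogus time
 * from the clock can. */
enum bzero_impl choose_bzero(long i586_usec, long generic_usec)
{
    if (i586_usec <= 0)
        i586_usec = 1;          /* guard against a broken clock */
    if (generic_usec <= 0)
        generic_usec = 1;
    return i586_usec < generic_usec ? I586_BZERO : GENERIC_BZERO;
}
```

With hypothetical inputs of 9000 usec for i586_bzero and 8110 usec for plain bzero, the generic routine wins; if microtime() reports 1 usec for the i586 run, i586_bzero is selected even on a CPU where it is slower.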

Bruce





Re: Interesting backtrace...

2001-03-18 Thread Bruce Evans

On 19 Mar 2001, Dag-Erling Smorgrav wrote:

 Bruce Evans [EMAIL PROTECTED] writes:
  K6-2's aren't really i586's and i586_bzero should never be used for
  them (generic bzero is faster),
 
 Wrong. I fixed machdep.c to compute and print the bandwidth correctly:

Wrong yourself.  The fpu is too slow to use for copying for everything
except original Pentiums.  The bandwidth test is just done to avoid hard-
configuring this knowledge.

 des@des ~% egrep '(CPU|bzero)' /var/run/dmesg.boot
 CPU: AMD-K6(tm) 3D processor (350.80-MHz 586-class CPU)
 i586_bzero() bandwidth = 1056759 kBps
 bzero() bandwidth = 124211 kBps

I don't believe a bandwidth of 1 GB/sec.  It may be possible if the
buffer fits in an L1 or on-chip L2 cache (the test buffer is a bit small
for today's L2 cache sizes), but then plain bzero() would also benefit
from the cache.

  i586_bzero gets used because negative bandwidths are significantly
  smaller than positive ones,
 
 Uh, Bruce, we pick the method that gives the *highest* bandwidth, not
 the lowest.

Sorry, I meant "should not get used".

 
  so plain bzero is faster according to this
  message,
 
 There you go contradicting yourself...

This part is correct.

 Anyway, the bug is not K6-specific - I guess the reason why we're only
 seeing it on K6's is that they're the only 586-class CPUs that are
 fast enough to still be in widespread use.

The bug in i586_bzero() affects anything that gets that far, but only
original Pentiums should get that far.  Apparently not many people
run -current on those.  I actually turned on my P5/133 a week ago,
but I didn't notice the bug.

The "bug" in npx.c is not really a bug.  It's just that the printf was
written before %lld was supported in the kernel, so it truncates to
long and uses %ld.  This shouldn't be a problem until the bandwidth
of main memory exceeds 2GB/sec, which won't happen soon (neither will
your apparent 1GB/sec bandwidth).  However, bugs in microtime() sometimes
cause the bandwidth to apparently exceed 2GB/sec.  It is a feature that
huge bandwidths sometimes get printed as negative values -- negative
values are more obviously wrong.  This is why %ld is used instead of %lu.
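Bruce's explanation can be reproduced in a few lines of C. A minimal sketch, assuming the 1MB (1048576-byte) test buffer mentioned earlier in the thread and emulating i386's 32-bit long by hand so the result does not depend on the host; all function names are hypothetical. A run that appears to take 400 usec implies about 2.6e9 bytes/sec, which no longer fits in 32 bits and prints as a negative number, while the same value in kilobytes per second (1024-byte units here, an assumption about the kBps fix) stays small.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth in bytes/sec, computed in 64 bits. */
int64_t bandwidth_bps(int64_t buf_bytes, int64_t usec)
{
    if (usec <= 0)
        usec = 1;                       /* avoid division by zero */
    return buf_bytes * 1000000 / usec;
}

/* Same value in 1024-byte kilobytes per second. */
int64_t bandwidth_kbps(int64_t buf_bytes, int64_t usec)
{
    return bandwidth_bps(buf_bytes, usec) / 1024;
}

/* Emulates the (long) cast on i386, where long is 32 bits: keep the low
 * 32 bits and reinterpret them as two's complement. Done by hand because
 * converting an out-of-range value to int32_t is implementation-defined. */
int32_t as_i386_long(int64_t v)
{
    uint32_t low = (uint32_t)v;         /* well-defined: reduces mod 2^32 */

    if (low <= INT32_MAX)
        return (int32_t)low;
    return -(int32_t)(UINT32_MAX - low) - 1;
}
```

A plausible 8192 usec gives 128000000 bytes/sec and prints unchanged; 400 usec gives 2621440000 bytes/sec, which as_i386_long() turns into -1673527296.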

Bruce





Re: Interesting backtrace...

2001-03-18 Thread Bruce Evans

On 19 Mar 2001, Dag-Erling Smorgrav wrote:

 Dag-Erling Smorgrav [EMAIL PROTECTED] writes:
  Wrong. I fixed machdep.c to compute and print the bandwidth correctly:
 
 I mean npx.c. I'll commit the fix in a second.

Please send it to the maintainer for review.

Bruce





Re: Interesting backtrace...

2001-03-18 Thread Dag-Erling Smorgrav

Bruce Evans [EMAIL PROTECTED] writes:
 On 19 Mar 2001, Dag-Erling Smorgrav wrote:
  Dag-Erling Smorgrav [EMAIL PROTECTED] writes:
   Wrong. I fixed machdep.c to compute and print the bandwidth correctly:
  I mean npx.c. I'll commit the fix in a second.
 Please send it to the maintainer for review.

I'm not aware of npx.c having a maintainer. The change was OK'ed by
Jake Burkholder and/or John Baldwin on IRC.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]




Re: Interesting backtrace...

2001-03-18 Thread Ilmar S. Habibulin

On 19 Mar 2001, Dag-Erling Smorgrav wrote:

 Anyway, the bug is not K6-specific - I guess the reason why we're only
 seeing it on K6's is that they're the only 586-class CPUs that are
 fast enough to still be in widespread use.
I have the same panics in one of my Pentium 166 MMX boxes. Even some
addresses are the same as in your dump. I've posted a message about this
bug a week ago (with subj "double panic" or something like this).


