Re: uvmfault (7.99.1/amd64)

2014-09-13 Thread Robert Swindells

Thomas Klausner wrote:
>On Sat, Sep 13, 2014 at 09:40:35AM +0100, Robert Swindells wrote:
>> >#8  0x804c3b5e in in_delayed_cksum (m=0x8003393c8000) at 
>> >/archive/foreign/src/sys/netinet/ip_output.c:791
>> >Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>> >
>> >This does not really look like useful information, does it?
>> 
>> Can you tell which protocol family you were using at the time?
>
>I'm nfs-mounting via wm0:
>wm0: flags=8843 mtu 1500
>capabilities=7ff80
>capabilities=7ff80
>capabilities=7ff80
>enabled=0
>ec_capabilities=7
>ec_enabled=0
>address: ...
>media: Ethernet autoselect (1000baseT 
> full-duplex,flowcontrol,rxpause,txpause)
>status: active
>inet ...
>inet6 ...

I just added a wm card to my main system and it seems solid with all the
offload features turned on, even TSO.

Obviously it doesn't help with finding any problem in the kernel.

>My /etc/fstab has IPv4 addresses for the NFS mounts, like this:
>
>192.168.1.2:/volume1/music  /disk/music nfs 
>intr,nodev,nosuid,rw,soft,tcp
>
>So it should be IPv4 only.

And TCP; I was using UDP over IPv6.

A common factor is writing to NFS though.

>> I was regularly getting a similar crash when using NFS over IPv6. This
>> was with a network controller that only offloads checksumming for IPv4;
>> the in_delayed_cksum() function is where the network stack does the
>> checksum in software.
>> 
>> I confess that the current way that I'm trying to fix it is by
>> switching to a network card with hardware checksumming for both IPv4
>> and IPv6.
>
>From the capabilities cited above, my card already should do that, right?

No, the enabled=0 means they are all turned off.

To turn on the checksumming you can run:

# ifconfig wm0 ip4csum udp4csum tcp4csum udp6csum tcp6csum

Or put the options in your /etc/ifconfig.wm0 file.

Don't do this if you are using bridge(4) on this machine.
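
To confirm that the options took effect, the enabled= field of the ifconfig output can be checked. The following is a minimal sketch in POSIX shell, assuming the output format quoted earlier in this thread; the helper names offload_enabled_bits and offload_is_on are made up for illustration and are not part of NetBSD.

```shell
# Hypothetical helpers (not part of NetBSD): both read ifconfig output on stdin.

# Print the hex value of the interface-level "enabled=" field.
# The anchored pattern skips the separate "ec_enabled=" line.
offload_enabled_bits() {
    sed -n 's/^[[:space:]]*enabled=\([0-9a-fA-F]*\).*/\1/p' | head -n 1
}

# Exit status 0 if any offload capability bit is set, non-zero otherwise.
offload_is_on() {
    bits=$(offload_enabled_bits)
    [ -n "$bits" ] && [ $(( 0x$bits )) -ne 0 ]
}

# Sample shaped like the output quoted in this thread (enabled=0).
if printf 'wm0: flags=8843 mtu 1500\n\tcapabilities=7ff80\n\tenabled=0\n\tec_enabled=0\n' | offload_is_on
then
    echo "offload on"
else
    echo "offload off"
fi
```

On a live system the input would come from the interface itself, for example `ifconfig wm0 | offload_is_on`.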

Robert Swindells


Re: uvmfault (7.99.1/amd64)

2014-09-13 Thread Thomas Klausner
On Sat, Sep 13, 2014 at 07:57:20AM +0100, Nick Hudson wrote:
> On 09/13/14 07:55, Thomas Klausner wrote:
> >My main machine suddenly hung last night and then rebooted. There was
> >no big load on it at that time. dmesg contains:
> >
> >uvm_fault(0x810157c0, 0x8003393c8000, 1) -> e
> >fatal page fault in supervisor mode
> >trap type 6 code 0 rip 80264fc5 cs 8 rflags 10202 cr2 
> >8003393c8000 ilevel 4 rsp fe813d81d720
> >curlwp 0xfe813dc10aa0 pid 0.143 lowest kstack 0xfe813d81a2c0
> >panic: trap
> >cpu7: Begin traceback...
> >vpanic() at netbsd:vpanic+0x13c
> >snprintf() at netbsd:snprintf
> >startlwp() at netbsd:startlwp
> >cpu7: End traceback...
> >
> >dumping to dev 168,3 (offset=8, size=8373576):
> >dump Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 
> >2005,
> >(new kernel booting messages follow)
> >
> >I did get a core dump, and I do have a kernel with symbols.
> ># gdb netbsd
> >GNU gdb (GDB) 7.7.1
> >Copyright (C) 2014 Free Software Foundation, Inc.
> >License GPLv3+: GNU GPL version 3 or later 
> >This is free software: you are free to change and redistribute it.
> >There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> >and "show warranty" for details.
> >This GDB was configured as "x86_64--netbsd".
> >Type "show configuration" for configuration details.
> >For bug reporting instructions, please see:
> >.
> >Find the GDB manual and other documentation resources online at:
> >.
> >For help, type "help".
> >Type "apropos word" to search for commands related to "word"...
> >Reading symbols from netbsd...done.
> >(gdb) target  kvm netbsd.core
> >0x805b6ac5 in cpu_reboot (howto=howto@entry=260, 
> >bootstr=bootstr@entry=0x0) at 
> >/archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
> >671 dumpsys();
> >(gdb) bt
> >#0  0x805b6ac5 in cpu_reboot (howto=howto@entry=260, 
> >bootstr=bootstr@entry=0x0) at 
> >/archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
> >#1  0x807b0ae4 in vpanic (fmt=fmt@entry=0x80c51a95 "trap", 
> >ap=ap@entry=0xfe813d81d510) at 
> >/archive/foreign/src/sys/kern/subr_prf.c:340
> >#2  0x807b0b9f in panic (fmt=fmt@entry=0x80c51a95 "trap") at 
> >/archive/foreign/src/sys/kern/subr_prf.c:256
> >#3  0x807fc037 in trap (frame=0xfe813d81d630) at 
> >/archive/foreign/src/sys/arch/amd64/amd64/trap.c:298
> >#4  0x8010108e in alltraps ()
> >#5  0x80264fc5 in .Mmbuf_inner_loop ()
> >#6  0xfe8692e23400 in ?? ()
> >#7  0xfe813d81d750 in ?? ()
> >#8  0x804c3b5e in in_delayed_cksum (m=0x8003393c8000) at 
> >/archive/foreign/src/sys/netinet/ip_output.c:791
> >Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> >
> >This does not really look like useful information, does it?
> >  Thomas
> >
> >
> Try crash(8). It does a better job of stack traces through traps.

# crash -M netbsd.core -N netbsd 
Crash version 7.99.1, image version 7.99.1.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NVGA_RASTERCONSOLE() at 0
_KERNEL_OPT_IPFILTER_COMPAT() at _KERNEL_OPT_IPFILTER_COMPAT+0x3
vpanic() at vpanic+0x145
snprintf() at snprintf
startlwp() at startlwp
crash> 

That looks weird.
 Thomas


Re: uvmfault (7.99.1/amd64)

2014-09-13 Thread Thomas Klausner
On Sat, Sep 13, 2014 at 09:40:35AM +0100, Robert Swindells wrote:
> >#8  0x804c3b5e in in_delayed_cksum (m=0x8003393c8000) at 
> >/archive/foreign/src/sys/netinet/ip_output.c:791
> >Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> >
> >This does not really look like useful information, does it?
> 
> Can you tell which protocol family you were using at the time?

I'm nfs-mounting via wm0:
wm0: flags=8843 mtu 1500
capabilities=7ff80
capabilities=7ff80
capabilities=7ff80
enabled=0
ec_capabilities=7
ec_enabled=0
address: ...
media: Ethernet autoselect (1000baseT 
full-duplex,flowcontrol,rxpause,txpause)
status: active
inet ...
inet6 ...

My /etc/fstab has IPv4 addresses for the NFS mounts, like this:

192.168.1.2:/volume1/music  /disk/music nfs 
intr,nodev,nosuid,rw,soft,tcp

So it should be IPv4 only.
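
The same check can be scripted. The sketch below, in POSIX shell with awk, reads fstab-style lines and reports for each NFS entry the server, whether it is written as an IPv4 literal, and which transport option is named. The function name nfs_entries and the output labels are made up for illustration; when neither tcp nor udp appears in the options it prints "unspecified" rather than guessing the mount default.

```shell
# Hypothetical helper: summarise NFS entries from fstab-format input on stdin.
# Output per entry: <server> <ipv4-literal|other> <tcp|udp|unspecified>
nfs_entries() {
    awk '$3 == "nfs" {
        # The server is everything before the first ":" of the first field.
        # Bracketed IPv6 literals are not parsed and simply show up as "other".
        split($1, spec, ":")
        host = spec[1]
        family = (host ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) ? "ipv4-literal" : "other"

        # Look for an explicit transport among the comma-separated options.
        transport = "unspecified"
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++) {
            if (opt[i] == "tcp" || opt[i] == "udp") transport = opt[i]
        }
        print host, family, transport
    }'
}

# The entry from this message, joined back onto a single line.
printf '%s\n' '192.168.1.2:/volume1/music /disk/music nfs intr,nodev,nosuid,rw,soft,tcp' | nfs_entries
# prints: 192.168.1.2 ipv4-literal tcp
```

In practice this would be run as `nfs_entries < /etc/fstab`.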

> I was regularly getting a similar crash when using NFS over IPv6. This
> was with a network controller that only offloads checksumming for IPv4;
> the in_delayed_cksum() function is where the network stack does the
> checksum in software.
> 
> I confess that the current way that I'm trying to fix it is by
> switching to a network card with hardware checksumming for both IPv4
> and IPv6.

From the capabilities cited above, my card already should do that, right?
 Thomas


Re: uvmfault (7.99.1/amd64)

2014-09-13 Thread Nick Hudson

On 09/13/14 07:55, Thomas Klausner wrote:

> My main machine suddenly hung last night and then rebooted. There was
> no big load on it at that time. dmesg contains:
>
> uvm_fault(0x810157c0, 0x8003393c8000, 1) -> e
> fatal page fault in supervisor mode
> trap type 6 code 0 rip 80264fc5 cs 8 rflags 10202 cr2 8003393c8000 
> ilevel 4 rsp fe813d81d720
> curlwp 0xfe813dc10aa0 pid 0.143 lowest kstack 0xfe813d81a2c0
> panic: trap
> cpu7: Begin traceback...
> vpanic() at netbsd:vpanic+0x13c
> snprintf() at netbsd:snprintf
> startlwp() at netbsd:startlwp
> cpu7: End traceback...
>
> dumping to dev 168,3 (offset=8, size=8373576):
> dump Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
> (new kernel booting messages follow)
>
> I did get a core dump, and I do have a kernel with symbols.
> # gdb netbsd
> GNU gdb (GDB) 7.7.1
> Copyright (C) 2014 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64--netbsd".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from netbsd...done.
> (gdb) target  kvm netbsd.core
> 0x805b6ac5 in cpu_reboot (howto=howto@entry=260, 
> bootstr=bootstr@entry=0x0) at 
> /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
> 671 dumpsys();
> (gdb) bt
> #0  0x805b6ac5 in cpu_reboot (howto=howto@entry=260, 
> bootstr=bootstr@entry=0x0) at 
> /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
> #1  0x807b0ae4 in vpanic (fmt=fmt@entry=0x80c51a95 "trap", 
> ap=ap@entry=0xfe813d81d510) at /archive/foreign/src/sys/kern/subr_prf.c:340
> #2  0x807b0b9f in panic (fmt=fmt@entry=0x80c51a95 "trap") at 
> /archive/foreign/src/sys/kern/subr_prf.c:256
> #3  0x807fc037 in trap (frame=0xfe813d81d630) at 
> /archive/foreign/src/sys/arch/amd64/amd64/trap.c:298
> #4  0x8010108e in alltraps ()
> #5  0x80264fc5 in .Mmbuf_inner_loop ()
> #6  0xfe8692e23400 in ?? ()
> #7  0xfe813d81d750 in ?? ()
> #8  0x804c3b5e in in_delayed_cksum (m=0x8003393c8000) at 
> /archive/foreign/src/sys/netinet/ip_output.c:791
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
> This does not really look like useful information, does it?
>   Thomas



Try crash(8). It does a better job of stack traces through traps.

Nick


Re: uvmfault (7.99.1/amd64)

2014-09-13 Thread Robert Swindells

Thomas Klausner wrote:
>My main machine suddenly hung last night and then rebooted. There was
>no big load on it at that time. dmesg contains:

[snip]

>#8  0x804c3b5e in in_delayed_cksum (m=0x8003393c8000) at 
>/archive/foreign/src/sys/netinet/ip_output.c:791
>Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
>This does not really look like useful information, does it?

Can you tell which protocol family you were using at the time?

I was regularly getting a similar crash when using NFS over IPv6. This
was with a network controller that only offloads checksumming for IPv4;
the in_delayed_cksum() function is where the network stack does the
checksum in software.
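
For readers who have not met it, the "checksum in software" is the Internet checksum (RFC 1071): a 16-bit one's-complement sum with the carries folded back in, then inverted. The sketch below only illustrates that arithmetic in POSIX shell; it is not the kernel's in_delayed_cksum() code, and the function name inet_cksum is made up here. The sample words are an IPv4 header with its checksum field zeroed, chosen only as convenient input.

```shell
# Hypothetical helper: Internet checksum (RFC 1071) over 16-bit hex words.
# Usage: inet_cksum 4500 0073 ...   (each argument is one 16-bit word in hex)
inet_cksum() {
    cksum_sum=0
    for cksum_word in "$@"; do
        cksum_sum=$(( cksum_sum + 0x$cksum_word ))
    done
    # Fold any carry above 16 bits back into the low 16 bits.
    while [ $(( cksum_sum >> 16 )) -ne 0 ]; do
        cksum_sum=$(( (cksum_sum & 0xffff) + (cksum_sum >> 16) ))
    done
    # The checksum is the one's complement of the folded sum.
    printf '%04x\n' $(( ~cksum_sum & 0xffff ))
}

# IPv4 header words with the checksum field (sixth word) set to zero.
inet_cksum 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7
# prints: b861
```

Running the same function over the words with b861 placed in the checksum field yields 0000, which is how a receiver verifies the data.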

I confess that the current way that I'm trying to fix it is by
switching to a network card with hardware checksumming for both IPv4
and IPv6.

Robert Swindells


uvmfault (7.99.1/amd64)

2014-09-12 Thread Thomas Klausner
My main machine suddenly hung last night and then rebooted. There was
no big load on it at that time. dmesg contains:

uvm_fault(0x810157c0, 0x8003393c8000, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 80264fc5 cs 8 rflags 10202 cr2 8003393c8000 
ilevel 4 rsp fe813d81d720
curlwp 0xfe813dc10aa0 pid 0.143 lowest kstack 0xfe813d81a2c0
panic: trap
cpu7: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
cpu7: End traceback...

dumping to dev 168,3 (offset=8, size=8373576):
dump Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
(new kernel booting messages follow)

I did get a core dump, and I do have a kernel with symbols.
# gdb netbsd
GNU gdb (GDB) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from netbsd...done.
(gdb) target  kvm netbsd.core
0x805b6ac5 in cpu_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0) at 
/archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
671 dumpsys();
(gdb) bt
#0  0x805b6ac5 in cpu_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0) at 
/archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
#1  0x807b0ae4 in vpanic (fmt=fmt@entry=0x80c51a95 "trap", 
ap=ap@entry=0xfe813d81d510) at /archive/foreign/src/sys/kern/subr_prf.c:340
#2  0x807b0b9f in panic (fmt=fmt@entry=0x80c51a95 "trap") at 
/archive/foreign/src/sys/kern/subr_prf.c:256
#3  0x807fc037 in trap (frame=0xfe813d81d630) at 
/archive/foreign/src/sys/arch/amd64/amd64/trap.c:298
#4  0x8010108e in alltraps ()
#5  0x80264fc5 in .Mmbuf_inner_loop ()
#6  0xfe8692e23400 in ?? ()
#7  0xfe813d81d750 in ?? ()
#8  0x804c3b5e in in_delayed_cksum (m=0x8003393c8000) at 
/archive/foreign/src/sys/netinet/ip_output.c:791
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

This does not really look like useful information, does it?
 Thomas