Re: NFS related panic? (was: Re: Killing a zombie process?)

2015-11-03 Thread Rhialto
On Fri 23 Oct 2015 at 00:46:57 +0200, Rhialto wrote:
> This problem is very repeatable, usually within a few hours; just now it
> happened within half an hour.
> 
> It seems to me that somehow the nfs_reqq list gets corrupted. Then
> either there is a crash when traversing it in nfs_timer() (occurring in
> nfs_sigintr() due to being called with a bogus pointer), or there is a
> hang when one of the NFS requests gets lost and never retried.

I tried it with NFS mounted over TCP. It still hangs, this time after a
bit under an hour of uptime.

So the cause is likely not packet loss.

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- The Doctor: No, 'eureka' is Greek for
\X/ rhialto/at/xs4all.nl    -- 'this bath is too hot.'


signature.asc
Description: PGP signature


Re: NFS related panic? (was: Re: Killing a zombie process?)

2015-10-22 Thread Rhialto
This problem is very repeatable, usually within a few hours; just now it
happened within half an hour.

It seems to me that somehow the nfs_reqq list gets corrupted. Then
either there is a crash when traversing it in nfs_timer() (occurring in
nfs_sigintr() due to being called with a bogus pointer), or there is a
hang when one of the NFS requests gets lost and never retried.
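
One way to test the corruption theory is to check the list's back links. The sketch below is a minimal user-space illustration, not kernel code: struct req and struct reqq are hypothetical cut-down stand-ins for struct nfsreq and the nfs_reqq head, laid out like a sys/queue.h tail queue. In an intact tail queue every element's tqe_prev holds the address of the pointer that points at it, so a walk can verify that without ever dereferencing tqe_prev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nfsreq: list linkage plus one payload field. */
struct req {
    struct req  *tqe_next;  /* next element, NULL at the tail */
    struct req **tqe_prev;  /* address of the pointer that points at us */
    uint32_t     xid;
};

/* Hypothetical stand-in for the nfs_reqq head. */
struct reqq {
    struct req  *tqh_first; /* first element, NULL if empty */
    struct req **tqh_last;  /* address of the last tqe_next, or of tqh_first */
};

static void reqq_init(struct reqq *q)
{
    assert(q != NULL);
    q->tqh_first = NULL;
    q->tqh_last = &q->tqh_first;
}

static void reqq_insert_tail(struct reqq *q, struct req *r)
{
    assert(q != NULL && r != NULL);
    r->tqe_next = NULL;
    r->tqe_prev = q->tqh_last;
    *q->tqh_last = r;
    q->tqh_last = &r->tqe_next;
}

/*
 * Walk the queue and compare each back link with the address of the
 * forward pointer we just followed. 'max' bounds the walk so that a
 * cycle cannot loop forever. Returns true if the linkage is consistent.
 */
static bool reqq_check(const struct reqq *q, size_t max)
{
    struct req *const *link = &q->tqh_first;
    size_t n = 0;

    for (const struct req *r = q->tqh_first; r != NULL; r = r->tqe_next) {
        if (++n > max)
            return false;       /* too long: probably a cycle */
        if (r->tqe_prev != link)
            return false;       /* back link does not match */
        link = &r->tqe_next;
    }
    return q->tqh_last == link; /* the head must agree about the tail */
}
```

A back link of 0x1 fails this check at once, since it cannot be the address of any forward pointer. The walk still follows tqe_next, so in a kernel a check like this would only be safe as a debugging aid on memory known to be mapped.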

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- The Doctor: No, 'eureka' is Greek for
\X/ rhialto/at/xs4all.nl    -- 'this bath is too hot.'




Re: NFS related panic? (was: Re: Killing a zombie process?)

2015-10-19 Thread Rhialto
On Tue 20 Oct 2015 at 01:04:59 +0200, Rhialto wrote:
> with a rebuilt netbsd.gdb (hopefully the addresses match)
> 
> #5  0x806b94b4 in nfs_sigintr (nmp=0x0, rep=0xfe81163730a8,
> l=0x0) at ../../../../nfs/nfs_socket.c:871

nmp should not be NULL here. It comes from rep via "nmp = rep->r_nmp;",
so let's look at rep:

(gdb) print *(struct nfsreq *)0xfe81163730a8
$1 = {r_chain = {tqe_next = 0xfe811edcee40, tqe_prev = 0x1}, r_mreq = 
0x828f9888, r_mrep = 0x0, r_md = 0x0, r_dpos = 0x0, r_nmp = 0x0, r_xid 
= 0, r_flags = 0, r_retry = 0, r_rexmit = 0, r_procnum = 0, r_rtt = 0, 
  r_lwp = 0x0}

Well, r_chain.tqe_prev looks bogus (unless 0x1 is a special marker), so
let's look at tqe_next:

(gdb) print *((struct nfsreq *)0xfe81163730a8)->r_chain.tqe_next
$3 = {r_chain = {tqe_next = 0x0, tqe_prev = 0x15aa3c85d}, r_mreq = 
0xbd83e8af8fe58282, r_mrep = 0x81e39981e3a781e3, r_md = 0xe39d81e38180e38c, 
r_dpos = 0x8890e5b4a0e5ae81, r_nmp = 0xe57baf81e3ab81e3, r_xid = 2179183259, 
  r_flags = -1565268289, r_retry = 0, r_rexmit = 0, r_procnum = 1520683101, 
r_rtt = 1, r_lwp = 0x80e39981e3a781e3}

Well, that is even more bogus. Too bad that the next frame has its
argument optimized out...
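
The pointer-sized fields in that second dump share a pattern: laid out as bytes in amd64 (little-endian) order, they look like UTF-8 text rather than kernel addresses. The sketch below is a reading of the dump, not something confirmed in this thread. It copies the five values of r_mreq, r_mrep, r_md, r_dpos and r_nmp from the gdb output and counts how many back-to-back 3-byte UTF-8 sequences they contain. The helper names are made up for illustration, and the sketch assumes the five fields are contiguous in memory in the order gdb prints them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values of r_mreq, r_mrep, r_md, r_dpos, r_nmp from the gdb output ($3). */
static const uint64_t garbage_words[5] = {
    0xbd83e8af8fe58282ULL,
    0x81e39981e3a781e3ULL,
    0xe39d81e38180e38cULL,
    0x8890e5b4a0e5ae81ULL,
    0xe57baf81e3ab81e3ULL,
};

/* Store n 64-bit words as bytes, least significant byte first (amd64 order).
 * 'out' must have room for 8 * n bytes. */
static void words_to_bytes(const uint64_t *w, size_t n, unsigned char *out)
{
    assert(w != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        for (unsigned b = 0; b < 8; b++)
            out[i * 8 + b] = (unsigned char)(w[i] >> (8 * b));
}

/* True if p[0..2] has the bit pattern 1110xxxx 10xxxxxx 10xxxxxx. */
static int is_utf8_3byte(const unsigned char *p)
{
    return (p[0] & 0xf0) == 0xe0 &&
           (p[1] & 0xc0) == 0x80 &&
           (p[2] & 0xc0) == 0x80;
}

/* Count consecutive 3-byte UTF-8 sequences beginning at offset 'start'. */
static size_t count_utf8_3byte_run(const unsigned char *buf, size_t len,
                                   size_t start)
{
    size_t count = 0;

    for (size_t i = start; i + 3 <= len && is_utf8_3byte(buf + i); i += 3)
        count++;
    return count;
}
```

Running words_to_bytes() over the five words and then count_utf8_3byte_run() from offset 2 (the first two bytes are the tail of an earlier character) finds 12 consecutive sequences, followed by an ASCII '{'. Decoded, those bytes are ordinary Japanese prose. That would fit a request structure whose memory was freed and reused for unrelated data while it was still linked into the list, but it does not prove it.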

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- The Doctor: No, 'eureka' is Greek for
\X/ rhialto/at/xs4all.nl    -- 'this bath is too hot.'




Re: NFS related panic? (was: Re: Killing a zombie process?)

2015-10-19 Thread Rhialto
With a rebuilt netbsd.gdb (hopefully the addresses match):

(gdb) target kvm netbsd.5.core
0x8063d735 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:671
671 dumpsys();
(gdb) bt
#0  0x8063d735 in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:671
#1  0x80865182 in vpanic (fmt=0x80d123b2 "trap",
fmt@entry=0x80d123d2 "otection fault",
ap=ap@entry=0xfe80b9fc1d10) at ../../../../kern/subr_prf.c:340
#2  0x8086523d in panic (fmt=fmt@entry=0x80d123d2
"otection fault") at ../../../../kern/subr_prf.c:256
#3  0x808a84d6 in trap (frame=0xfe80b9fc1e30) at
../../../../arch/amd64/amd64/trap.c:298
#4  0x80100f46 in alltraps ()
#5  0x806b94b4 in nfs_sigintr (nmp=0x0, rep=0xfe81163730a8,
l=0x0) at ../../../../nfs/nfs_socket.c:871
#6  0x806b9b0e in nfs_timer (arg=<optimized out>) at
../../../../nfs/nfs_socket.c:752
#7  0x805e9458 in callout_softclock (v=<optimized out>) at
../../../../kern/kern_timeout.c:736
#8  0x805df84a in softint_execute (l=<optimized out>,
s=<optimized out>, si=<optimized out>) at
../../../../kern/kern_softint.c:589
#9  softint_dispatch (pinned=<optimized out>, s=2) at
../../../../kern/kern_softint.c:871
#10 0x8011402f in Xsoftintr ()

(gdb) kvm proc 0xfe813fb39860
nfs_timer (arg=<optimized out>) at ../../../../nfs/nfs_socket.c:735
735 {
(gdb) bt
#0  nfs_timer (arg=<optimized out>) at ../../../../nfs/nfs_socket.c:735
#1  0x in ?? ()

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- The Doctor: No, 'eureka' is Greek for
\X/ rhialto/at/xs4all.nl    -- 'this bath is too hot.'




NFS related panic? (was: Re: Killing a zombie process?)

2015-10-19 Thread Rhialto
On Fri 16 Oct 2015 at 16:31:18 +0200, J. Hannken-Illjes wrote:
> On 16 Oct 2015, at 13:44, Rhialto wrote:
> 
> > "Interesting" results: it built packages overnight (from around 22:30 to
> > 12:13, so for nearly 14 hours), then, when I didn't look, it rebooted.
> 
> With panic?

I retried with a pure GENERIC 7.0 kernel; it happened again, and now
I have a crash dump. Its dmesg ends with this:

nfs server 10.0.0.16:/mnt/scratch: not responding
nfs server 10.0.0.16:/mnt/scratch: is alive again
fatal page fault in supervisor mode
trap type 6 code 0 rip 806b94b4 cs 8 rflags 10246 cr2 38 ilevel 2 rsp fffffe80b9fc1f28
curlwp 0xfe813fb39860 pid 0.5 lowest kstack 0xfe80b9fbf2c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
callout_softclock() at netbsd:callout_softclock+0x248
softint_dispatch() at netbsd:softint_dispatch+0x79
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80b9fc1ff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
cpu0: End traceback...

dumping to dev 0,1 (offset=199775, size=1023726):
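
The fault address in the trap line (cr2 38) is only a small offset from zero. That is what reading a member through a NULL structure pointer produces, and it fits the nmp=0x0 that gdb shows in nfs_sigintr(). The sketch below illustrates the arithmetic only. struct fake_mount is a made-up layout chosen so that one member sits at offset 0x38; it is not the real struct nfsmount, whose member offsets are not given in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real struct nfsmount: seven 8-byte
 * members followed by the flag word that the faulting code would read.
 */
struct fake_mount {
    uint64_t pad[7];    /* 7 * 8 = 0x38 bytes */
    int      iflag;     /* lands at offset 0x38 */
};

/* Check the assumed layout at compile time. */
static_assert(offsetof(struct fake_mount, iflag) == 0x38,
              "iflag is expected at offset 0x38");

/* Address the CPU touches when reading a member at 'member_offset'
 * through a structure pointer whose value is 'base'. */
static uintptr_t member_address(uintptr_t base, size_t member_offset)
{
    return base + member_offset;
}
```

With base 0, member_address(0, offsetof(struct fake_mount, iflag)) is 0x38, matching cr2. Which member of the real structure lives at that offset would have to be checked against the kernel sources or with gdb's ptype.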


pid 0.5 is this:

PID   LID  S  CPU  FLAGS  STRUCT LWP *   NAME       WAIT
0   >   5  7    0    200  fe813fb39860   softclk/0

gdb (without debugging symbols) so far thinks this is in nfs_timer():

(gdb) kvm proc 0xfe813fb39860
0x806b9aab in nfs_timer ()
(gdb) bt
#0  0x806b9aab in nfs_timer ()
#1  0x in ?? ()

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- The Doctor: No, 'eureka' is Greek for
\X/ rhialto/at/xs4all.nl    -- 'this bath is too hot.'

