Re: panic: in_pcblookup_local (?)

2013-05-04 Thread Ian FREISLICH
Peter Wemm wrote:
   offending revision.
 
  I've started a binary search.  I'll let you know what that turns up.
 
  Thanks, and sorry for getting my Ian's mixed up. :-/
 
  --
  John Baldwin
 
 This has hit two separate machines, at least twice each, with this exact
 panic / trace.  Always while doing an 'svn update'.
 
 Rolling back to April 5th (r249172) solves it.  (There's nothing
 particular about that rev, except it was top-of-tree when the last
 update was done).
 
 I see a number of locking changes in the area.  Note that this is UDP,
 most likely a DNS lookup.

I'll work to confirm this here.  I was a little slow in bisecting
because I spent 2 days trying to figure out what revision caused
PF to rapidly expire its entire state table which prevented testing
this condition.

Ian

-- 
Ian Freislich
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: panic: in_pcblookup_local (?)

2013-05-03 Thread Peter Wemm
On Thu, May 2, 2013 at 11:32 AM, John Baldwin j...@freebsd.org wrote:
 On Thursday, May 02, 2013 1:53:47 pm Ian FREISLICH wrote:
 John Baldwin wrote:
  On Thursday, May 02, 2013 7:25:08 am Robert N. M. Watson wrote:
  
   On 2 May 2013, at 11:42, Glen Barber wrote:
  
Hmm.  Perhaps it would be worthwhile for me to rebuild the current
kernel with DDB support.  It looks like the machine has panicked a few
times over the last two weeks or so, but based on the timestamps of the
crash dumps and nagios complaints, happened during the middle of the
night when I would not have really noticed, or otherwise would have just
blamed my ISP.
   
Two of the panics are ath(4) related.  One looks similar to the one
referenced in this thread, similarly triggered by a CFEngine process.
   
In that case, the backtrace looks like:
   
#4 0x808cdbb3 at calltrap+0x8
#5 0x807371d8 at in_pcb_lport+0x128
#6 0x8073745a at in_pcbbind_setup+0x16a
#7 0x80737d8e at in_pcbconnect_setup+0x71e
#8 0x80737df9 at in_pcbconnect_mbuf+0x59
#9 0x807bf29f at udp_connect+0x11f
#10 0x80680615 at kern_connectat+0x275
   
Regarding DDB though, it would be rather difficult to access the machine
if it drops to a DDB debugger session, since the machine acts as my
firewall.
  
   Thanks -- will take a look at the attached.
  
   FWIW, though, I'm worried by the number of panics you are seeing,
   especially given that they involve multiple subsystems, and in
   particular, John's observation about a potentially corrupted pointer.
   This makes me wonder whether (a) you are experiencing hardware faults --
   it would be worth running some memory/cpu/etc tests and (b) if we might
   be seeing a software memory corruption bug of some sort.
 
  Other users have reported this (Ian Lepore), and Peter Wemm can now
  reproduce these at will as well, so I think this is a software bug.
  What might be easiest if we can't figure this out from the crashdump is
  just to bisect the offending revision.

 I've started a binary search.  I'll let you know what that turns up.

 Thanks, and sorry for getting my Ian's mixed up. :-/

 --
 John Baldwin

I forgot to roll back one of the routers at nyi.freebsd.org and it
panicked again, the same way as before:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 03
instruction pointer     = 0x20:0x8067284c
stack pointer           = 0x28:0xff8098688760
frame pointer           = 0x28:0xff80986887a0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 15041 (svn)
[ thread pid 15041 tid 100208 ]
Stopped at      in_pcblookup_local+0x5c:        cmpw    %r12w,0x18(%rax)


#8  0x80829dff in calltrap () at ../../../amd64/amd64/exception.S:228
#9  0x8067284c in in_pcblookup_local (pcbinfo=0x80c9e180,
    laddr={s_addr = 708980576}, lport=607, lookupflags=1,
    cred=0xfe006956d700) at ../../../netinet/in_pcb.c:1438
#10 0x80672d38 in in_pcb_lport (inp=0xfe00098aa620,
    laddrp=0xff809845d860, lportp=0xff809845d86e,
    cred=0xfe006956d700, lookupflags=1)
    at ../../../netinet/in_pcb.c:457
#11 0x80672fba in in_pcbbind_setup (inp=0xfe00098aa620, nam=0x0,
    laddrp=0xff809845d900, lportp=0xff809845d90e,
    cred=0xfe006956d700) at ../../../netinet/in_pcb.c:615
#12 0x806738ee in in_pcbconnect_setup (inp=0xfe00098aa620,
    nam=<value optimized out>, laddrp=0xff809845d9b8,
    lportp=0xff809845d9be, faddrp=0xff809845d9b4,
    fportp=0xff809845d9bc, oinpp=0x0, cred=0xfe006956d700)
    at ../../../netinet/in_pcb.c:1019
#13 0x80673959 in in_pcbconnect_mbuf (inp=0xfe00098aa620,
    nam=<value optimized out>, cred=<value optimized out>, m=0x0)
    at ../../../netinet/in_pcb.c:645
#14 0x806fafcf in udp_connect (so=0xfe002e150d48,
    nam=0xfe00264df3b0, td=0xfe00091df490)
    at ../../../netinet/udp_usrreq.c:1530
#15 0x805faea5 in kern_connectat (td=0xfe00091df490, dirfd=-100,
    fd=<value optimized out>, sa=0xfe00264df3b0)
    at ../../../kern/uipc_syscalls.c:593
#16 0x805fafc1 in sys_connect (td=0xfe00091df490,
    uap=0xff809845db70) at ../../../kern/uipc_syscalls.c:559
#17 0x8083f571 in amd64_syscall (td=0xfe00091df490, traced=0)
    at subr_syscall.c:134

This has hit two separate machines, at least twice each, with this exact
panic / trace.  Always while doing an 'svn update'.

Rolling back to April 5th (r249172) solves it.  (There's nothing
particular about that rev, except it was top-of-tree when the last
update was done).

I see a number of locking changes in the area.  Note that this is UDP,
most likely a DNS lookup.

Re: panic: in_pcblookup_local (?)

2013-05-02 Thread Robert N. M. Watson

On 2 May 2013, at 01:57, Glen Barber wrote:

 So, I am admittedly not too familiar with DDB.  In fact, I just now
 realize the kernel is built without DDB...

DDB is a very powerful tool in that it's been custom-developed to help debug 
common kernel panics. It lacks some of the flexibility, and especially the 
data-type awareness of GDB, but GDB is a less well-suited tool when 
investigating common crash patterns. I'll usually start out debugging in DDB, 
and find that 90% of my in-development panics can be debugged with it, 
resorting to GDB for post-mortem analyses in production or particularly hard 
debugging cases (usually where DDB's pretty printers for data types fall 
short). I've wanted, for a long time, to teach DDB how to pretty-print 
arbitrary types using DTrace's CTF meta-data, which would address the most 
significant major case where I turn to GDB. Mind you, the limitations I see in 
GDB are made up for in most part by John's GDB scripts :-).

 Put those in a dir and do 'source gdb6'.  You can then run 'ps' to get a
 good ps listing that includes threads.  You can also use 'thread apply
 all bt' to get stacktraces of all threads in kgdb.  I believe there is an
 'allpcpu' command that is similar to 'show allpcpu' in DDB.
 
 I have the outputs of 'ps', 'allpcpu', and 'thread apply all bt' saved
 to separate script(1) files.  Is there anything in particular I can look
 for before uploading the files somewhere public?  At quick-ish look
 though, I did not see anything cf-agent (the current process at time of
 panic) related.

To be honest, it's probably easiest if I just take a look at it and see what I 
see. In as much as I find interesting things, I'll follow up explaining what 
they are. We may find we can't track this problem down from the data we have -- 
but it's worth a try.

Robert


Re: panic: in_pcblookup_local (?)

2013-05-02 Thread Glen Barber
On Thu, May 02, 2013 at 10:27:39AM +0100, Robert N. M. Watson wrote:
 
 On 2 May 2013, at 01:57, Glen Barber wrote:
 
  So, I am admittedly not too familiar with DDB.  In fact, I just now
  realize the kernel is built without DDB...
 
 DDB is a very powerful tool in that it's been custom-developed
 to help debug common kernel panics. It lacks some of the flexibility,
 and especially the data-type awareness of GDB, but GDB is a less
 well-suited tool when investigating common crash patterns. I'll
 usually start out debugging in DDB, and find that 90% of my
 in-development panics can be debugged with it, resorting to GDB for
 post-mortem analyses in production or particularly hard debugging
 cases (usually where DDB's pretty printers for data types fall
 short). I've wanted, for a long time, to teach DDB how to pretty-print
 arbitrary types using DTrace's CTF meta-data, which would address
 the most significant major case where I turn to GDB. Mind you, the
 limitations I see in GDB are made up for in most part by John's GDB
 scripts :-).
 

Hmm.  Perhaps it would be worthwhile for me to rebuild the current
kernel with DDB support.  It looks like the machine has panicked a few
times over the last two weeks or so, but based on the timestamps of the
crash dumps and nagios complaints, happened during the middle of the
night when I would not have really noticed, or otherwise would have just
blamed my ISP.

Two of the panics are ath(4) related.  One looks similar to the one
referenced in this thread, similarly triggered by a CFEngine process.

In that case, the backtrace looks like:

#4 0x808cdbb3 at calltrap+0x8
#5 0x807371d8 at in_pcb_lport+0x128
#6 0x8073745a at in_pcbbind_setup+0x16a
#7 0x80737d8e at in_pcbconnect_setup+0x71e
#8 0x80737df9 at in_pcbconnect_mbuf+0x59
#9 0x807bf29f at udp_connect+0x11f
#10 0x80680615 at kern_connectat+0x275

Regarding DDB though, it would be rather difficult to access the machine
if it drops to a DDB debugger session, since the machine acts as my
firewall.

  Put those in a dir and do 'source gdb6'.  You can then run 'ps' to get a
  good ps listing that includes threads.  You can also use 'thread apply
  all bt' to get stacktraces of all threads in kgdb.  I believe there is an
  'allpcpu' command that is similar to 'show allpcpu' in DDB.
  
  I have the outputs of 'ps', 'allpcpu', and 'thread apply all bt' saved
  to separate script(1) files.  Is there anything in particular I can look
  for before uploading the files somewhere public?  At quick-ish look
  though, I did not see anything cf-agent (the current process at time of
  panic) related.
 
 To be honest, it's probably easiest if I just take a look at it
 and see what I see. In as much as I find interesting things, I'll
 follow up explaining what they are. We may find we can't track this
 problem down from the data we have -- but it's worth a try.
 

Sure.  The files are available here:

https://www.glenbarber.us/stuff/in_pcblookup_local/vmcore.4.ps.txt
https://www.glenbarber.us/stuff/in_pcblookup_local/vmcore.4.allpcpu.txt

https://www.glenbarber.us/stuff/in_pcblookup_local/vmcore.4.thread_apply_all_bt.txt

Thanks to both of you for looking into this.

Glen





Re: panic: in_pcblookup_local (?)

2013-05-02 Thread Robert N. M. Watson

On 2 May 2013, at 11:42, Glen Barber wrote:

 Hmm.  Perhaps it would be worthwhile for me to rebuild the current
 kernel with DDB support.  It looks like the machine has panicked a few
 times over the last two weeks or so, but based on the timestamps of the
 crash dumps and nagios complaints, happened during the middle of the
 night when I would not have really noticed, or otherwise would have just
 blamed my ISP.
 
 Two of the panics are ath(4) related.  One looks similar to the one
 referenced in this thread, similarly triggered by a CFEngine process.
 
 In that case, the backtrace looks like:
 
 #4 0x808cdbb3 at calltrap+0x8
 #5 0x807371d8 at in_pcb_lport+0x128
 #6 0x8073745a at in_pcbbind_setup+0x16a
 #7 0x80737d8e at in_pcbconnect_setup+0x71e
 #8 0x80737df9 at in_pcbconnect_mbuf+0x59
 #9 0x807bf29f at udp_connect+0x11f
 #10 0x80680615 at kern_connectat+0x275
 
 Regarding DDB though, it would be rather difficult to access the machine
 if it drops to a DDB debugger session, since the machine acts as my
 firewall.

Thanks -- will take a look at the attached.

FWIW, though, I'm worried by the number of panics you are seeing, especially 
given that they involve multiple subsystems, and in particular, John's 
observation about a potentially corrupted pointer. This makes me wonder whether 
(a) you are experiencing hardware faults -- it would be worth running some 
memory/cpu/etc tests and (b) if we might be seeing a software memory corruption 
bug of some sort.

Robert


Re: panic: in_pcblookup_local (?)

2013-05-02 Thread Glen Barber
On Thu, May 02, 2013 at 12:25:08PM +0100, Robert N. M. Watson wrote:
 
 On 2 May 2013, at 11:42, Glen Barber wrote:
 
  Hmm.  Perhaps it would be worthwhile for me to rebuild the current
  kernel with DDB support.  It looks like the machine has panicked a few
  times over the last two weeks or so, but based on the timestamps of the
  crash dumps and nagios complaints, happened during the middle of the
  night when I would not have really noticed, or otherwise would have just
  blamed my ISP.
  
  Two of the panics are ath(4) related.  One looks similar to the one
  referenced in this thread, similarly triggered by a CFEngine process.
  
  In that case, the backtrace looks like:
  
  #4 0x808cdbb3 at calltrap+0x8
  #5 0x807371d8 at in_pcb_lport+0x128
  #6 0x8073745a at in_pcbbind_setup+0x16a
  #7 0x80737d8e at in_pcbconnect_setup+0x71e
  #8 0x80737df9 at in_pcbconnect_mbuf+0x59
  #9 0x807bf29f at udp_connect+0x11f
  #10 0x80680615 at kern_connectat+0x275
  
  Regarding DDB though, it would be rather difficult to access the machine
  if it drops to a DDB debugger session, since the machine acts as my
  firewall.
 
 Thanks -- will take a look at the attached.
 
 FWIW, though, I'm worried by the number of panics you are seeing,
 especially given that they involve multiple subsystems, and in
 particular, John's observation about a potentially corrupted pointer.
 This makes me wonder whether (a) you are experiencing hardware
 faults -- it would be worth running some memory/cpu/etc tests and
 (b) if we might be seeing a software memory corruption bug of some
 sort.
 

I will run memtest this weekend, once I move some wires around so I do
not lose internet access entirely.  I'll run some stress tests that do
not require the machine to be offline in the meantime.

I certainly won't discount a hardware issue being the cause.

For what it is worth, I just looked through my svn commit logs for that
machine's configuration, and the only relatively recent change that was
made was enabling powerd(8) - but that was about 3 months ago.

Glen





Re: panic: in_pcblookup_local (?)

2013-05-02 Thread John Baldwin
On Thursday, May 02, 2013 5:27:39 am Robert N. M. Watson wrote:
 
 On 2 May 2013, at 01:57, Glen Barber wrote:
 
  So, I am admittedly not too familiar with DDB.  In fact, I just now
  realize the kernel is built without DDB...
 
 DDB is a very powerful tool in that it's been custom-developed
 to help debug common kernel panics. It lacks some of the flexibility,
 and especially the data-type awareness of GDB, but GDB is a less
 well-suited tool when investigating common crash patterns. I'll
 usually start out debugging in DDB, and find that 90% of my
 in-development panics can be debugged with it, resorting to GDB for
 post-mortem analyses in production or particularly hard debugging
 cases (usually where DDB's pretty printers for data types fall
 short). I've wanted, for a long time, to teach DDB how to pretty-print
 arbitrary types using DTrace's CTF meta-data, which would address
 the most significant major case where I turn to GDB. Mind you, the
 limitations I see in GDB are made up for in most part by John's GDB
 scripts :-).

Heh, I prefer DDB for active development as well, but after being forced to
work in an environment where I had to largely do post-mortem analysis, I had
to get a gdb environment that was close to as functional.  Also, using kgdb
on a live system to obtain info is less invasive than ddb (doesn't halt the
system), and you can easily add new scripts to generate useful reports without
having to recompile or reboot.

-- 
John Baldwin


Re: panic: in_pcblookup_local (?)

2013-05-02 Thread John Baldwin
On Thursday, May 02, 2013 7:25:08 am Robert N. M. Watson wrote:
 
 On 2 May 2013, at 11:42, Glen Barber wrote:
 
  Hmm.  Perhaps it would be worthwhile for me to rebuild the current
  kernel with DDB support.  It looks like the machine has panicked a few
  times over the last two weeks or so, but based on the timestamps of the
  crash dumps and nagios complaints, happened during the middle of the
  night when I would not have really noticed, or otherwise would have just
  blamed my ISP.
  
  Two of the panics are ath(4) related.  One looks similar to the one
  referenced in this thread, similarly triggered by a CFEngine process.
  
  In that case, the backtrace looks like:
  
  #4 0x808cdbb3 at calltrap+0x8
  #5 0x807371d8 at in_pcb_lport+0x128
  #6 0x8073745a at in_pcbbind_setup+0x16a
  #7 0x80737d8e at in_pcbconnect_setup+0x71e
  #8 0x80737df9 at in_pcbconnect_mbuf+0x59
  #9 0x807bf29f at udp_connect+0x11f
  #10 0x80680615 at kern_connectat+0x275
  
  Regarding DDB though, it would be rather difficult to access the machine
  if it drops to a DDB debugger session, since the machine acts as my
  firewall.
 
 Thanks -- will take a look at the attached.
 
 FWIW, though, I'm worried by the number of panics you are seeing,
 especially given that they involve multiple subsystems, and in
 particular, John's observation about a potentially corrupted pointer.
 This makes me wonder whether (a) you are experiencing hardware
 faults -- it would be worth running some memory/cpu/etc tests and
 (b) if we might be seeing a software memory corruption bug of some
 sort.

Other users have reported this (Ian Lepore), and Peter Wemm can now reproduce
these at will as well, so I think this is a software bug.  What might be 
easiest if we can't figure this out from the crashdump is just to bisect the
offending revision.
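
For anyone unfamiliar with the bisection idea mentioned above, here is a minimal sketch in C. It assumes a range of candidate revisions whose first entry is known good and last entry is known bad, and that every revision after the first bad one also panics. The function name `first_bad_index` and the `outcomes` array are hypothetical; in practice each outcome is obtained on demand by building and testing that revision, not stored up front.

```c
#include <stddef.h>

/*
 * outcomes[i] is nonzero if the i-th candidate revision panics.
 * Assumes n >= 2, outcomes[0] == 0 (known good), outcomes[n - 1] != 0
 * (known bad), and that once a revision is bad all later ones are too.
 * Returns the index of the first bad candidate.
 */
static size_t
first_bad_index(const int *outcomes, size_t n)
{
    size_t good = 0;        /* highest index known to be good */
    size_t bad = n - 1;     /* lowest index known to be bad */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (outcomes[mid])
            bad = mid;      /* culprit is at or before mid */
        else
            good = mid;     /* culprit is after mid */
    }
    return (bad);
}
```

Each step halves the remaining range, so the number of kernels to build and test grows only with the logarithm of the number of candidate revisions.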

-- 
John Baldwin


Re: panic: in_pcblookup_local (?)

2013-05-02 Thread Ian FREISLICH
John Baldwin wrote:
 On Thursday, May 02, 2013 7:25:08 am Robert N. M. Watson wrote:
  
  On 2 May 2013, at 11:42, Glen Barber wrote:
  
   Hmm.  Perhaps it would be worthwhile for me to rebuild the current
   kernel with DDB support.  It looks like the machine has panicked a few
   times over the last two weeks or so, but based on the timestamps of the
   crash dumps and nagios complaints, happened during the middle of the
   night when I would not have really noticed, or otherwise would have just
   blamed my ISP.
   
   Two of the panics are ath(4) related.  One looks similar to the one
   referenced in this thread, similarly triggered by a CFEngine process.
   
   In that case, the backtrace looks like:
   
   #4 0x808cdbb3 at calltrap+0x8
   #5 0x807371d8 at in_pcb_lport+0x128
   #6 0x8073745a at in_pcbbind_setup+0x16a
   #7 0x80737d8e at in_pcbconnect_setup+0x71e
   #8 0x80737df9 at in_pcbconnect_mbuf+0x59
   #9 0x807bf29f at udp_connect+0x11f
   #10 0x80680615 at kern_connectat+0x275
   
   Regarding DDB though, it would be rather difficult to access the machine
   if it drops to a DDB debugger session, since the machine acts as my
   firewall.
  
  Thanks -- will take a look at the attached.
  
  FWIW, though, I'm worried by the number of panics you are seeing,
  especially given that they involve multiple subsystems, and in
  particular, John's observation about a potentially corrupted pointer.
  This makes me wonder whether (a) you are experiencing hardware faults --
  it would be worth running some memory/cpu/etc tests and (b) if we might
  be seeing a software memory corruption bug of some sort.
 
 Other users have reported this (Ian Lepore), and Peter Wemm can now reproduce
 these at will as well, so I think this is a software bug.  What might be 
 easiest if we can't figure this out from the crashdump is just to bisect the
 offending revision.

I've started a binary search.  I'll let you know what that turns up.

Ian

-- 
Ian Freislich


Re: panic: in_pcblookup_local (?)

2013-05-02 Thread John Baldwin
On Thursday, May 02, 2013 1:53:47 pm Ian FREISLICH wrote:
 John Baldwin wrote:
  On Thursday, May 02, 2013 7:25:08 am Robert N. M. Watson wrote:
   
   On 2 May 2013, at 11:42, Glen Barber wrote:
   
Hmm.  Perhaps it would be worthwhile for me to rebuild the current
kernel with DDB support.  It looks like the machine has panicked a few
times over the last two weeks or so, but based on the timestamps of the
crash dumps and nagios complaints, happened during the middle of the
night when I would not have really noticed, or otherwise would have just
blamed my ISP.

Two of the panics are ath(4) related.  One looks similar to the one
referenced in this thread, similarly triggered by a CFEngine process.

In that case, the backtrace looks like:

#4 0x808cdbb3 at calltrap+0x8
#5 0x807371d8 at in_pcb_lport+0x128
#6 0x8073745a at in_pcbbind_setup+0x16a
#7 0x80737d8e at in_pcbconnect_setup+0x71e
#8 0x80737df9 at in_pcbconnect_mbuf+0x59
#9 0x807bf29f at udp_connect+0x11f
#10 0x80680615 at kern_connectat+0x275

Regarding DDB though, it would be rather difficult to access the machine
if it drops to a DDB debugger session, since the machine acts as my
firewall.
   
   Thanks -- will take a look at the attached.
   
   FWIW, though, I'm worried by the number of panics you are seeing,
   especially given that they involve multiple subsystems, and in
   particular, John's observation about a potentially corrupted pointer.
   This makes me wonder whether (a) you are experiencing hardware faults --
   it would be worth running some memory/cpu/etc tests and (b) if we might
   be seeing a software memory corruption bug of some sort.
  
  Other users have reported this (Ian Lepore), and Peter Wemm can now
  reproduce these at will as well, so I think this is a software bug.
  What might be easiest if we can't figure this out from the crashdump is
  just to bisect the offending revision.
 
 I've started a binary search.  I'll let you know what that turns up.

Thanks, and sorry for getting my Ian's mixed up. :-/

-- 
John Baldwin


Re: panic: in_pcblookup_local (?)

2013-05-01 Thread John Baldwin
On Tuesday, April 30, 2013 5:19:08 pm Glen Barber wrote:
 On Tue, Apr 30, 2013 at 04:53:13PM -0400, John Baldwin wrote:
  Try 'p phd' to start.  INP_PCBPORTHASH is a macro, so you will
  have to do it by hand:
  
  'p pcbinfo->ipi_porthashbase[lport & pcbinfo->ipi_porthashmask]'
  
  (That should be what 'porthash' is.)
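
For readers following along without a kgdb session, the by-hand expansion above amounts to masking the port with the hash mask and using the result as a bucket index. A minimal sketch in plain C, assuming a simplified struct and a helper named `port_bucket` (both hypothetical, not the kernel's definitions). The real INP_PCBPORTHASH macro may also apply a byte-order conversion to the port, which this sketch does not model:

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the port-hash mask in the pcbinfo. */
struct porthash_info {
    unsigned long porthashmask;   /* number of buckets minus one */
};

/* Mirrors the expression typed into kgdb in this thread: lport & mask. */
static unsigned long
port_bucket(const struct porthash_info *pi, uint16_t lport)
{
    return (lport & pi->porthashmask);
}
```

With lport=339 from the frame under discussion and a mask of 127, this expression selects bucket 83.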
  
 
 Thanks for the pointers.  (Hah!)
 
 Hopefully this is the info you are looking for:
 
 Script started on Tue Apr 30 17:16:07 2013
 root@orion:/usr/obj/usr/src/sys/ORION # kgdb ./kernel.debug /var/crash/vmcore.4
 [...]
 #0  doadump (textdump=<value optimized out>) at pcpu.h:231
 231   __asm("movq %%gs:%1,%0" : "=r" (td)
 (kgdb) frame 6
 #6  0x80736cec in in_pcblookup_local (pcbinfo=0x80dc9180,
     laddr={s_addr = 50374848}, lport=339, lookupflags=1,
     cred=0xfe016cdad100) at /usr/src/sys/netinet/in_pcb.c:1438
 1438  LIST_FOREACH(phd, porthash, phd_hash) {
 1438  LIST_FOREACH(phd, porthash, phd_hash) {
 (kgdb) p phd
 $1 = (struct inpcbport *) 0x9e17b100fe00

That is odd, that looks word-swapped, as if it should be
0xfe009e17b100 (which would be a more normal pointer in the kernel on
amd64).
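
To make the word-swap observation concrete, here is a small sketch in C that works on the two addresses exactly as they are printed in this thread (12 hex digits each). Moving the low 16-bit word to the top of the 48-bit value turns the suspicious pointer into the expected one. The helper name `rotate48_right16` is hypothetical, and this only illustrates the bit pattern; it says nothing about how the corruption happened.

```c
#include <stdint.h>

/*
 * Move the low 16-bit word of a 48-bit value to the top.
 * Hypothetical helper for illustration only.
 */
static uint64_t
rotate48_right16(uint64_t v)
{
    uint64_t low = v & 0xffffULL;               /* lowest 16-bit word */
    uint64_t rest = (v >> 16) & 0xffffffffULL;  /* remaining 32 bits */

    return ((low << 32) | rest);
}
```

Applied to 0x9e17b100fe00, this yields 0xfe009e17b100, the value suggested above as the more plausible kernel pointer.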

 (kgdb) p pcbinfo->ipi_porthashbase[lport & pcbinfo->ipi_porthashmask]
 $2 = {lh_first = 0x0}

So the list is now empty. :(

This feels like the list was updated out from under the pcbinfo.  Looking at
your earlier e-mail:

(kgdb) p *pcbinfo
$1 = {ipi_lock = {lock_object = {lo_name = 0x809d4d82 "udp",
      lo_flags = 69926912, lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
  ipi_listhead = 0x80dc9108, ipi_count = 28, ipi_gencnt = 535501,
  ipi_lastport = 21249, ipi_lastlow = 0, ipi_lasthi = 0,
  ipi_zone = 0xfe0017b60380, ipi_pcbgroups = 0x0, ipi_npcbgroups = 0,
  ipi_hashfields = 0, ipi_hash_lock = {lock_object = {
      lo_name = 0x80a03d80 "pcbinfohash", lo_flags = 69402624,
      lo_data = 0, lo_witness = 0x0}, rw_lock = 18446741877615517696},
  ipi_hashbase = 0xfe00120f6000, ipi_hashmask = 127,
  ipi_porthashbase = 0xfe00120f5c04, ipi_porthashmask = 127,
  ipi_wildbase = 0x0, ipi_wildmask = 0, ipi_vnet = 0x0,
  ipi_pspare = {0x0, 0x0}}

It looks like the ipi_hash_lock is locked (and udp_connect() locks it), so I
think the offending code is somewhere else.  Also, I can't find anything that
removes an inp without holding the correct pcbinfo lock.  The only thing I can
think of is if the pcbinfo pointer for an inp could change, so we could maybe
lock the wrong one while removing it?

Hmm, you know.  In in_pcbremlists() and in_pcbdrop(), we read inp_phd
without holding the hash lock.  I think that probably doesn't actually break
anything, but this feels like a locking issue of some sort.

-- 
John Baldwin


Re: panic: in_pcblookup_local (?)

2013-05-01 Thread Robert N. M. Watson

On 1 May 2013, at 16:56, John Baldwin wrote:

 It looks like the ipi_hash_lock is locked (and udp_connect() locks it), so I
 think the offending code is somewhere else.  Also, I can't find anything that
 removes an inp without holding the correct pcbinfo lock.  The only thing I can
 think of is if the pcbinfo pointer for an inp could change, so we could maybe
 lock the wrong one while removing it?
 
 Hmm, you know.  In in_pcbremlists() and in_pcbdrop(), we read inp_phd
 without holding the hash lock.  I think that probably doesn't actually break
 anything, but this feels like a locking issue of some sort.

I'll need to catch up on this thread later, but a few questions:

Do we know if the application in question is multithreaded, and if so, might it 
be attempting concurrent operations on this socket?

The corrupted pointer is worrying ... but interesting, and suggests something 
else is going on here -- stack corruption earlier in the system call, perhaps?

In general, to modify our various hash lists you must lock both the inpcb and 
the list. It's therefore sufficient to hold either lock to read, so reading 
inp_phd should be OK with the inpcb lock held, even without the hash lock held.
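
The rule in the paragraph above can be written down as two predicates. This is only a model of the stated convention, not kernel code: the struct and function names are hypothetical. The reasoning is that every writer must hold both locks, so a reader holding either one already excludes all writers.

```c
#include <stdbool.h>

/* Which of the two locks the current thread holds (hypothetical model). */
struct held_locks {
    bool inpcb_lock;    /* the per-connection inpcb lock */
    bool hash_lock;     /* the lock protecting the hash lists */
};

/* Modifying hash-list linkage requires both locks. */
static bool
can_modify_hash_linkage(struct held_locks h)
{
    return (h.inpcb_lock && h.hash_lock);
}

/* Reading it (for example inp_phd) is safe with either lock held. */
static bool
can_read_hash_linkage(struct held_locks h)
{
    return (h.inpcb_lock || h.hash_lock);
}
```

Under this model, reading inp_phd with only the inpcb lock held passes the read check, which matches the conclusion above.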

Do we have a dump of *inp, and if so, can we confirm that the inpcb is still 
properly referenced?  If there is an associated socket, a dump of 
*inp->inp_socket would likewise let us check that things are properly 
referenced there.

Robert


Re: panic: in_pcblookup_local (?)

2013-05-01 Thread Glen Barber
On Wed, May 01, 2013 at 06:45:53PM +0100, Robert N. M. Watson wrote:
 
 On 1 May 2013, at 16:56, John Baldwin wrote:
 
  It looks like the ipi_hash_lock is locked (and udp_connect() locks it), so
  I think the offending code is somewhere else.  Also, I can't find anything
  that removes an inp without holding the correct pcbinfo lock.  The only
  thing I can think of is if the pcbinfo pointer for an inp could change, so
  we could maybe lock the wrong one while removing it?
  
  Hmm, you know.  In in_pcbremlists() and in_pcbdrop(), we read inp_phd
  without holding the hash lock.  I think that probably doesn't actually
  break anything, but this feels like a locking issue of some sort.
 
 I'll need to catch up on this thread later, but a few questions:
 
 Do we know if the application in question is multithreaded, and
 if so, might it be attempting concurrent operations on this socket?

I do not know if zabbix-agent is multithreaded, but cf-agent is.

 The corrupted pointer is worrying ... but interesting, and suggests
 something else is going on here -- stack corruption earlier in the
 system call, perhaps?
 
 In general, to modify our various hash lists you must lock both
 the inpcb and the list. It's therefore sufficient to hold either
 lock to read, so reading inp_phd should be OK with the inpcb lock
 held, even without the hash lock held.
 
 Do we have a dump of *inp, and if so, can we confirm that the
 inpcb is still properly referenced?  If there is an associated socket,
 a dump of *inp->inp_socket would likewise let us check that things
 are properly referenced there.
 

I will follow up with this information as soon as possible.

Glen





Re: panic: in_pcblookup_local (?)

2013-05-01 Thread Robert N. M. Watson

On 1 May 2013, at 19:03, Glen Barber wrote:

 I'll need to catch up on this thread later, but a few questions:
 
 Do we know if the application in question is multithreaded, and
 if so, might it be attempting concurrent operations on this socket?
 
 I do not know if zabbix-agent is multithreaded, but cf-agent is.

If in DDB, it would be useful to do a 'ps' so we can identify threads in the 
process, and in particular, whether they might be in the kernel around the 
moment of the panic.

 I will follow up with this information as soon as possible.

Thanks. Do keep around as much information as you can from DDB, crashdumps, 
etc. A useful set of things to keep from DDB includes the initial panic 
information and trap frame, 'show pcpu', 'show allpcpu', 'trace', 'alltrace', 
'ps', and if WITNESS is compiled in, 'show locks' and 'show alllocks'. On busy 
systems, all the backtraces add up to a lot of space, so you might hold onto 
them rather than e-mail them, but they contain useful information. Often, 
debugging this sort of race condition involves looking at what other 
network-centred threads are doing -- e.g., device-driver ithreads, netisr, 
other involved user threads. You may be able to extract much of that 
information using ps on the crashdump (not sure if procstat is there yet for 
crashdumps) -- if so, be sure to use -H (or whatever the argument is to print 
thread, not just process, information).

Off to a formal dinner, but back later!

Robert


Re: panic: in_pcblookup_local (?)

2013-05-01 Thread John Baldwin
On Wednesday, May 01, 2013 2:08:57 pm Robert N. M. Watson wrote:
 
 On 1 May 2013, at 19:03, Glen Barber wrote:
 
  I'll need to catch up on this thread later, but a few questions:
  
  Do we know if the application in question is multithreaded, and
  if so, might it be attempting concurrent operations on this socket?
  
  I do not know if zabbix-agent is multithreaded, but cf-agent is.
 
 If in DDB, it would be useful to do a ps so we can identify threads in the 
process, and in particular, whether they might be in the kernel around the 
moment of the panic.
 
  I will follow up with this information as soon as possible.
 
 Thanks. Do keep around as much information as you can from DDB, crashdumps, 
etc. A useful set of things to keep from DDB includes the initial panic 
information and trap frame, show pcpu, show allpcpu, trace, alltrace, 
ps, and if WITNESS is compiled in, show locks and show alllocks. On busy 
systems, all the backtraces add up to a lot of space, so you might hold onto 
them rather than e-mail them, but they contain useful information. Often, debugging 
this sort of race condition involves looking at what other network-centred 
threads are doing -- e.g., device-driver ithreads, netisr, other involved user 
threads. You may be able to extract much of that information using ps on the 
crashdump (not sure if procstat is there yet for crashdumps) -- if so, be sure 
to use -H (or whatever the argument is to print thread, not just process, 
information).

You can also grab my kgdb scripts from www.freebsd.org/~jhb/gdb/

Put those in a dir and do 'source gdb6'.  You can then run 'ps' to get a good 
ps listing that includes threads.  You can also use 'thread apply all bt' to 
get stacktraces of all threads in kgdb.  I believe there is an 'allpcpu' 
command that is similar to 'show allpcpu' in DDB.

Robert, in this case he has a full crashdump, so we can get quite a bit of 
information from it.

-- 
John Baldwin


Re: panic: in_pcblookup_local (?)

2013-05-01 Thread Glen Barber
On Wed, May 01, 2013 at 02:30:36PM -0400, John Baldwin wrote:
 On Wednesday, May 01, 2013 2:08:57 pm Robert N. M. Watson wrote:
  If in DDB, it would be useful to do a ps so we can identify threads in the
  process, and in particular, whether they might be in the kernel around the
  moment of the panic.
  
   I will follow up with this information as soon as possible.
  
  Thanks. Do keep around as much information as you can from DDB, crashdumps,
  etc. A useful set of things to keep from DDB includes the initial panic
  information and trap frame, show pcpu, show allpcpu, trace, alltrace,
  ps, and if WITNESS is compiled in, show locks and show alllocks. On busy
  systems, all the backtraces add up to a lot of space, so you might hold onto
  them rather than e-mail them, but they contain useful information. Often,
  debugging this sort of race condition involves looking at what other
  network-centred threads are doing -- e.g., device-driver ithreads, netisr,
  other involved user threads. You may be able to extract much of that
  information using ps on the crashdump (not sure if procstat is there yet for
  crashdumps) -- if so, be sure to use -H (or whatever the argument is to print
  thread, not just process, information).
 

So, I am admittedly not too familiar with DDB.  In fact, I just now
realize the kernel is built without DDB...

Additionally, the kernel is built without WITNESS.

 You can also grab my kgdb scripts from www.freebsd.org/~jhb/gdb/
 

Thanks for these.

 Put those in a dir and do 'source gdb6'.  You can then run 'ps' to get a good 
 ps listing that includes threads.  You can also use 'thread apply all bt' to 
 get stacktraces of all threads in kgdb.  I believe there is an 'allpcpu' 
 command that is similar to 'show allpcpu' in DDB.
 

I have the outputs of 'ps', 'allpcpu', and 'thread apply all bt' saved
to separate script(1) files.  Is there anything in particular I can look
for before uploading the files somewhere public?  At a quick-ish look,
though, I did not see anything related to cf-agent (the current process
at the time of the panic).

 Robert, in this case he has a full crashdump, so we can get quite a bit of 
 information from it.
 

Right, and I can keep anything available for as long as necessary.

Glen





Re: panic: in_pcblookup_local (?)

2013-04-30 Thread John Baldwin
On Monday, April 29, 2013 8:35:52 pm Glen Barber wrote:
 On Mon, Apr 29, 2013 at 12:24:06PM -0400, John Baldwin wrote:
  On Sunday, April 28, 2013 12:02:56 am Glen Barber wrote:
   On Sat, Apr 27, 2013 at 10:17:32AM +0200, Ian FREISLICH wrote:

Re: panic: in_pcblookup_local (?)

2013-04-30 Thread Glen Barber
On Tue, Apr 30, 2013 at 04:53:13PM -0400, John Baldwin wrote:
 Try 'p phd' to start.  INP_PCBPORTHASH is a macro, so you will
 have to do it by hand:
 
 'p pcbinfo->ipi_porthashbase[lport & pcbinfo->ipi_porthashmask]'
 
 (That should be what 'porthash' is.)
 

Thanks for the pointers.  (Hah!)

Hopefully this is the info you are looking for:

Script started on Tue Apr 30 17:16:07 2013
root@orion:/usr/obj/usr/src/sys/ORION # kgdb ./kernel.debug /var/crash/vmcore.4
[...]
#0  doadump (textdump=<value optimized out>) at pcpu.h:231
231     __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) frame 6
#6  0x80736cec in in_pcblookup_local (pcbinfo=0x80dc9180, laddr=
  {s_addr = 50374848}, lport=339, lookupflags=1, cred=0xfe016cdad100)
at /usr/src/sys/netinet/in_pcb.c:1438
1438    LIST_FOREACH(phd, porthash, phd_hash) {
(kgdb) p phd
$1 = (struct inpcbport *) 0x9e17b100fe00
(kgdb) p pcbinfo->ipi_porthashbase[lport & pcbinfo->ipi_porthashmask]
$2 = {lh_first = 0x0}
(kgdb) p lport
$3 = 339
(kgdb) p pcbinfo->ipi_porthashmask
$4 = 127
(kgdb) root@orion:/usr/obj/usr/src/sys/ORION # ^D

Script done on Tue Apr 30 17:16:55 2013

Glen
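
For reference, the hand expansion of INP_PCBPORTHASH used in the session above can be sketched in C. This is only a sketch under stated assumptions: the helper names (swap16, bucket_raw, bucket_swapped) are hypothetical, and the idea that the real macro in sys/netinet/in_pcb.h applies ntohs() to the port before masking is an assumption that should be checked against the source tree. The values 339 and 127 are the lport and ipi_porthashmask printed by kgdb.

```c
#include <stdint.h>

/*
 * Swap the two bytes of a 16-bit value.  On a little-endian host such
 * as amd64 this has the same effect as ntohs(); it is written out by
 * hand here so the result does not depend on the build host.
 */
static uint16_t
swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Bucket index exactly as typed into kgdb: lport & mask. */
static unsigned
bucket_raw(uint16_t lport, unsigned mask)
{
	return lport & mask;
}

/*
 * Bucket index if the port is byte-swapped before masking.
 * ASSUMPTION: this mirrors what INP_PCBPORTHASH does; verify in
 * sys/netinet/in_pcb.h before relying on it.
 */
static unsigned
bucket_swapped(uint16_t lport, unsigned mask)
{
	return swap16(lport) & mask;
}
```

With lport = 339 and mask = 127, bucket_raw() selects bucket 83 and bucket_swapped() selects bucket 1. If the macro does byte-swap the port, the list head worth printing in kgdb may be a different bucket from the one shown as $2.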





Re: panic: in_pcblookup_local (?)

2013-04-29 Thread John Baldwin
On Sunday, April 28, 2013 12:02:56 am Glen Barber wrote:
 On Sat, Apr 27, 2013 at 10:17:32AM +0200, Ian FREISLICH wrote:
  Hi
  
  I've been getting the following panic on recent current r249717.
  Sadly the crashdump is useless.
  
 
 I just saw a similar panic on 10-CURRENT r249588.
 
  Fatal trap 9: general protection fault while in kernel mode
  cpuid = 15; apic id = 0f
  instruction pointer = 0x20:0x80546fbc
  stack pointer   = 0x28:0xff846b60
  frame pointer   = 0x28:0xff846b6777b0
  code segment= base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, long 1, def32 0, gran 1
  processor eflags= interrupt enabled, resume, IOPL = 0
  current process = 4361 (zabbix_agentd)
 
 Hmm..  This interests me.  In my case, cf-agent was the current
 process.
 
 Backtrace of my panic follows.  Any pointers on how to debug this
 further would be appreciated.
 
 Glen
 
 Script started on Sat Apr 27 23:53:53 2013
 root@orion:/usr/obj/usr/src/sys/ORION # kgdb ./kernel.debug /var/crash/vmcore.4
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as amd64-marcel-freebsd...
 
 Unread portion of the kernel message buffer:
 
 
 Fatal trap 9: general protection fault while in kernel mode
 cpuid = 1; apic id = 01
 instruction pointer   = 0x20:0x80736cec
 stack pointer = 0x28:0xff81aad4e760
 frame pointer = 0x28:0xff81aad4e7a0
 code segment  = base 0x0, limit 0xf, type 0x1b
   = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags  = interrupt enabled, resume, IOPL = 0
 current process   = 78664 (cf-agent)
 trap number   = 9
 panic: general protection fault
 cpuid = 1
 KDB: stack backtrace:
 #0 0x80642a56 at kdb_backtrace+0x66
 #1 0x80606eeb at panic+0x13b
 #2 0x808e3b10 at trap_fatal+0x290
 #3 0x808e4331 at trap+0x241
 #4 0x808cdbb3 at calltrap+0x8
 #5 0x807371d8 at in_pcb_lport+0x128
 #6 0x8073745a at in_pcbbind_setup+0x16a
 #7 0x80737d8e at in_pcbconnect_setup+0x71e
 #8 0x80737df9 at in_pcbconnect_mbuf+0x59
 #9 0x807bf29f at udp_connect+0x11f
 #10 0x80680615 at kern_connectat+0x275
 #11 0x80680731 at sys_connect+0x41
 #12 0x808e32cb at amd64_syscall+0x63b
 #13 0x808cde97 at Xfast_syscall+0xf7
 Uptime: 3d19h38m52s
 (ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
 (ada0:ahcich0:0:0:0): CAM status: CCB request is in progress
 (ada0:ahcich0:0:0:0): Error 5, Retries exhausted
 (ada0:ahcich0:0:0:0): Synchronize cache failed
 (ada1:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
 (ada1:ahcich1:0:0:0): CAM status: CCB request is in progress
 (ada1:ahcich1:0:0:0): Error 5, Retries exhausted
 (ada1:ahcich1:0:0:0): Synchronize cache failed
 (ada2:ahcich4:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
 (ada2:ahcich4:0:0:0): CAM status: CCB request is in progress
 (ada2:ahcich4:0:0:0): Error 5, Retries exhausted
 (ada2:ahcich4:0:0:0): Synchronize cache failed
 (ada3:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
 (ada3:ahcich5:0:0:0): CAM status: CCB request is in progress
 (ada3:ahcich5:0:0:0): Error 5, Retries exhausted
 (ada3:ahcich5:0:0:0): Synchronize cache failed
 Dumping 1014 out of 6049 MB:..2%..12%..21%..32%..42%..51%..62%..71%..81%..92%
 
 Reading symbols from /boot/kernel/zfs.ko.symbols...done.
 Loaded symbols for /boot/kernel/zfs.ko.symbols
 Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
 Loaded symbols for /boot/kernel/opensolaris.ko.symbols
 #0  doadump (textdump=<value optimized out>) at pcpu.h:231
 231   __asm("movq %%gs:%1,%0" : "=r" (td)
 (kgdb) frame 6
 #6  0x80736cec in in_pcblookup_local (pcbinfo=0x80dc9180, laddr=
   {s_addr = 50374848}, lport=339, lookupflags=1, cred=0xfe016cdad100)
 at /usr/src/sys/netinet/in_pcb.c:1438
 1438  LIST_FOREACH(phd, porthash, phd_hash) {
 (kgdb) list *0x80736cec
 0x80736cec is in in_pcblookup_local (/usr/src/sys/netinet/in_pcb.c:1439).
 1434   * port hash list.
 1435   */
 1436  porthash = &pcbinfo->ipi_porthashbase[INP_PCBPORTHASH(lport,
 1437      pcbinfo->ipi_porthashmask)];
 1438  LIST_FOREACH(phd, porthash, phd_hash) {
 1439      if (phd->phd_port == lport)
 1440          break;
 1441  }
 1442  if (phd != NULL) {
 1443      /*


Re: panic: in_pcblookup_local (?)

2013-04-29 Thread Glen Barber
On Mon, Apr 29, 2013 at 12:24:06PM -0400, John Baldwin wrote:
 On Sunday, April 28, 2013 12:02:56 am Glen Barber wrote:
  On Sat, Apr 27, 2013 at 10:17:32AM +0200, Ian FREISLICH wrote:

panic: in_pcblookup_local (?)

2013-04-27 Thread Ian FREISLICH
Hi

I've been getting the following panic on recent current r249717.
Sadly the crashdump is useless.

Fatal trap 9: general protection fault while in kernel mode
cpuid = 15; apic id = 0f
instruction pointer = 0x20:0x80546fbc
stack pointer   = 0x28:0xff846b60
frame pointer   = 0x28:0xff846b6777b0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 4361 (zabbix_agentd)
trap number = 9
panic: general protection fault
cpuid = 15
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xff846b677410
panic() at panic+0x13d/frame 0xff846b677510
trap_fatal() at trap_fatal+0x290/frame 0xff846b677570
trap() at trap+0xff/frame 0xff846b6776b0
calltrap() at calltrap+0x8/frame 0xff846b6776b0
--- trap 0x9, rip = 0x80546fbc, rsp = 0xff846b60, rbp = 
0xff846b6777b0 ---
in_pcblookup_local() at in_pcblookup_local+0x5c/frame 0xff846b6777b0
in_pcb_lport() at in_pcb_lport+0x109/frame 0xff846b677820
in_pcbbind_setup() at in_pcbbind_setup+0x16a/frame 0xff846b6778a0
in_pcbconnect_setup() at in_pcbconnect_setup+0x71e/frame 0xff846b677990
in_pcbconnect_mbuf() at in_pcbconnect_mbuf+0x59/frame 0xff846b6779e0
udp_connect() at udp_connect+0x11e/frame 0xff846b677a30
kern_connectat() at kern_connectat+0x1f5/frame 0xff846b677a90
sys_connect() at sys_connect+0x41/frame 0xff846b677ad0
amd64_syscall() at amd64_syscall+0x572/frame 0xff846b677bf0
Xfast_syscall() at Xfast_syscall+0xf7/frame 0xff846b677bf0
--- syscall (98, FreeBSD ELF64, sys_connect), rip = 0x80127104a, rsp = 
0x7fff97a8, rbp = 0x8014f68d4 ---
Uptime: 20m13s
Dumping 1688 out of 16368 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 15

Ian

-- 
Ian Freislich


Re: panic: in_pcblookup_local (?)

2013-04-27 Thread Glen Barber
On Sat, Apr 27, 2013 at 10:17:32AM +0200, Ian FREISLICH wrote:
 Hi
 
 I've been getting the following panic on recent current r249717.
 Sadly the crashdump is useless.
 

I just saw a similar panic on 10-CURRENT r249588.

 Fatal trap 9: general protection fault while in kernel mode
 cpuid = 15; apic id = 0f
 instruction pointer = 0x20:0x80546fbc
 stack pointer   = 0x28:0xff846b60
 frame pointer   = 0x28:0xff846b6777b0
 code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 4361 (zabbix_agentd)

Hmm..  This interests me.  In my case, cf-agent was the current
process.

Backtrace of my panic follows.  Any pointers on how to debug this
further would be appreciated.

Glen

Script started on Sat Apr 27 23:53:53 2013
root@orion:/usr/obj/usr/src/sys/ORION # kgdb ./kernel.debug /var/crash/vmcore.4
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as amd64-marcel-freebsd...

Unread portion of the kernel message buffer:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer = 0x20:0x80736cec
stack pointer   = 0x28:0xff81aad4e760
frame pointer   = 0x28:0xff81aad4e7a0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 78664 (cf-agent)
trap number = 9
panic: general protection fault
cpuid = 1
KDB: stack backtrace:
#0 0x80642a56 at kdb_backtrace+0x66
#1 0x80606eeb at panic+0x13b
#2 0x808e3b10 at trap_fatal+0x290
#3 0x808e4331 at trap+0x241
#4 0x808cdbb3 at calltrap+0x8
#5 0x807371d8 at in_pcb_lport+0x128
#6 0x8073745a at in_pcbbind_setup+0x16a
#7 0x80737d8e at in_pcbconnect_setup+0x71e
#8 0x80737df9 at in_pcbconnect_mbuf+0x59
#9 0x807bf29f at udp_connect+0x11f
#10 0x80680615 at kern_connectat+0x275
#11 0x80680731 at sys_connect+0x41
#12 0x808e32cb at amd64_syscall+0x63b
#13 0x808cde97 at Xfast_syscall+0xf7
Uptime: 3d19h38m52s
(ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: CCB request is in progress
(ada0:ahcich0:0:0:0): Error 5, Retries exhausted
(ada0:ahcich0:0:0:0): Synchronize cache failed
(ada1:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: CCB request is in progress
(ada1:ahcich1:0:0:0): Error 5, Retries exhausted
(ada1:ahcich1:0:0:0): Synchronize cache failed
(ada2:ahcich4:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada2:ahcich4:0:0:0): CAM status: CCB request is in progress
(ada2:ahcich4:0:0:0): Error 5, Retries exhausted
(ada2:ahcich4:0:0:0): Synchronize cache failed
(ada3:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada3:ahcich5:0:0:0): CAM status: CCB request is in progress
(ada3:ahcich5:0:0:0): Error 5, Retries exhausted
(ada3:ahcich5:0:0:0): Synchronize cache failed
Dumping 1014 out of 6049 MB:..2%..12%..21%..32%..42%..51%..62%..71%..81%..92%

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:231
231     __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:231
#1  0x80606a56 in kern_reboot (howto=260) at 
/usr/src/sys/kern/kern_shutdown.c:447
#2  0x80606ed5 in panic (fmt=<value optimized out>)
at /usr/src/sys/kern/kern_shutdown.c:754
#3  0x808e3b10 in trap_fatal (frame=0x9, eva=<value optimized out>)
at /usr/src/sys/amd64/amd64/trap.c:872
#4  0x808e4331 in trap (frame=0xff81aad4e6b0)
at /usr/src/sys/amd64/amd64/trap.c:605
#5  0x808cdbb3 in calltrap () at 
/usr/src/sys/amd64/amd64/exception.S:228
#6  0x80736cec in in_pcblookup_local (pcbinfo=0x80dc9180, laddr=
  {s_addr = 50374848}, lport=339, lookupflags=1, cred=0xfe016cdad100)
at /usr/src/sys/netinet/in_pcb.c:1438
#7  0x807371d8 in in_pcb_lport (inp=0xfe016c2fb7a8, 
laddrp=0xff81aad4e860, 
lportp=0xff81aad4e86e, cred=0xfe016cdad100, lookupflags=1)
at /usr/src/sys/netinet/in_pcb.c:457
#8  0x8073745a in in_pcbbind_setup