Re: panic: in_pcblookup_local (?)
Peter Wemm wrote: offending revision. I've started a binary search. I'll let you know what that turns up. Thanks, and sorry for getting my Ians mixed up. :-/ -- John Baldwin There have been two separate machines, each hitting this exact panic/trace at least twice, always while doing an 'svn update'. Rolling back to April 5th, r249172, solves it. (There's nothing particular about that revision, except that it was top-of-tree when the last update was done.) I see a number of locking changes in the area. Note that this is UDP, most likely a DNS lookup. I'll work to confirm this here. I was a little slow in bisecting because I spent 2 days trying to figure out which revision caused PF to rapidly expire its entire state table, which prevented testing this condition. Ian -- Ian Freislich ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: panic: in_pcblookup_local (?)
On Thu, May 2, 2013 at 11:32 AM, John Baldwin j...@freebsd.org wrote: On Thursday, May 02, 2013 1:53:47 pm Ian FREISLICH wrote: John Baldwin wrote: On Thursday, May 02, 2013 7:25:08 am Robert N. M. Watson wrote: On 2 May 2013, at 11:42, Glen Barber wrote: Hmm. Perhaps it would be worthwhile for me to rebuild the current kernel with DDB support. It looks like the machine has panicked a few times over the last two weeks or so, but based on the timestamps of the crash dumps and nagios complaints, happened during the middle of the night when I would not have really noticed, or otherwise would have just blamed my ISP. Two of the panics are ath(4) related. One looks similar to the one referenced in this thread, similarly triggered by a CFEngine process. In that case, the backtrace looks like: #4 0x808cdbb3 at calltrap+0x8 #5 0x807371d8 at in_pcb_lport+0x128 #6 0x8073745a at in_pcbbind_setup+0x16a #7 0x80737d8e at in_pcbconnect_setup+0x71e #8 0x80737df9 at in_pcbconnect_mbuf+0x59 #9 0x807bf29f at udp_connect+0x11f #10 0x80680615 at kern_connectat+0x275 Regarding DDB though, it would be rather difficult to access the machine if it drops to a DDB debugger session, since the machine acts as my firewall. Thanks -- will take a look at the attached. FWIW, though, I'm worried by the number of panics you are seeing, especially given that they involve multiple subsystems, and in particular, John's observation about a potentially corrupted pointer. This makes me wonder whether (a) you are experiencing hardware faults -- it would be worth running some memory/cpu/etc tests and (b) if we might be seeing a software memory corruption bug of some sort. Other users have reported this (Ian Lepore), and Peter Wemm can now reproduce these at will as well, so I think this is a software bug. What might be easiest if we can't figure this out from the crashdump is just to bisect the offending revision. I've started a binary search. I'll let you know what that turns up. 
Thanks, and sorry for getting my Ians mixed up. :-/ -- John Baldwin

I forgot to roll back one of the routers at nyi.freebsd.org and it panicked again, the same way as before:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 03
instruction pointer = 0x20:0x8067284c
stack pointer = 0x28:0xff8098688760
frame pointer = 0x28:0xff80986887a0
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 15041 (svn)
[ thread pid 15041 tid 100208 ]
Stopped at in_pcblookup_local+0x5c: cmpw %r12w,0x18(%rax)

#8 0x80829dff in calltrap () at ../../../amd64/amd64/exception.S:228
#9 0x8067284c in in_pcblookup_local (pcbinfo=0x80c9e180, laddr={s_addr = 708980576}, lport=607, lookupflags=1, cred=0xfe006956d700) at ../../../netinet/in_pcb.c:1438
#10 0x80672d38 in in_pcb_lport (inp=0xfe00098aa620, laddrp=0xff809845d860, lportp=0xff809845d86e, cred=0xfe006956d700, lookupflags=1) at ../../../netinet/in_pcb.c:457
#11 0x80672fba in in_pcbbind_setup (inp=0xfe00098aa620, nam=0x0, laddrp=0xff809845d900, lportp=0xff809845d90e, cred=0xfe006956d700) at ../../../netinet/in_pcb.c:615
#12 0x806738ee in in_pcbconnect_setup (inp=0xfe00098aa620, nam=<value optimized out>, laddrp=0xff809845d9b8, lportp=0xff809845d9be, faddrp=0xff809845d9b4, fportp=0xff809845d9bc, oinpp=0x0, cred=0xfe006956d700) at ../../../netinet/in_pcb.c:1019
#13 0x80673959 in in_pcbconnect_mbuf (inp=0xfe00098aa620, nam=<value optimized out>, cred=<value optimized out>, m=0x0) at ../../../netinet/in_pcb.c:645
#14 0x806fafcf in udp_connect (so=0xfe002e150d48, nam=0xfe00264df3b0, td=0xfe00091df490) at ../../../netinet/udp_usrreq.c:1530
#15 0x805faea5 in kern_connectat (td=0xfe00091df490, dirfd=-100, fd=<value optimized out>, sa=0xfe00264df3b0) at ../../../kern/uipc_syscalls.c:593
#16 0x805fafc1 in sys_connect (td=0xfe00091df490, uap=0xff809845db70) at ../../../kern/uipc_syscalls.c:559
#17 0x8083f571 in amd64_syscall (td=0xfe00091df490, traced=0) at subr_syscall.c:134

There have been two separate machines, each hitting this exact panic/trace at least twice, always while doing an 'svn update'. Rolling back to April 5th, r249172, solves it. (There's nothing particular about that revision, except that it was top-of-tree when the last update was done.) I see a number of locking changes in the area. Note that this is UDP, most likely a DNS lookup.
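The faulting instruction in the trap above, cmpw %r12w,0x18(%rax), compares a 16-bit register against whatever sits 0x18 bytes into the structure %rax points at. A minimal sketch, assuming struct inpcbport in netinet/in_pcb.h was laid out at the time as a LIST_ENTRY hash link, then a LIST_HEAD of PCBs, then a u_short port, shows why that offset would land on the port field on LP64 (amd64). The struct below is a hypothetical hand-written mirror with the queue(3) macros expanded by hand, not the kernel's own definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pcb;                                /* opaque stand-in for struct inpcb */

/*
 * Hypothetical mirror of the port-hash bucket header (struct inpcbport),
 * ASSUMING the field order described above. LIST_ENTRY(type) expands to
 * { type *le_next; type **le_prev; } and LIST_HEAD to { type *lh_first; }.
 */
struct portbucket {
	struct {
		struct portbucket *le_next;        /* next entry in the hash chain */
		struct portbucket **le_prev;       /* address of previous le_next */
	} phd_hash;                            /* 16 bytes on LP64 */
	struct {
		struct pcb *lh_first;              /* first PCB bound to this port */
	} phd_pcblist;                         /* 8 bytes on LP64 */
	uint16_t phd_port;                     /* lands at offset 0x18 on LP64 */
};

/* Offset of the port field: the "0x18(%rax)" operand, if the layout matches. */
static size_t
port_field_offset(void)
{
	return offsetof(struct portbucket, phd_port);
}
```

If the layout assumption holds, %rax would be the bucket entry being walked at in_pcb.c:1438, so the fault would have happened while reading an entry of the port-hash chain rather than the socket itself.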
Re: panic: in_pcblookup_local (?)
On 2 May 2013, at 01:57, Glen Barber wrote: So, I am admittedly not too familiar with DDB. In fact, I just now realize the kernel is built without DDB... DDB is a very powerful tool in that it's been custom-developed to help debug common kernel panics. It lacks some of the flexibility, and especially the data-type awareness, of GDB, but GDB is a less well-suited tool when investigating common crash patterns. I'll usually start out debugging in DDB, and find that 90% of my in-development panics can be debugged with it, resorting to GDB for post-mortem analyses in production or particularly hard debugging cases (usually where DDB's pretty printers for data types fall short). I've wanted, for a long time, to teach DDB how to pretty-print arbitrary types using DTrace's CTF meta-data, which would address the most significant case where I turn to GDB. Mind you, the limitations I see in GDB are for the most part made up for by John's GDB scripts :-). Put those in a dir and do 'source gdb6'. You can then run 'ps' to get a good ps listing that includes threads. You can also use 'thread apply all bt' to get stacktraces of all threads in kgdb. I believe there is an 'allpcpu' command that is similar to 'show allpcpu' in DDB. I have the outputs of 'ps', 'allpcpu', and 'thread apply all bt' saved to separate script(1) files. Is there anything in particular I can look for before uploading the files somewhere public? At a quick-ish look, though, I did not see anything related to cf-agent (the current process at time of panic). To be honest, it's probably easiest if I just take a look at it and see what I see. Inasmuch as I find interesting things, I'll follow up explaining what they are. We may find we can't track this problem down from the data we have -- but it's worth a try. Robert
Re: panic: in_pcblookup_local (?)
On Thu, May 02, 2013 at 10:27:39AM +0100, Robert N. M. Watson wrote: On 2 May 2013, at 01:57, Glen Barber wrote: So, I am admittedly not too familiar with DDB. In fact, I just now realize the kernel is built without DDB... DDB is a very powerful tool in that it's been custom-developed to help debug common kernel panics. It lacks some of the flexibility, and especially the data-type awareness, of GDB, but GDB is a less well-suited tool when investigating common crash patterns. I'll usually start out debugging in DDB, and find that 90% of my in-development panics can be debugged with it, resorting to GDB for post-mortem analyses in production or particularly hard debugging cases (usually where DDB's pretty printers for data types fall short). I've wanted, for a long time, to teach DDB how to pretty-print arbitrary types using DTrace's CTF meta-data, which would address the most significant case where I turn to GDB. Mind you, the limitations I see in GDB are for the most part made up for by John's GDB scripts :-). Hmm. Perhaps it would be worthwhile for me to rebuild the current kernel with DDB support. It looks like the machine has panicked a few times over the last two weeks or so, but based on the timestamps of the crash dumps and nagios complaints, happened during the middle of the night when I would not have really noticed, or otherwise would have just blamed my ISP. Two of the panics are ath(4) related. One looks similar to the one referenced in this thread, similarly triggered by a CFEngine process. In that case, the backtrace looks like: #4 0x808cdbb3 at calltrap+0x8 #5 0x807371d8 at in_pcb_lport+0x128 #6 0x8073745a at in_pcbbind_setup+0x16a #7 0x80737d8e at in_pcbconnect_setup+0x71e #8 0x80737df9 at in_pcbconnect_mbuf+0x59 #9 0x807bf29f at udp_connect+0x11f #10 0x80680615 at kern_connectat+0x275 Regarding DDB though, it would be rather difficult to access the machine if it drops to a DDB debugger session, since the machine acts as my firewall. 
Put those in a dir and do 'source gdb6'. You can then run 'ps' to get a good ps listing that includes threads. You can also use 'thread apply all bt' to get stacktraces of all threads in kgdb. I believe there is an 'allpcpu' command that is similar to 'show allpcpu' in DDB. I have the outputs of 'ps', 'allpcpu', and 'thread apply all bt' saved to separate script(1) files. Is there anything in particular I can look for before uploading the files somewhere public? At a quick-ish look, though, I did not see anything related to cf-agent (the current process at time of panic). To be honest, it's probably easiest if I just take a look at it and see what I see. Inasmuch as I find interesting things, I'll follow up explaining what they are. We may find we can't track this problem down from the data we have -- but it's worth a try. Sure. The files are available here: https://www.glenbarber.us/stuff/in_pcblookup_local/vmcore.4.ps.txt https://www.glenbarber.us/stuff/in_pcblookup_local/vmcore.4.allpcpu.txt https://www.glenbarber.us/stuff/in_pcblookup_local/vmcore.4.thread_apply_all_bt.txt Thanks to both of you for looking into this. Glen
Re: panic: in_pcblookup_local (?)
On 2 May 2013, at 11:42, Glen Barber wrote: Hmm. Perhaps it would be worthwhile for me to rebuild the current kernel with DDB support. It looks like the machine has panicked a few times over the last two weeks or so, but based on the timestamps of the crash dumps and nagios complaints, happened during the middle of the night when I would not have really noticed, or otherwise would have just blamed my ISP. Two of the panics are ath(4) related. One looks similar to the one referenced in this thread, similarly triggered by a CFEngine process. In that case, the backtrace looks like: #4 0x808cdbb3 at calltrap+0x8 #5 0x807371d8 at in_pcb_lport+0x128 #6 0x8073745a at in_pcbbind_setup+0x16a #7 0x80737d8e at in_pcbconnect_setup+0x71e #8 0x80737df9 at in_pcbconnect_mbuf+0x59 #9 0x807bf29f at udp_connect+0x11f #10 0x80680615 at kern_connectat+0x275 Regarding DDB though, it would be rather difficult to access the machine if it drops to a DDB debugger session, since the machine acts as my firewall. Thanks -- will take a look at the attached. FWIW, though, I'm worried by the number of panics you are seeing, especially given that they involve multiple subsystems, and in particular, John's observation about a potentially corrupted pointer. This makes me wonder whether (a) you are experiencing hardware faults -- it would be worth running some memory/cpu/etc tests and (b) if we might be seeing a software memory corruption bug of some sort. Robert
Re: panic: in_pcblookup_local (?)
On Thu, May 02, 2013 at 12:25:08PM +0100, Robert N. M. Watson wrote: On 2 May 2013, at 11:42, Glen Barber wrote: Hmm. Perhaps it would be worthwhile for me to rebuild the current kernel with DDB support. It looks like the machine has panicked a few times over the last two weeks or so, but based on the timestamps of the crash dumps and nagios complaints, happened during the middle of the night when I would not have really noticed, or otherwise would have just blamed my ISP. Two of the panics are ath(4) related. One looks similar to the one referenced in this thread, similarly triggered by a CFEngine process. In that case, the backtrace looks like: #4 0x808cdbb3 at calltrap+0x8 #5 0x807371d8 at in_pcb_lport+0x128 #6 0x8073745a at in_pcbbind_setup+0x16a #7 0x80737d8e at in_pcbconnect_setup+0x71e #8 0x80737df9 at in_pcbconnect_mbuf+0x59 #9 0x807bf29f at udp_connect+0x11f #10 0x80680615 at kern_connectat+0x275 Regarding DDB though, it would be rather difficult to access the machine if it drops to a DDB debugger session, since the machine acts as my firewall. Thanks -- will take a look at the attached. FWIW, though, I'm worried by the number of panics you are seeing, especially given that they involve multiple subsystems, and in particular, John's observation about a potentially corrupted pointer. This makes me wonder whether (a) you are experiencing hardware faults -- it would be worth running some memory/cpu/etc tests and (b) if we might be seeing a software memory corruption bug of some sort. I will run memtest this weekend, once I move some wires around so I do not lose internet access entirely. I'll run some stress tests that do not require the machine to be offline in the meantime. I certainly won't discount a hardware issue being the cause. For what it is worth, I just looked through my svn commit logs for that machine's configuration, and the only relatively recent change that was made was enabling powerd(8) - but that was about 3 months ago. 
Glen
Re: panic: in_pcblookup_local (?)
On Thursday, May 02, 2013 5:27:39 am Robert N. M. Watson wrote: On 2 May 2013, at 01:57, Glen Barber wrote: So, I am admittedly not too familiar with DDB. In fact, I just now realize the kernel is built without DDB... DDB is a very powerful tool in that it's been custom-developed to help debug common kernel panics. It lacks some of the flexibility, and especially the data-type awareness, of GDB, but GDB is a less well-suited tool when investigating common crash patterns. I'll usually start out debugging in DDB, and find that 90% of my in-development panics can be debugged with it, resorting to GDB for post-mortem analyses in production or particularly hard debugging cases (usually where DDB's pretty printers for data types fall short). I've wanted, for a long time, to teach DDB how to pretty-print arbitrary types using DTrace's CTF meta-data, which would address the most significant case where I turn to GDB. Mind you, the limitations I see in GDB are for the most part made up for by John's GDB scripts :-). Heh, I prefer DDB for active development as well, but after being forced to work in an environment where I largely had to do post-mortem analysis, I had to get a gdb environment that was nearly as functional. Also, using kgdb on a live system to obtain info is less invasive than ddb (it doesn't halt the system), and you can easily add new scripts to generate useful reports without having to recompile or reboot. -- John Baldwin
Re: panic: in_pcblookup_local (?)
On Thursday, May 02, 2013 7:25:08 am Robert N. M. Watson wrote: On 2 May 2013, at 11:42, Glen Barber wrote: Hmm. Perhaps it would be worthwhile for me to rebuild the current kernel with DDB support. It looks like the machine has panicked a few times over the last two weeks or so, but based on the timestamps of the crash dumps and nagios complaints, happened during the middle of the night when I would not have really noticed, or otherwise would have just blamed my ISP. Two of the panics are ath(4) related. One looks similar to the one referenced in this thread, similarly triggered by a CFEngine process. In that case, the backtrace looks like: #4 0x808cdbb3 at calltrap+0x8 #5 0x807371d8 at in_pcb_lport+0x128 #6 0x8073745a at in_pcbbind_setup+0x16a #7 0x80737d8e at in_pcbconnect_setup+0x71e #8 0x80737df9 at in_pcbconnect_mbuf+0x59 #9 0x807bf29f at udp_connect+0x11f #10 0x80680615 at kern_connectat+0x275 Regarding DDB though, it would be rather difficult to access the machine if it drops to a DDB debugger session, since the machine acts as my firewall. Thanks -- will take a look at the attached. FWIW, though, I'm worried by the number of panics you are seeing, especially given that they involve multiple subsystems, and in particular, John's observation about a potentially corrupted pointer. This makes me wonder whether (a) you are experiencing hardware faults -- it would be worth running some memory/cpu/etc tests and (b) if we might be seeing a software memory corruption bug of some sort. Other users have reported this (Ian Lepore), and Peter Wemm can now reproduce these at will as well, so I think this is a software bug. What might be easiest if we can't figure this out from the crashdump is just to bisect the offending revision. -- John Baldwin
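Bisecting the offending revision is a binary search over the revisions between a known-good and a known-bad build. A minimal sketch, assuming the "is this revision bad?" check (build, boot, run 'svn update', watch for the panic) is deterministic and monotonic; the predicate type and the simulated predicate below are hypothetical stand-ins for that build-and-test cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical test hook: returns true if `rev` exhibits the bug. */
typedef bool (*rev_is_bad_fn)(long rev, void *ctx);

/*
 * Binary search for the first bad revision. Preconditions: `good` is known
 * good, `bad` is known bad, good < bad, and badness is monotonic (once a
 * revision is bad, every later revision is bad too).
 */
static long
bisect_first_bad(long good, long bad, rev_is_bad_fn is_bad, void *ctx)
{
	assert(good < bad);
	while (bad - good > 1) {
		long mid = good + (bad - good) / 2;   /* avoids overflow */

		if (is_bad(mid, ctx))
			bad = mid;    /* culprit is at or before mid */
		else
			good = mid;   /* culprit is after mid */
	}
	return bad;
}

/* Simulated predicate for demonstration: revisions >= *ctx are "bad". */
static bool
simulated_is_bad(long rev, void *ctx)
{
	return rev >= *(const long *)ctx;
}
```

Each iteration halves the range, so a span of N revisions needs roughly log2(N) test builds. An intermittent panic breaks the monotonicity assumption, so a "good" verdict is only as reliable as the test run behind it.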
Re: panic: in_pcblookup_local (?)
John Baldwin wrote: On Thursday, May 02, 2013 7:25:08 am Robert N. M. Watson wrote: On 2 May 2013, at 11:42, Glen Barber wrote: Hmm. Perhaps it would be worthwhile for me to rebuild the current kernel with DDB support. It looks like the machine has panicked a few times over the last two weeks or so, but based on the timestamps of the crash dumps and nagios complaints, happened during the middle of the night when I would not have really noticed, or otherwise would have just blamed my ISP. Two of the panics are ath(4) related. One looks similar to the one referenced in this thread, similarly triggered by a CFEngine process. In that case, the backtrace looks like: #4 0x808cdbb3 at calltrap+0x8 #5 0x807371d8 at in_pcb_lport+0x128 #6 0x8073745a at in_pcbbind_setup+0x16a #7 0x80737d8e at in_pcbconnect_setup+0x71e #8 0x80737df9 at in_pcbconnect_mbuf+0x59 #9 0x807bf29f at udp_connect+0x11f #10 0x80680615 at kern_connectat+0x275 Regarding DDB though, it would be rather difficult to access the machine if it drops to a DDB debugger session, since the machine acts as my firewall. Thanks -- will take a look at the attached. FWIW, though, I'm worried by the number of panics you are seeing, especially given that they involve multiple subsystems, and in particular, John's observation about a potentially corrupted pointer. This makes me wonder whether (a) you are experiencing hardware faults -- it would be worth running some memory/cpu/etc tests and (b) if we might be seeing a software memory corruption bug of some sort. Other users have reported this (Ian Lepore), and Peter Wemm can now reproduce these at will as well, so I think this is a software bug. What might be easiest if we can't figure this out from the crashdump is just to bisect the offending revision. I've started a binary search. I'll let you know what that turns up. 
Ian -- Ian Freislich
Re: panic: in_pcblookup_local (?)
On Thursday, May 02, 2013 1:53:47 pm Ian FREISLICH wrote: John Baldwin wrote: On Thursday, May 02, 2013 7:25:08 am Robert N. M. Watson wrote: On 2 May 2013, at 11:42, Glen Barber wrote: Hmm. Perhaps it would be worthwhile for me to rebuild the current kernel with DDB support. It looks like the machine has panicked a few times over the last two weeks or so, but based on the timestamps of the crash dumps and nagios complaints, happened during the middle of the night when I would not have really noticed, or otherwise would have just blamed my ISP. Two of the panics are ath(4) related. One looks similar to the one referenced in this thread, similarly triggered by a CFEngine process. In that case, the backtrace looks like: #4 0x808cdbb3 at calltrap+0x8 #5 0x807371d8 at in_pcb_lport+0x128 #6 0x8073745a at in_pcbbind_setup+0x16a #7 0x80737d8e at in_pcbconnect_setup+0x71e #8 0x80737df9 at in_pcbconnect_mbuf+0x59 #9 0x807bf29f at udp_connect+0x11f #10 0x80680615 at kern_connectat+0x275 Regarding DDB though, it would be rather difficult to access the machine if it drops to a DDB debugger session, since the machine acts as my firewall. Thanks -- will take a look at the attached. FWIW, though, I'm worried by the number of panics you are seeing, especially given that they involve multiple subsystems, and in particular, John's observation about a potentially corrupted pointer. This makes me wonder whether (a) you are experiencing hardware faults -- it would be worth running some memory/cpu/etc tests and (b) if we might be seeing a software memory corruption bug of some sort. Other users have reported this (Ian Lepore), and Peter Wemm can now reproduce these at will as well, so I think this is a software bug. What might be easiest if we can't figure this out from the crashdump is just to bisect the offending revision. I've started a binary search. I'll let you know what that turns up. Thanks, and sorry for getting my Ians mixed up. 
:-/ -- John Baldwin
Re: panic: in_pcblookup_local (?)
On Tuesday, April 30, 2013 5:19:08 pm Glen Barber wrote: On Tue, Apr 30, 2013 at 04:53:13PM -0400, John Baldwin wrote: Try 'p phd' to start. INP_PCBPORTHASH is a macro, so you will have to do it by hand: 'p pcbinfo->ipi_porthashbase[lport & pcbinfo->ipi_porthashmask]' (That should be what 'porthash' is.) Thanks for the pointers. (Hah!) Hopefully this is the info you are looking for:

Script started on Tue Apr 30 17:16:07 2013
root@orion:/usr/obj/usr/src/sys/ORION # kgdb ./kernel.debug /var/crash/vmcore.4
[...]
#0 doadump (textdump=<value optimized out>) at pcpu.h:231
231 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) frame 6
#6 0x80736cec in in_pcblookup_local (pcbinfo=0x80dc9180, laddr={s_addr = 50374848}, lport=339, lookupflags=1, cred=0xfe016cdad100) at /usr/src/sys/netinet/in_pcb.c:1438
1438 LIST_FOREACH(phd, porthash, phd_hash) {
(kgdb) p phd
$1 = (struct inpcbport *) 0x9e17b100fe00

That is odd, that looks word-swapped, as if it should be 0xfe009e17b100 (which would be a more normal pointer in the kernel on amd64).

(kgdb) p pcbinfo->ipi_porthashbase[lport & pcbinfo->ipi_porthashmask]
$2 = {lh_first = 0x0}

So the list is now empty. :( This feels like the list was updated out from under the pcbinfo.
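The "word-swapped" observation can be checked mechanically. On amd64 with 48-bit virtual addresses, a pointer is canonical only if bits 63 through 47 are all equal, and dereferencing a non-canonical address raises a general protection fault (trap 9, as in the panic reports in this thread) rather than a page fault. The sketch below uses hypothetical pointer values, not the ones from the dump, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Swap the high and low 32-bit halves of a 64-bit value. */
static uint64_t
swap_words(uint64_t v)
{
	return (v << 32) | (v >> 32);
}

/*
 * Canonical check for 48-bit virtual addresses: bits 63..47 must be all
 * zeros (user half) or all ones (kernel half).
 */
static bool
is_canonical(uint64_t va)
{
	uint64_t top = va >> 47;           /* the 17 most significant bits */

	return top == 0 || top == 0x1ffff;
}

/* True if v is non-canonical but becomes canonical once its halves swap. */
static bool
looks_word_swapped(uint64_t v)
{
	return !is_canonical(v) && is_canonical(swap_words(v));
}
```

A value that fails the canonical check but passes it after swapping is consistent with a pointer that was stored or copied with its halves exchanged, though on its own it does not say where that happened.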
Looking at your earlier e-mail:

(kgdb) p *pcbinfo
$1 = {ipi_lock = {lock_object = {lo_name = 0x809d4d82 "udp", lo_flags = 69926912, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, ipi_listhead = 0x80dc9108, ipi_count = 28, ipi_gencnt = 535501, ipi_lastport = 21249, ipi_lastlow = 0, ipi_lasthi = 0, ipi_zone = 0xfe0017b60380, ipi_pcbgroups = 0x0, ipi_npcbgroups = 0, ipi_hashfields = 0, ipi_hash_lock = {lock_object = {lo_name = 0x80a03d80 "pcbinfohash", lo_flags = 69402624, lo_data = 0, lo_witness = 0x0}, rw_lock = 18446741877615517696}, ipi_hashbase = 0xfe00120f6000, ipi_hashmask = 127, ipi_porthashbase = 0xfe00120f5c04, ipi_porthashmask = 127, ipi_wildbase = 0x0, ipi_wildmask = 0, ipi_vnet = 0x0, ipi_pspare = {0x0, 0x0}}

It looks like the ipi_hash_lock is locked (and udp_connect() locks it), so I think the offending code is somewhere else. Also, I can't find anything that removes an inp without holding the correct pcbinfo lock. The only thing I can think of is if the pcbinfo pointer for an inp could change, so we could maybe lock the wrong one while removing it? Hmm, you know. In in_pcbremlists() and in_pcbdrop(), we read inp_phd without holding the hash lock. I think that probably doesn't actually break anything, but this feels like a locking issue of some sort. -- John Baldwin
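A minimal userland sketch of what a port-hash lookup like the LIST_FOREACH at in_pcb.c:1438 does: pick a bucket by masking the port with the table mask (127 in the dump above, i.e. 128 buckets), then walk that bucket's chain comparing ports. All names here are hypothetical stand-ins, and the real INP_PCBPORTHASH macro may also convert the port's byte order, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 128                   /* hypothetical; the dump shows mask 127 */

struct port_entry {                    /* hypothetical stand-in for inpcbport */
	struct port_entry *next;           /* next entry in the same hash chain */
	uint16_t port;
};

struct port_table {                    /* hypothetical stand-in for the port hash */
	struct port_entry *buckets[NBUCKETS];
	uint16_t mask;                     /* NBUCKETS - 1 */
};

/* Bucket index: the port masked by the table size (byte order ignored). */
static size_t
port_bucket(const struct port_table *t, uint16_t port)
{
	return (size_t)(port & t->mask);
}

/* Push an entry onto the head of its bucket's chain. */
static void
port_insert(struct port_table *t, struct port_entry *e)
{
	struct port_entry **head;

	assert(e != NULL);
	head = &t->buckets[port_bucket(t, e->port)];
	e->next = *head;
	*head = e;
}

/* Walk one chain, comparing each entry's port until a match is found. */
static struct port_entry *
port_lookup(const struct port_table *t, uint16_t port)
{
	struct port_entry *e;

	for (e = t->buckets[port_bucket(t, port)]; e != NULL; e = e->next)
		if (e->port == port)       /* first read through each entry pointer */
			return e;
	return NULL;
}
```

Ports that differ by a multiple of 128 share a bucket, which is why the loop has to read the port of every entry it visits; if an entry pointer in the chain were garbage, that read would be the first access through it.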
Re: panic: in_pcblookup_local (?)
On 1 May 2013, at 16:56, John Baldwin wrote: It looks like the ipi_hash_lock is locked (and udp_connect() locks it), so I think the offending code is somewhere else. Also, I can't find anything that removes an inp without holding the correct pcbinfo lock. The only thing I can think of is if the pcbinfo pointer for an inp could change, so we could maybe lock the wrong one while removing it? Hmm, you know. In in_pcbremlists() and in_pcbdrop(), we read inp_phd without holding the hash lock. I think that probably doesn't actually break anything, but this feels like a locking issue of some sort. I'll need to catch up on this thread later, but a few questions: Do we know if the application in question is multithreaded, and if so, might it be attempting concurrent operations on this socket? The corrupted pointer is worrying ... but interesting, and suggests something else is going on here -- stack corruption earlier in the system call, perhaps? In general, to modify our various hash lists you must lock both the inpcb and the list. It's therefore sufficient to hold either lock to read, so reading inp_phd should be OK with the inpcb lock held, even without the hash lock held. Do we have a dump of *inp? If so, can we confirm that the inpcb is still properly referenced? If there is an associated socket, a dump of *inp->inp_socket would likewise let us check that things are properly referenced there. Robert
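The rule described above (modify only with both locks held, read with either one) can be sketched with POSIX mutexes. This is a hypothetical userland model, not the kernel's inpcb locking: obj stands in for an inpcb, its bucket field for inp_phd, and table.hash_lock for the pcbinfo hash lock. The writer's lock order is an arbitrary choice for the sketch; what matters is that every writer uses the same order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct bucket {                        /* hypothetical stand-in for inpcbport */
	int id;
};

struct obj {                           /* hypothetical stand-in for an inpcb */
	pthread_mutex_t lock;              /* per-object lock */
	struct bucket *bucket;             /* protected by BOTH locks, cf. inp_phd */
};

struct table {                         /* hypothetical stand-in for the pcbinfo */
	pthread_mutex_t hash_lock;         /* protects hash chains and obj->bucket */
};

/* Writer: holds both locks, always acquired in the same order. */
static void
obj_set_bucket(struct table *t, struct obj *o, struct bucket *b)
{
	pthread_mutex_lock(&o->lock);
	pthread_mutex_lock(&t->hash_lock);
	o->bucket = b;
	pthread_mutex_unlock(&t->hash_lock);
	pthread_mutex_unlock(&o->lock);
}

/* Reader holding only the object lock: excludes every writer. */
static struct bucket *
obj_get_bucket_objlocked(struct obj *o)
{
	struct bucket *b;

	pthread_mutex_lock(&o->lock);
	b = o->bucket;
	pthread_mutex_unlock(&o->lock);
	return b;
}

/* Reader holding only the hash lock: also excludes every writer. */
static struct bucket *
obj_get_bucket_hashlocked(struct table *t, struct obj *o)
{
	struct bucket *b;

	pthread_mutex_lock(&t->hash_lock);
	b = o->bucket;
	pthread_mutex_unlock(&t->hash_lock);
	return b;
}
```

Because every writer holds both locks, a reader holding either one excludes all writers. The rule only breaks if some path modifies the field while holding just one of the two locks, which is the kind of locking bug suspected earlier in the thread.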
Re: panic: in_pcblookup_local (?)
On Wed, May 01, 2013 at 06:45:53PM +0100, Robert N. M. Watson wrote: On 1 May 2013, at 16:56, John Baldwin wrote: It looks like the ipi_hash_lock is locked (and udp_connect() locks it), so I think the offending code is somewhere else. Also, I can't find anything that removes an inp without holding the correct pcbinfo lock. The only thing I can think of is if the pcbinfo pointer for an inp could change, so we could maybe lock the wrong one while removing it? Hmm, you know. In in_pcbremlists() and in_pcbdrop(), we read inp_phd without holding the hash lock. I think that probably doesn't actually break anything, but this feels like a locking issue of some sort. I'll need to catch up on this thread later, but a few questions: Do we know if the application in question is multithreaded, and if so, might it be attempting concurrent operations on this socket? I do not know if zabbix-agent is multithreaded, but cf-agent is. The corrupted pointer is worrying ... but interesting, and suggests something else is going on here -- stack corruption earlier in the system call, perhaps? In general, to modify our various hash lists you must lock both the inpcb and the list. It's therefore sufficient to hold either lock to read, so reading inp_phd should be OK with the inpcb lock held, even without the hash lock held. Do we have a dump of *inp? If so, can we confirm that the inpcb is still properly referenced? If there is an associated socket, a dump of *inp->inp_socket would likewise let us check that things are properly referenced there. I will follow up with this information as soon as possible. Glen
Re: panic: in_pcblookup_local (?)
On 1 May 2013, at 19:03, Glen Barber wrote: I'll need to catch up on this thread later, but a few questions: Do we know if the application in question is multithreaded, and if so, might it be attempting concurrent operations on this socket? I do not know if zabbix-agent is multithreaded, but cf-agent is. If in DDB, it would be useful to do a ps so we can identify threads in the process, and in particular, whether they might be in the kernel around the moment of the panic. I will follow up with this information as soon as possible. Thanks. Do keep around as much information as you can from DDB, crashdumps, etc. A useful set of things to keep from DDB includes the initial panic information and trap frame, show pcpu, show allpcpu, trace, alltrace, ps, and if WITNESS is compiled in, show locks and show alllocks. On busy systems, all the backtraces add up to a lot of space, so you might hold onto that rather than e-mail it, but they contain useful information. Often, debugging this sort of race condition involves looking at what other network-centred threads are doing -- e.g., device-driver ithreads, netisr, other involved user threads. You may be able to extract much of that information using ps on the crashdump (not sure if procstat is there yet for crashdumps) -- if so, be sure to use -H (or whatever the argument is to print thread, not just process, information). Off to a formal dinner, but back later! Robert
Re: panic: in_pcblookup_local (?)
On Wednesday, May 01, 2013 2:08:57 pm Robert N. M. Watson wrote: On 1 May 2013, at 19:03, Glen Barber wrote: I'll need to catch up on this thread later, but a few questions: Do we know if the application in question is multithreaded, and if so, might it be attempting concurrent operations on this socket? I do not know if zabbix-agent is multithreaded, but cf-agent is. If in DDB, it would be useful to do a ps so we can identify threads in the process, and in particular, whether they might be in the kernel around the moment of the panic. I will follow up with this information as soon as possible. Thanks. Do keep around as much information as you can from DDB, crashdumps, etc. A useful set of things to keep from DDB includes the initial panic information and trap frame, show pcpu, show allpcpu, trace, alltrace, ps, and if WITNESS is compiled in, show locks and show alllocks. On busy systems, all the backtraces add up to a lot of space, so you might hold onto that rather than e-mail it, but they contain useful information. Often, debugging this sort of race condition involves looking at what other network-centred threads are doing -- e.g., device-driver ithreads, netisr, other involved user threads. You may be able to extract much of that information using ps on the crashdump (not sure if procstat is there yet for crashdumps) -- if so, be sure to use -H (or whatever the argument is to print thread, not just process, information). You can also grab my kgdb scripts from www.freebsd.org/~jhb/gdb/ Put those in a dir and do 'source gdb6'. You can then run 'ps' to get a good ps listing that includes threads. You can also use 'thread apply all bt' to get stacktraces of all threads in kgdb. I believe there is an 'allpcpu' command that is similar to 'show allpcpu' in DDB. Robert, in this case he has a full crashdump, so we can get quite a bit of information from it. 
-- John Baldwin
Re: panic: in_pcblookup_local (?)
On Wed, May 01, 2013 at 02:30:36PM -0400, John Baldwin wrote: On Wednesday, May 01, 2013 2:08:57 pm Robert N. M. Watson wrote: If in DDB, it would be useful to do a ps so we can identify threads in the process, and in particular, whether they might be in the kernel around the moment of the panic. I will follow up with this information as soon as possible. Thanks. Do keep around as much information as you can from DDB, crashdumps, etc. A useful set of things to keep from DDB includes the initial panic information and trap frame, show pcpu, show allpcpu, trace, alltrace, ps, and if WITNESS is compiled in, show locks and show alllocks. On busy systems, all the backtraces add up to a lot of space, so you might hold onto that rather than e-mail it, but they contain useful information. Often, debugging this sort of race condition involves looking at what other network-centred threads are doing -- e.g., device-driver ithreads, netisr, other involved user threads. You may be able to extract much of that information using ps on the crashdump (not sure if procstat is there yet for crashdumps) -- if so, be sure to use -H (or whatever the argument is to print thread, not just process, information). So, I am admittedly not too familiar with DDB. In fact, I only just realized the kernel is built without DDB... Additionally, the kernel is built without WITNESS. You can also grab my kgdb scripts from www.freebsd.org/~jhb/gdb/ Thanks for these. Put those in a dir and do 'source gdb6'. You can then run 'ps' to get a good ps listing that includes threads. You can also use 'thread apply all bt' to get stacktraces of all threads in kgdb. I believe there is an 'allpcpu' command that is similar to 'show allpcpu' in DDB. I have the outputs of 'ps', 'allpcpu', and 'thread apply all bt' saved to separate script(1) files. Is there anything in particular I can look for before uploading the files somewhere public?
At a quick-ish look, though, I did not see anything related to cf-agent (the current process at the time of the panic). Robert, in this case he has a full crashdump, so we can get quite a bit of information from it. Right, and I can keep anything available for as long as necessary. Glen
Re: panic: in_pcblookup_local (?)
On Monday, April 29, 2013 8:35:52 pm Glen Barber wrote: On Mon, Apr 29, 2013 at 12:24:06PM -0400, John Baldwin wrote: On Sunday, April 28, 2013 12:02:56 am Glen Barber wrote: On Sat, Apr 27, 2013 at 10:17:32AM +0200, Ian FREISLICH wrote: Hi I've been getting the following panic on recent current r249717. Sadly the crashdump is useless. I just saw a similar panic on 10-CURRENT r249588. Fatal trap 9: general protection fault while in kernel mode cpuid = 15; apic id = 0f instruction pointer = 0x20:0x80546fbc stack pointer = 0x28:0xff846b60 frame pointer = 0x28:0xff846b6777b0 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 4361 (zabbix_agentd) Hmm... This interests me. In my case, cf-agent was the current process. Backtrace of my panic follows. Any pointers on how to debug this further would be appreciated. Glen Script started on Sat Apr 27 23:53:53 2013 root@orion:/usr/obj/usr/src/sys/ORION # kgdb ./kernel.debug /var/crash/vmcore.4 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as amd64-marcel-freebsd...
Unread portion of the kernel message buffer: Fatal trap 9: general protection fault while in kernel mode cpuid = 1; apic id = 01 instruction pointer = 0x20:0x80736cec stack pointer = 0x28:0xff81aad4e760 frame pointer = 0x28:0xff81aad4e7a0 code segment = base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 78664 (cf-agent) trap number = 9 panic: general protection fault cpuid = 1 KDB: stack backtrace: #0 0x80642a56 at kdb_backtrace+0x66 #1 0x80606eeb at panic+0x13b #2 0x808e3b10 at trap_fatal+0x290 #3 0x808e4331 at trap+0x241 #4 0x808cdbb3 at calltrap+0x8 #5 0x807371d8 at in_pcb_lport+0x128 #6 0x8073745a at in_pcbbind_setup+0x16a #7 0x80737d8e at in_pcbconnect_setup+0x71e #8 0x80737df9 at in_pcbconnect_mbuf+0x59 #9 0x807bf29f at udp_connect+0x11f #10 0x80680615 at kern_connectat+0x275 #11 0x80680731 at sys_connect+0x41 #12 0x808e32cb at amd64_syscall+0x63b #13 0x808cde97 at Xfast_syscall+0xf7 Uptime: 3d19h38m52s (ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: CCB request is in progress (ada0:ahcich0:0:0:0): Error 5, Retries exhausted (ada0:ahcich0:0:0:0): Synchronize cache failed (ada1:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada1:ahcich1:0:0:0): CAM status: CCB request is in progress (ada1:ahcich1:0:0:0): Error 5, Retries exhausted (ada1:ahcich1:0:0:0): Synchronize cache failed (ada2:ahcich4:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada2:ahcich4:0:0:0): CAM status: CCB request is in progress (ada2:ahcich4:0:0:0): Error 5, Retries exhausted (ada2:ahcich4:0:0:0): Synchronize cache failed (ada3:ahcich5:0:0:0): FLUSHCACHE48. 
ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada3:ahcich5:0:0:0): CAM status: CCB request is in progress (ada3:ahcich5:0:0:0): Error 5, Retries exhausted (ada3:ahcich5:0:0:0): Synchronize cache failed Dumping 1014 out of 6049 MB:..2%..12%..21%..32%..42%..51%..62%..71%..81%..92% Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols #0 doadump (textdump=<value optimized out>) at pcpu.h:231 231 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) frame 6 #6 0x80736cec in in_pcblookup_local (pcbinfo=0x80dc9180, laddr= {s_addr = 50374848}, lport=339, lookupflags=1, cred=0xfe016cdad100) at /usr/src/sys/netinet/in_pcb.c:1438 1438 LIST_FOREACH(phd, porthash, phd_hash) { (kgdb) list *0x80736cec 0x80736cec is in in_pcblookup_local (/usr/src/sys/netinet/in_pcb.c:1439). 1434 * port hash list. 1435 */ 1436
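On amd64, "Fatal trap 9" is a general protection fault (#GP), not a page fault (trap 12). A page fault is what an unmapped but well-formed address produces; dereferencing a non-canonical address in kernel mode produces #GP instead, which would fit the corrupted-pointer theory raised in this thread. The sketch below is a minimal check, assuming 48-bit virtual addresses (4-level paging), under which an address is canonical only when bits 63 through 47 all match. The helper name is hypothetical. Treat it as a sanity check on pointer values read from the dump, not as proof: #GP has other causes, and the addresses in this archive appear to have lost leading f digits (kernel text such as 0x80736cec really sits at 0xffffffff80736cec), so values should be checked against the original dump.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if addr is canonical under 48-bit
 * virtual addressing (4-level paging), i.e. bits 63..47 are all equal.
 * On amd64, dereferencing a non-canonical address raises #GP (trap 9)
 * rather than a page fault (trap 12).
 */
bool is_canonical_va48(uint64_t addr)
{
    /* Bits 63..47 form a 17-bit field: all zeros (user half) or all ones (kernel half). */
    uint64_t upper = addr >> 47;

    return upper == 0 || upper == 0x1FFFFULL;
}
```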
Re: panic: in_pcblookup_local (?)
On Tue, Apr 30, 2013 at 04:53:13PM -0400, John Baldwin wrote: Try 'p phd' to start. INP_PCBPORTHASH is a macro, so you will have to do it by hand: 'p pcbinfo->ipi_porthashbase[lport & pcbinfo->ipi_porthashmask]' (That should be what 'porthash' is.) Thanks for the pointers. (Hah!) Hopefully this is the info you are looking for: Script started on Tue Apr 30 17:16:07 2013 root@orion:/usr/obj/usr/src/sys/ORION # kgdb ./kernel.debug /var/crash/vmcore.4 [...] #0 doadump (textdump=<value optimized out>) at pcpu.h:231 231 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) frame 6 #6 0x80736cec in in_pcblookup_local (pcbinfo=0x80dc9180, laddr= {s_addr = 50374848}, lport=339, lookupflags=1, cred=0xfe016cdad100) at /usr/src/sys/netinet/in_pcb.c:1438 1438 LIST_FOREACH(phd, porthash, phd_hash) { (kgdb) p phd $1 = (struct inpcbport *) 0x9e17b100fe00 (kgdb) p pcbinfo->ipi_porthashbase[lport & pcbinfo->ipi_porthashmask] $2 = {lh_first = 0x0} (kgdb) p lport $3 = 339 (kgdb) p pcbinfo->ipi_porthashmask $4 = 127 (kgdb) root@orion:/usr/obj/usr/src/sys/ORION # ^D Script done on Tue Apr 30 17:16:55 2013 Glen
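The hand-written kgdb expression masks the raw lport value. As a hedged cross-check, the sketch below evaluates both that expression and the macro, assuming INP_PCBPORTHASH in sys/netinet/in_pcb.h of this era is defined as ntohs(lport) & mask (worth confirming against the tree that built the crashed kernel). The function names are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumption: mirrors the kernel macro, which byte-swaps the
 * network-order port to host order before masking.
 */
#define INP_PCBPORTHASH(lport, mask) (ntohs((lport)) & (mask))

/* Bucket index the kernel would compute for a network-order port. */
unsigned macro_bucket(uint16_t lport, unsigned mask)
{
    return (unsigned)INP_PCBPORTHASH(lport, mask);
}

/* Bucket index from the hand-written kgdb expression (no ntohs). */
unsigned manual_bucket(uint16_t lport, unsigned mask)
{
    return (unsigned)(lport & mask);
}

/* Print both indices for the lport and mask values read from the dump. */
void print_buckets(uint16_t lport, unsigned mask)
{
    printf("host-order port: %u\n", (unsigned)ntohs(lport));
    printf("macro bucket:    %u\n", macro_bucket(lport, mask));
    printf("manual bucket:   %u\n", manual_bucket(lport, mask));
}
```

With the dump's lport = 339 and ipi_porthashmask = 127 on a little-endian amd64 machine, the hand-written expression selects bucket 83 (339 & 127), while the macro, under the assumption above, byte-swaps 339 (0x0153) to 21249 (0x5301) and selects bucket 1. If that assumption holds, the empty bucket printed as $2 may not be the one in_pcblookup_local was walking, and 'p pcbinfo->ipi_porthashbase[1]' might be worth inspecting as well.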
Re: panic: in_pcblookup_local (?)
On Sunday, April 28, 2013 12:02:56 am Glen Barber wrote: On Sat, Apr 27, 2013 at 10:17:32AM +0200, Ian FREISLICH wrote: Hi I've been getting the following panic on recent current r249717. Sadly the crashdump is useless. I just saw a similar panic on 10-CURRENT r249588. Fatal trap 9: general protection fault while in kernel mode cpuid = 15; apic id = 0f instruction pointer = 0x20:0x80546fbc stack pointer = 0x28:0xff846b60 frame pointer = 0x28:0xff846b6777b0 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 4361 (zabbix_agentd) Hmm... This interests me. In my case, cf-agent was the current process. Backtrace of my panic follows. Any pointers on how to debug this further would be appreciated. Glen Script started on Sat Apr 27 23:53:53 2013 root@orion:/usr/obj/usr/src/sys/ORION # kgdb ./kernel.debug /var/crash/vmcore.4 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as amd64-marcel-freebsd...
Unread portion of the kernel message buffer: Fatal trap 9: general protection fault while in kernel mode cpuid = 1; apic id = 01 instruction pointer = 0x20:0x80736cec stack pointer = 0x28:0xff81aad4e760 frame pointer = 0x28:0xff81aad4e7a0 code segment = base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 78664 (cf-agent) trap number = 9 panic: general protection fault cpuid = 1 KDB: stack backtrace: #0 0x80642a56 at kdb_backtrace+0x66 #1 0x80606eeb at panic+0x13b #2 0x808e3b10 at trap_fatal+0x290 #3 0x808e4331 at trap+0x241 #4 0x808cdbb3 at calltrap+0x8 #5 0x807371d8 at in_pcb_lport+0x128 #6 0x8073745a at in_pcbbind_setup+0x16a #7 0x80737d8e at in_pcbconnect_setup+0x71e #8 0x80737df9 at in_pcbconnect_mbuf+0x59 #9 0x807bf29f at udp_connect+0x11f #10 0x80680615 at kern_connectat+0x275 #11 0x80680731 at sys_connect+0x41 #12 0x808e32cb at amd64_syscall+0x63b #13 0x808cde97 at Xfast_syscall+0xf7 Uptime: 3d19h38m52s (ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: CCB request is in progress (ada0:ahcich0:0:0:0): Error 5, Retries exhausted (ada0:ahcich0:0:0:0): Synchronize cache failed (ada1:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada1:ahcich1:0:0:0): CAM status: CCB request is in progress (ada1:ahcich1:0:0:0): Error 5, Retries exhausted (ada1:ahcich1:0:0:0): Synchronize cache failed (ada2:ahcich4:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada2:ahcich4:0:0:0): CAM status: CCB request is in progress (ada2:ahcich4:0:0:0): Error 5, Retries exhausted (ada2:ahcich4:0:0:0): Synchronize cache failed (ada3:ahcich5:0:0:0): FLUSHCACHE48. 
ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada3:ahcich5:0:0:0): CAM status: CCB request is in progress (ada3:ahcich5:0:0:0): Error 5, Retries exhausted (ada3:ahcich5:0:0:0): Synchronize cache failed Dumping 1014 out of 6049 MB:..2%..12%..21%..32%..42%..51%..62%..71%..81%..92% Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols #0 doadump (textdump=<value optimized out>) at pcpu.h:231 231 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) frame 6 #6 0x80736cec in in_pcblookup_local (pcbinfo=0x80dc9180, laddr= {s_addr = 50374848}, lport=339, lookupflags=1, cred=0xfe016cdad100) at /usr/src/sys/netinet/in_pcb.c:1438 1438 LIST_FOREACH(phd, porthash, phd_hash) { (kgdb) list *0x80736cec 0x80736cec is in in_pcblookup_local (/usr/src/sys/netinet/in_pcb.c:1439). 1434 * port hash list. 1435 */ 1436 porthash = pcbinfo->ipi_porthashbase[INP_PCBPORTHASH(lport, 1437 pcbinfo->ipi_porthashmask)]; 1438 LIST_FOREACH(phd, porthash, phd_hash) { 1439 if (phd->phd_port == lport) 1440 break; 1441 } 1442 if (phd != NULL) { 1443 /*
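The loop in this listing (lines 1438-1441) is an ordinary <sys/queue.h> LIST walk. The sketch below is a minimal userland model of it, assuming a libc that ships <sys/queue.h> with LIST_FOREACH (FreeBSD does, as does glibc). struct inpcbport is trimmed to the two fields the loop touches, and porthash_find is a hypothetical name, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Simplified stand-in for the kernel's struct inpcbport. */
struct inpcbport {
    LIST_ENTRY(inpcbport) phd_hash;   /* linkage within one hash bucket */
    uint16_t phd_port;                /* local port, network byte order */
};

LIST_HEAD(inpcbporthead, inpcbport);

/*
 * Model of in_pcb.c:1438-1441: follow phd_hash.le_next until the port
 * matches or the list ends. LIST_FOREACH leaves phd == NULL when it
 * runs off the end, so NULL means "no entry for this port".
 */
struct inpcbport *
porthash_find(struct inpcbporthead *porthash, uint16_t lport)
{
    struct inpcbport *phd;

    LIST_FOREACH(phd, porthash, phd_hash) {
        if (phd->phd_port == lport)
            break;
    }
    return (phd);
}
```

Because LIST_FOREACH stops as soon as the current pointer is NULL, an empty bucket (lh_first == 0x0) never reaches line 1439. A fault on line 1438 or 1439 therefore suggests that one of the pointers the loop follows (the bucket head itself, its lh_first, or some entry's phd_hash.le_next) held a bad non-NULL value when it was read.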
Re: panic: in_pcblookup_local (?)
On Mon, Apr 29, 2013 at 12:24:06PM -0400, John Baldwin wrote: On Sunday, April 28, 2013 12:02:56 am Glen Barber wrote: On Sat, Apr 27, 2013 at 10:17:32AM +0200, Ian FREISLICH wrote: Hi I've been getting the following panic on recent current r249717. Sadly the crashdump is useless. I just saw a similar panic on 10-CURRENT r249588. Fatal trap 9: general protection fault while in kernel mode cpuid = 15; apic id = 0f instruction pointer = 0x20:0x80546fbc stack pointer = 0x28:0xff846b60 frame pointer = 0x28:0xff846b6777b0 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 4361 (zabbix_agentd) Hmm... This interests me. In my case, cf-agent was the current process. Backtrace of my panic follows. Any pointers on how to debug this further would be appreciated. Glen Script started on Sat Apr 27 23:53:53 2013 root@orion:/usr/obj/usr/src/sys/ORION # kgdb ./kernel.debug /var/crash/vmcore.4 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as amd64-marcel-freebsd...
Unread portion of the kernel message buffer: Fatal trap 9: general protection fault while in kernel mode cpuid = 1; apic id = 01 instruction pointer = 0x20:0x80736cec stack pointer = 0x28:0xff81aad4e760 frame pointer = 0x28:0xff81aad4e7a0 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 78664 (cf-agent) trap number = 9 panic: general protection fault cpuid = 1 KDB: stack backtrace: #0 0x80642a56 at kdb_backtrace+0x66 #1 0x80606eeb at panic+0x13b #2 0x808e3b10 at trap_fatal+0x290 #3 0x808e4331 at trap+0x241 #4 0x808cdbb3 at calltrap+0x8 #5 0x807371d8 at in_pcb_lport+0x128 #6 0x8073745a at in_pcbbind_setup+0x16a #7 0x80737d8e at in_pcbconnect_setup+0x71e #8 0x80737df9 at in_pcbconnect_mbuf+0x59 #9 0x807bf29f at udp_connect+0x11f #10 0x80680615 at kern_connectat+0x275 #11 0x80680731 at sys_connect+0x41 #12 0x808e32cb at amd64_syscall+0x63b #13 0x808cde97 at Xfast_syscall+0xf7 Uptime: 3d19h38m52s (ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: CCB request is in progress (ada0:ahcich0:0:0:0): Error 5, Retries exhausted (ada0:ahcich0:0:0:0): Synchronize cache failed (ada1:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada1:ahcich1:0:0:0): CAM status: CCB request is in progress (ada1:ahcich1:0:0:0): Error 5, Retries exhausted (ada1:ahcich1:0:0:0): Synchronize cache failed (ada2:ahcich4:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada2:ahcich4:0:0:0): CAM status: CCB request is in progress (ada2:ahcich4:0:0:0): Error 5, Retries exhausted (ada2:ahcich4:0:0:0): Synchronize cache failed (ada3:ahcich5:0:0:0): FLUSHCACHE48. 
ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada3:ahcich5:0:0:0): CAM status: CCB request is in progress (ada3:ahcich5:0:0:0): Error 5, Retries exhausted (ada3:ahcich5:0:0:0): Synchronize cache failed Dumping 1014 out of 6049 MB:..2%..12%..21%..32%..42%..51%..62%..71%..81%..92% Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols #0 doadump (textdump=<value optimized out>) at pcpu.h:231 231 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) frame 6 #6 0x80736cec in in_pcblookup_local (pcbinfo=0x80dc9180, laddr= {s_addr = 50374848}, lport=339, lookupflags=1, cred=0xfe016cdad100) at /usr/src/sys/netinet/in_pcb.c:1438 1438 LIST_FOREACH(phd, porthash, phd_hash) { (kgdb) list *0x80736cec 0x80736cec is in in_pcblookup_local (/usr/src/sys/netinet/in_pcb.c:1439). 1434 * port hash list. 1435 */ 1436 porthash = pcbinfo->ipi_porthashbase[INP_PCBPORTHASH(lport, 1437 pcbinfo->ipi_porthashmask)]; 1438 LIST_FOREACH(phd, porthash,
panic: in_pcblookup_local (?)
Hi I've been getting the following panic on recent current r249717. Sadly the crashdump is useless. Fatal trap 9: general protection fault while in kernel mode cpuid = 15; apic id = 0f instruction pointer = 0x20:0x80546fbc stack pointer = 0x28:0xff846b60 frame pointer = 0x28:0xff846b6777b0 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 4361 (zabbix_agentd) trap number = 9 panic: general protection fault cpuid = 15 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xff846b677410 panic() at panic+0x13d/frame 0xff846b677510 trap_fatal() at trap_fatal+0x290/frame 0xff846b677570 trap() at trap+0xff/frame 0xff846b6776b0 calltrap() at calltrap+0x8/frame 0xff846b6776b0 --- trap 0x9, rip = 0x80546fbc, rsp = 0xff846b60, rbp = 0xff846b6777b0 --- in_pcblookup_local() at in_pcblookup_local+0x5c/frame 0xff846b6777b0 in_pcb_lport() at in_pcb_lport+0x109/frame 0xff846b677820 in_pcbbind_setup() at in_pcbbind_setup+0x16a/frame 0xff846b6778a0 in_pcbconnect_setup() at in_pcbconnect_setup+0x71e/frame 0xff846b677990 in_pcbconnect_mbuf() at in_pcbconnect_mbuf+0x59/frame 0xff846b6779e0 udp_connect() at udp_connect+0x11e/frame 0xff846b677a30 kern_connectat() at kern_connectat+0x1f5/frame 0xff846b677a90 sys_connect() at sys_connect+0x41/frame 0xff846b677ad0 amd64_syscall() at amd64_syscall+0x572/frame 0xff846b677bf0 Xfast_syscall() at Xfast_syscall+0xf7/frame 0xff846b677bf0 --- syscall (98, FreeBSD ELF64, sys_connect), rip = 0x80127104a, rsp = 0x7fff97a8, rbp = 0x8014f68d4 --- Uptime: 20m13s Dumping 1688 out of 16368 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Dump complete Automatic reboot in 15 seconds - press a key on the console to abort Rebooting... 
cpu_reset: Restarting BSP cpu_reset_proxy: Stopped CPU 15 Ian -- Ian Freislich
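Every backtrace in this thread enters the kernel through connect(2) on a UDP socket (sys_connect, kern_connectat, udp_connect, in_pcbconnect_mbuf). Because such a socket usually has no local port yet, in_pcbconnect_setup goes through in_pcbbind_setup to in_pcb_lport, which picks an ephemeral port and calls in_pcblookup_local to check whether it is in use. The sketch below is a minimal userland program that drives the same sequence, roughly what a resolver doing a DNS lookup over a connected UDP socket would do. udp_connect_ephemeral is a hypothetical name, and the code only exercises the path; it is not a confirmed reproducer for this panic.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a UDP socket, connect(2) it without a
 * prior bind(2), and return the ephemeral local port the kernel chose
 * (host byte order), or -1 on error. No datagram is sent.
 */
int udp_connect_ephemeral(const char *ip, uint16_t port)
{
    struct sockaddr_in dst, local;
    socklen_t len = sizeof(local);
    int fd, result = -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;

    if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        return -1;

    /* On FreeBSD, connect() on an unbound UDP socket reaches in_pcb_lport(). */
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == 0 &&
        getsockname(fd, (struct sockaddr *)&local, &len) == 0)
        result = ntohs(local.sin_port);

    close(fd);
    return result;
}
```

Calling this in a tight loop, possibly from several threads, is one way to put load on in_pcb_lport while bisecting; whether that alone triggers this panic is not established here.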
Re: panic: in_pcblookup_local (?)
On Sat, Apr 27, 2013 at 10:17:32AM +0200, Ian FREISLICH wrote: Hi I've been getting the following panic on recent current r249717. Sadly the crashdump is useless. I just saw a similar panic on 10-CURRENT r249588. Fatal trap 9: general protection fault while in kernel mode cpuid = 15; apic id = 0f instruction pointer = 0x20:0x80546fbc stack pointer = 0x28:0xff846b60 frame pointer = 0x28:0xff846b6777b0 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 4361 (zabbix_agentd) Hmm... This interests me. In my case, cf-agent was the current process. Backtrace of my panic follows. Any pointers on how to debug this further would be appreciated. Glen Script started on Sat Apr 27 23:53:53 2013 root@orion:/usr/obj/usr/src/sys/ORION # kgdb ./kernel.debug /var/crash/vmcore.4 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as amd64-marcel-freebsd...
Unread portion of the kernel message buffer: Fatal trap 9: general protection fault while in kernel mode cpuid = 1; apic id = 01 instruction pointer = 0x20:0x80736cec stack pointer = 0x28:0xff81aad4e760 frame pointer = 0x28:0xff81aad4e7a0 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 78664 (cf-agent) trap number = 9 panic: general protection fault cpuid = 1 KDB: stack backtrace: #0 0x80642a56 at kdb_backtrace+0x66 #1 0x80606eeb at panic+0x13b #2 0x808e3b10 at trap_fatal+0x290 #3 0x808e4331 at trap+0x241 #4 0x808cdbb3 at calltrap+0x8 #5 0x807371d8 at in_pcb_lport+0x128 #6 0x8073745a at in_pcbbind_setup+0x16a #7 0x80737d8e at in_pcbconnect_setup+0x71e #8 0x80737df9 at in_pcbconnect_mbuf+0x59 #9 0x807bf29f at udp_connect+0x11f #10 0x80680615 at kern_connectat+0x275 #11 0x80680731 at sys_connect+0x41 #12 0x808e32cb at amd64_syscall+0x63b #13 0x808cde97 at Xfast_syscall+0xf7 Uptime: 3d19h38m52s (ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: CCB request is in progress (ada0:ahcich0:0:0:0): Error 5, Retries exhausted (ada0:ahcich0:0:0:0): Synchronize cache failed (ada1:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada1:ahcich1:0:0:0): CAM status: CCB request is in progress (ada1:ahcich1:0:0:0): Error 5, Retries exhausted (ada1:ahcich1:0:0:0): Synchronize cache failed (ada2:ahcich4:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada2:ahcich4:0:0:0): CAM status: CCB request is in progress (ada2:ahcich4:0:0:0): Error 5, Retries exhausted (ada2:ahcich4:0:0:0): Synchronize cache failed (ada3:ahcich5:0:0:0): FLUSHCACHE48. 
ACB: ea 00 00 00 00 40 00 00 00 00 00 00 (ada3:ahcich5:0:0:0): CAM status: CCB request is in progress (ada3:ahcich5:0:0:0): Error 5, Retries exhausted (ada3:ahcich5:0:0:0): Synchronize cache failed Dumping 1014 out of 6049 MB:..2%..12%..21%..32%..42%..51%..62%..71%..81%..92% Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols #0 doadump (textdump=<value optimized out>) at pcpu.h:231 231 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) bt #0 doadump (textdump=<value optimized out>) at pcpu.h:231 #1 0x80606a56 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:447 #2 0x80606ed5 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:754 #3 0x808e3b10 in trap_fatal (frame=0x9, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:872 #4 0x808e4331 in trap (frame=0xff81aad4e6b0) at /usr/src/sys/amd64/amd64/trap.c:605 #5 0x808cdbb3 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228 #6 0x80736cec in in_pcblookup_local (pcbinfo=0x80dc9180, laddr= {s_addr = 50374848}, lport=339, lookupflags=1, cred=0xfe016cdad100) at /usr/src/sys/netinet/in_pcb.c:1438 #7 0x807371d8 in in_pcb_lport (inp=0xfe016c2fb7a8, laddrp=0xff81aad4e860, lportp=0xff81aad4e86e, cred=0xfe016cdad100, lookupflags=1) at /usr/src/sys/netinet/in_pcb.c:457 #8 0x8073745a in in_pcbbind_setup