Re: panic: page fault - 6.0-RELEASE-p7 (now 6.1-RC2)

2006-05-09 Thread Nick Wood

At 09:14 AM 5/5/2006, you wrote:

> Hello,
>
> We have a group of web and mail servers that run under a moderate
> load.  We recently upgraded them from 4/5.x to 6.0.  While we
> thought we had done enough testing, apparently we hadn't and are now
> experiencing panics on a number of the servers.  Some of our more
> heavily loaded servers have been fine for days, while others will
> crash every 6 to 36 hours.  Below are some pieces of information
> that may be helpful.
>
> Should I be posting this to another list as well?
>
> I know I can decrease NMBCLUSTERS dramatically, and give more memory
> to the kernel if that would help.
>
> I've read a number of similar cases where this panic was related to
> a hardware failure, and while I can't rule that out completely, it
> does seem unusual that several servers are apparently having the
> same problem.  Could it be that hardware problems existed before the
> upgrade, but are now brought out by the increased load caused by the
> new OS version and other installed software?  We have IPMI cards in
> some of the crashing servers and they all report normal
> temperatures, fan speeds, and voltages.  Nothing unusual in the event logs.
>
> I'm willing to dig deeper and do more testing if anyone has suggestions.


As suggested, we have upgraded to 6.1-RC2 and are experiencing the 
same or a very similar panic.  Some things have changed on the 
system, so I'll re-post our config, backtrace and dmesg.


Also, in the past I have seen kgdb report the process that caused the 
panic, along with some other information, when it first loads the 
dump, but sometimes it doesn't show that - am I doing something 
wrong there?  Whenever it has shown the process, it has always 
been tcpserver from the ucspi-tcp-0.88_2 port.



Differences from the 6.1-RC2 GENERIC kernel config
---
#cpu I486_CPU
cpu I586_CPU
cpu I686_CPU
ident   MAIL_6_1

options SUIDDIR
options QUOTA
options IPFIREWALL
options IPFIREWALL_VERBOSE
options IPFIREWALL_VERBOSE_LIMIT=10
options NMBCLUSTERS=65536
options KVA_PAGES=640
options VM_KMEM_SIZE_MAX=(512*1048576)
options VM_KMEM_SIZE_SCALE=2

options ASR_COMPAT

options SHMMAXPGS=131072
options SEMMNI=128
options SEMMNS=512
options SEMUME=100
options SEMMNU=256
---
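
For a sense of scale, the tuning options above can be turned into sizes. This is only a rough sketch and not part of the original post: it assumes a non-PAE i386 kernel where one KVA_PAGES unit covers 4 MB, 2048-byte mbuf clusters, and 4096-byte pages. Under those assumptions KVA_PAGES=640 gives the kernel 2.5 GB of address space and places the kernel base at 0x60000000, which is consistent with the 0x606... addresses in the backtrace.

```python
# Rough sizing of the kernel options listed above.
# Assumptions (not stated in the post): non-PAE i386, so one KVA_PAGES
# unit covers 4 MB; mbuf clusters are 2048 bytes; pages are 4096 bytes.
MB = 1024 * 1024

KVA_PAGES = 640
NMBCLUSTERS = 65536
VM_KMEM_SIZE_MAX = 512 * 1048576
SHMMAXPGS = 131072

kva_bytes = KVA_PAGES * 4 * MB        # kernel virtual address space
kernel_base = 2**32 - kva_bytes       # kernel sits at the top of the 4 GB space
cluster_bytes = NMBCLUSTERS * 2048    # memory if every mbuf cluster were in use
shm_bytes = SHMMAXPGS * 4096          # upper bound on SysV shared memory

print(f"kernel address space: {kva_bytes // MB} MB")
print(f"kernel base address:  {kernel_base:#x}")
print(f"mbuf clusters (max):  {cluster_bytes // MB} MB")
print(f"kmem size cap:        {VM_KMEM_SIZE_MAX // MB} MB")
print(f"SysV shm limit:       {shm_bytes // MB} MB")
```

The numbers only show what the options ask for; whether they are appropriate for these servers is a separate question.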

---
mail-da-5# kgdb /boot/kernel/kernel.debug vmcore.8
[GDB will not be able to debug user-mode threads: 
/usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-marcel-freebsd.

Unread portion of the kernel message buffer:


#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0x6064e239 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:402
#2  0x6064e4d0 in panic (fmt=0x60894857 %s) at 
/usr/src/sys/kern/kern_shutdown.c:558
#3  0x608496d4 in trap_fatal (frame=0x9c8f7ad8, eva=172) at 
/usr/src/sys/i386/i386/trap.c:836
#4  0x6084943b in trap_pfault (frame=0x9c8f7ad8, usermode=0, eva=172) 
at /usr/src/sys/i386/i386/trap.c:744

#5  0x60849079 in trap (frame=
  {tf_fs = 1619263496, tf_es = 1627652136, tf_ds = 40, tf_edi = 
55, tf_esi = 0, tf_ebp = -1668318412, tf_isp = -1668318460, tf_ebx = 
-1668318064, tf_edx = 1738397568, tf_ecx = 0, tf_eax = 4, tf_trapno = 
12, tf_err = 2, tf_eip = 1617891744, tf_cs = 32, tf_eflags = 66182, 
tf_esp = 1835631104, tf_ss = 0})

at /usr/src/sys/i386/i386/trap.c:434
#6  0x6083890a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0x606f11a0 in ip_ctloutput (so=0x4, sopt=0x9c8f7c90) at atomic.h:146
#8  0x6070123b in tcp_ctloutput (so=0x66b34858, sopt=0x9c8f7c90) at 
/usr/src/sys/netinet/tcp_usrreq.c:1038
#9  0x60687c04 in sosetopt (so=0x66b34858, sopt=0x9c8f7c90) at 
/usr/src/sys/kern/uipc_socket.c:1560
#10 0x6068ce95 in kern_setsockopt (td=0x679dd780, s=0, level=4, 
name=4, val=0x679dd780, valseg=UIO_USERSPACE,

valsize=0) at /usr/src/sys/kern/uipc_syscalls.c:1351
#11 0x6068cdc6 in setsockopt (td=0x679dd780, uap=0x4) at 
/usr/src/sys/kern/uipc_syscalls.c:1307

#12 0x608499eb in syscall (frame=
  {tf_fs = 1606352955, tf_es = 59, tf_ds = 1606352955, tf_edi = 
1606413432, tf_esi = 3, tf_ebp = 1606413224, tf_isp = -1668317852, 
tf_ebx = 0, tf_edx = 2, tf_ecx = 134545464, tf_eax = 105, tf_trapno = 
12, tf_err = 2, tf_eip = 672065711, tf_cs = 51, tf_eflags = 514, 
tf_esp = 1606413180, tf_ss = 59})

at /usr/src/sys/i386/i386/trap.c:981
#13 0x6083895f 
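
The trap frame in frame #5 is easier to read in hex. The sketch below is not from the original post; it converts a few of the decimal values kgdb printed and decodes the page-fault error code using the standard x86 bit layout (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The helper name `decode_page_fault` is made up for illustration.

```python
# Values copied from frames #4 and #5 of the backtrace above
# (kgdb prints the trap frame fields in decimal).
trap_frame = {"tf_trapno": 12, "tf_err": 2, "tf_eip": 1617891744}
eva = 172  # faulting virtual address reported by trap_pfault()


def decode_page_fault(err):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if err & 0x1 else "page not present",
        "access": "write" if err & 0x2 else "read",
        "mode": "user" if err & 0x4 else "kernel",
    }


info = decode_page_fault(trap_frame["tf_err"])
eip = trap_frame["tf_eip"]

print(f"fault address: {eva:#x}")
print(f"faulting eip:  {eip:#x}")
print(f"fault type:    {info['mode']} {info['access']}, {info['cause']}")
```

The converted tf_eip is 0x606f11a0, the same address as frame #7 (ip_ctloutput), and the error code describes a kernel-mode write to a page that is not present. A fault address as small as 0xac is usually read as a NULL pointer plus a structure offset, but that is an interpretation, not something the dump proves.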

panic: page fault - 6.0-RELEASE-p7

2006-05-05 Thread Nick Wood

Hello,

We have a group of web and mail servers that run under a moderate 
load.  We recently upgraded them from 4/5.x to 6.0.  While we thought 
we had done enough testing, apparently we hadn't and are now 
experiencing panics on a number of the servers.  Some of our more 
heavily loaded servers have been fine for days, while others will 
crash every 6 to 36 hours.  Below are some pieces of information that 
may be helpful.


Should I be posting this to another list as well?

I know I can decrease NMBCLUSTERS dramatically, and give more memory 
to the kernel if that would help.


I've read a number of similar cases where this panic was related to a 
hardware failure, and while I can't rule that out completely, it does 
seem unusual that several servers are apparently having the same 
problem.  Could it be that hardware problems existed before the 
upgrade, but are now brought out by the increased load caused by the 
new OS version and other installed software?  We have IPMI cards in 
some of the crashing servers and they all report normal temperatures, 
fan speeds, and voltages.  Nothing unusual in the event logs.


I'm willing to dig deeper and do more testing if anyone has suggestions.

Differences from GENERIC:
--
#cpu I486_CPU
#cpu I586_CPU
cpu I686_CPU
ident   PAYMAIL

options SUIDDIR
options QUOTA
options IPFIREWALL
options IPFIREWALL_VERBOSE
options IPFIREWALL_VERBOSE_LIMIT=10
options NMBCLUSTERS=65536
options KVA_PAGES=640
options VM_KMEM_SIZE_MAX=(512*1048576)
options VM_KMEM_SIZE_SCALE=2

options ASR_COMPAT

options SHMMAXPGS=131072
options SEMMNI=128
options SEMMNS=512
options SEMUME=100
options SEMMNU=256
--

--
mail-da-2# kgdb /boot/kernel/kernel.debug vmcore.2
[GDB will not be able to debug user-mode threads: 
/usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-marcel-freebsd.

Unread portion of the kernel message buffer:
ber = 12
panic: page fault
Uptime: 1d6h4m36s
Dumping 2047 MB (3 chunks)
  chunk 0: 1MB (158 pages) ... ok
  chunk 1: 2046MB (523773 pages) 2031 2015 1999 1983 1967 1951 1935 
1919 1903 1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 
1695 1679 1663 1647 1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 
1471 1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 
1247 1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071 1055 1039 
1023 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 
751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 
479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 
207 191 175 159 143 127 111 95 79 63 47 31 15 ... ok

  chunk 2: 1MB (128 pages)

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0x606384aa in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0x60638740 in panic (fmt=0x6085598b %s) at 
/usr/src/sys/kern/kern_shutdown.c:555
#3  0x6080ebf8 in trap_fatal (frame=0x9c497ad8, eva=172) at 
/usr/src/sys/i386/i386/trap.c:831
#4  0x6080e963 in trap_pfault (frame=0x9c497ad8, usermode=0, eva=172) 
at /usr/src/sys/i386/i386/trap.c:742

#5  0x6080e5c1 in trap (frame=
  {tf_fs = 1692663816, tf_es = 1680080936, tf_ds = 40, tf_edi = 
55, tf_esi = 0, tf_ebp = -1672905932, tf_isp = -1672905980, tf_ebx = 
-1672905584, tf_edx = 1677080448, tf_ecx = 0, tf_eax = 4, tf_trapno = 
12, tf_err = 2, tf_eip = 1617791092, tf_cs = 32, tf_eflags = 66182, 
tf_esp = 1773435648, tf_ss = 0}) at /usr/src/sys/i386/i386/trap.c:432

#6  0x607fe6aa in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0x606d8874 in ip_ctloutput (so=0x4, sopt=0x9c497c90) at atomic.h:146
#8  0x606e88ef in tcp_ctloutput (so=0x64e419bc, sopt=0x9c497c90) at 
/usr/src/sys/netinet/tcp_usrreq.c:1036
#9  0x60671c00 in sosetopt (so=0x64e419bc, sopt=0x9c497c90) at 
/usr/src/sys/kern/uipc_socket.c:1553
#10 0x60676e5d in kern_setsockopt (td=0x63f63780, s=0, level=4, 
name=4, val=0x63f63780, valseg=UIO_USERSPACE, valsize=0)

at /usr/src/sys/kern/uipc_syscalls.c:1331
#11 0x60676d8e in setsockopt (td=0x63f63780, uap=0x4) at 
/usr/src/sys/kern/uipc_syscalls.c:1287

#12 0x6080ef0f in syscall (frame=
  {tf_fs = 1606352955, tf_es = 59, tf_ds = 1606352955, tf_edi = 
1606413432, tf_esi = 3, tf_ebp = 1606413224, tf_isp = -1672905372, 
tf_ebx = 0, tf_edx = 
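
One quick way to check whether this crash and the 6.1-RC2 crash posted in the follow-up are the same problem is to compare the function names in the two backtraces. The sketch below is an illustration only: the regular expression and the helper `call_chain` are hypothetical, and the input is limited to frames #7 through #11 copied from the two kgdb sessions, with wrapped lines cut at the line break.

```python
import re

# Matches lines such as "#7  0x606d8874 in ip_ctloutput (...)".
FRAME = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (\w+)", re.MULTILINE)


def call_chain(backtrace):
    """Return the function names of a kgdb backtrace, innermost first."""
    return [name for _, name in FRAME.findall(backtrace)]


bt_60 = """\
#7  0x606d8874 in ip_ctloutput (so=0x4, sopt=0x9c497c90) at atomic.h:146
#8  0x606e88ef in tcp_ctloutput (so=0x64e419bc, sopt=0x9c497c90) at
#9  0x60671c00 in sosetopt (so=0x64e419bc, sopt=0x9c497c90) at
#10 0x60676e5d in kern_setsockopt (td=0x63f63780, s=0, level=4,
#11 0x60676d8e in setsockopt (td=0x63f63780, uap=0x4) at
"""

bt_61rc2 = """\
#7  0x606f11a0 in ip_ctloutput (so=0x4, sopt=0x9c8f7c90) at atomic.h:146
#8  0x6070123b in tcp_ctloutput (so=0x66b34858, sopt=0x9c8f7c90) at
#9  0x60687c04 in sosetopt (so=0x66b34858, sopt=0x9c8f7c90) at
#10 0x6068ce95 in kern_setsockopt (td=0x679dd780, s=0, level=4,
#11 0x6068cdc6 in setsockopt (td=0x679dd780, uap=0x4) at
"""

chain_60 = call_chain(bt_60)
chain_61 = call_chain(bt_61rc2)

print(" <- ".join(chain_60))
print("same call chain on 6.0 and 6.1-RC2:", chain_60 == chain_61)
```

Both dumps fault in ip_ctloutput, reached from a setsockopt call, and both report eva=172, so the two panics at least follow the same path.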

Re: panic: page fault - 6.0-RELEASE-p7

2006-05-05 Thread Kris Kennaway
On Fri, May 05, 2006 at 09:14:04AM -0600, Nick Wood wrote:
> Hello,
>
> We have a group of web and mail servers that run under a moderate
> load.  We recently upgraded them from 4/5.x to 6.0.  While we thought
> we had done enough testing, apparently we hadn't and are now
> experiencing panics on a number of the servers.  Some of our more
> heavily loaded servers have been fine for days, while others will
> crash every 6 to 36 hours.  Below are some pieces of information that
> may be helpful.

Try 6.1 first in case the bug is already fixed.

Kris

