Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-11-24 Thread Michael Grant

The following setting appears to be too high, as the machine reboots immediately after the fsck:

  vm.kmem_size=1536M
  vm.kmem_size_max=1536M

After returning it to 1G, the machine panicked again about a month later.

Here's vmstat -m and -z roughly 1 minute before it crashed (I was
logging to a file every minute via cron):

Fri Nov 21 15:15:00 EST 2008
         Type InUse MemUse HighUse  Requests  Size(s)
  pfs_vncache     2     1K       -    864205  32
         GEOM   168    24K       -    416279  16,32,64,128,256,512,1024,2048,4096
       isadev    17     2K       -        17  64
   CAM periph     1     1K       -         1  128
         cdev    26     4K       -        26  128
    CAM queue     3     1K       -         3  16
    file desc   739   474K       - 284943537  16,32,64,256,512,1024,2048,4096
        sigio     3     1K       -      4802  32
         kenv   116     8K       -       118  16,32,64,4096
       kqueue   246   154K       -  17652506  256,1024
    proc-args   153    10K       - 107101480
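With per-minute snapshots like the one above, telling a sudden jump from a slow leak comes down to comparing MemUse between entries. Below is a minimal Python sketch of that comparison. It assumes each snapshot uses the standard `vmstat -m` column layout (Type, InUse, MemUse, HighUse, Requests, Size(s)). Rows whose columns have been run together simply fail to match and are skipped. The function names are hypothetical.

```python
import re
from typing import Dict

# One data row of `vmstat -m`: Type (may contain spaces), InUse, MemUse in K,
# HighUse, Requests. The trailing Size(s) column, if any, is ignored.
ROW = re.compile(
    r"^\s*(?P<type>\S.*?)\s+(?P<inuse>\d+)\s+(?P<memuse>\d+)K\s+\S+\s+(?P<requests>\d+)"
)


def parse_memuse(snapshot: str) -> Dict[str, int]:
    """Map each malloc type in one logged snapshot to its MemUse in KB."""
    usage: Dict[str, int] = {}
    for line in snapshot.splitlines():
        match = ROW.match(line)
        if match:
            usage[match.group("type")] = int(match.group("memuse"))
    return usage


def growth(before: str, after: str, min_kb: int = 1) -> Dict[str, int]:
    """Malloc types whose MemUse grew by at least min_kb, largest growth first."""
    old, new = parse_memuse(before), parse_memuse(after)
    deltas = {name: kb - old.get(name, 0) for name, kb in new.items()}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return {name: delta for name, delta in ranked if delta >= min_kb}
```

For example, you could feed it the last two entries of the cron log to list the malloc types that grew in the final minute before a panic. Running it over the whole log would show whether one type climbs steadily.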

Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-09-11 Thread Michael Grant
My box crashed again:

panic: kmem_malloc(4096): kmem_map too small: 1073741824 total allocated
cpuid = 0
Uptime: 33d11h12m58s
Dumping 3327 MB (2 chunks)
  chunk 0: 1MB (151 pages) ... ok
  chunk 1: 3327MB (851568 pages)  ---hung here

Still no valid dump.

There is 4gig of physical memory in the machine.

In /boot/loader.conf, I currently have the following:

vm.kmem_size=1G
vm.kmem_size_max=1G
vm.kmem_size_scale=2

and in my kernel conf file I have:

options KVA_PAGES=512

It stayed up for 33 days this time.  Is there anything else I can do?

Michael Grant
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-09-11 Thread Jeremy Chadwick
On Thu, Sep 11, 2008 at 10:38:36AM +0200, Michael Grant wrote:
 My box crashed again:
 
 panic: kmem_malloc(4096): kmem_map too small: 1073741824 total allocated
 cpuid = 0
 Uptime: 33d11h12m58s
 Dumping 3327 MB (2 chunks)
   chunk 0: 1MB (151 pages) ... ok
   chunk 1: 3327MB (851568 pages)  ---hung here
 
 Still no valid dump.
 
 There is 4gig of physical memory in the machine.
 
 In /boot/loader.conf, I currently have the following:
 
 vm.kmem_size=1G
 vm.kmem_size_max=1G
 vm.kmem_size_scale=2
 
 and in my kernel conf file I have:
 
 options KVA_PAGES=512
 
 It stayed up for 33 days this time.  Is there anything else I can do?

First and foremost: are you using ZFS on this machine?  If so, there are
many tunables you can apply to try and limit this; I'm willing to bet
it's ARC which is doing it.  See below.

In general, it appears that you need to increase the maximum range of
kmem.  The kernel attempted to utilise more than 1GB, and your limit is
1G.  My machines running RELENG_7 on amd64, with only 2GB of RAM
installed, use the following tunables in loader.conf:

vm.kmem_size=1536M
vm.kmem_size_max=1536M

If ZFS is in use, I recommend these as well:

vfs.zfs.arc_min=16M
vfs.zfs.arc_max=64M
vfs.zfs.prefetch_disable=1

Do not increase kmem_size any larger than 1.5GB; the amount of RAM you
have in the machine, with regards to RELENG_7, will not help.  This is a
known limitation which has been fixed in HEAD/CURRENT (where the limit
has been increased to 512GB).  See the Kernel section below; you'll
see the applicable item.

http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues

Your only solution may be to run HEAD/CURRENT.
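The arithmetic in this exchange can be made concrete with a minimal Python sketch. The helper `to_bytes` and the constant names are hypothetical, and the sketch assumes the K/M/G suffixes in loader.conf mean powers of 1024. It shows that the 1073741824 bytes reported in the panic is exactly the vm.kmem_size=1G cap. It also checks proposed values against the 1.5GB RELENG_7 ceiling suggested above.

```python
# Sketch: loader.conf-style sizes, assuming K/M/G suffixes are powers of 1024.
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(value: str) -> int:
    """Convert a size such as '1G' or '1536M' to bytes (hypothetical helper)."""
    value = value.strip().upper()
    suffix = value[-1] if value and value[-1] in "KMG" else ""
    return int(value[: len(value) - len(suffix)]) * UNITS[suffix]


PANIC_TOTAL = 1073741824              # "kmem_map too small: 1073741824 total allocated"
RELENG_7_CEILING = to_bytes("1536M")  # the 1.5GB ceiling suggested for RELENG_7

if __name__ == "__main__":
    print(PANIC_TOTAL == to_bytes("1G"))        # True: the panic hit the 1G cap exactly
    print(to_bytes("1536M") / to_bytes("1G"))  # 1.5
    for proposed in ("1G", "1536M", "2G"):
        verdict = "within" if to_bytes(proposed) <= RELENG_7_CEILING else "above"
        print(proposed, verdict, "the 1.5GB ceiling")
```

Run as a script, it prints True, then 1.5, then one "within" or "above" line for each proposed value.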

-- 
| Jeremy Chadwick    jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-09-11 Thread Michael Grant
On Thu, Sep 11, 2008 at 11:20 AM, Jeremy Chadwick [EMAIL PROTECTED] wrote:
 First and foremost: are you using ZFS on this machine?  If so, there are
 many tunables you can apply to try and limit this; I'm willing to bet
 it's ARC which is doing it.  See below.

 In general, it appears that you need to increase the maximum range of
 kmem.  The kernel attempted to utilise more than 1GB, and your limit is
 1G.  My machines running RELENG_7 on amd64, with only 2GB of RAM
 installed, use the following tunables in loader.conf:

 vm.kmem_size=1536M
 vm.kmem_size_max=1536M

 If ZFS is in use, I recommend these as well:

 vfs.zfs.arc_min=16M
 vfs.zfs.arc_max=64M
 vfs.zfs.prefetch_disable=1

 Do not increase kmem_size any larger than 1.5GB; the amount of RAM you
 have in the machine, with regards to RELENG_7, will not help.  This is a
 known limitation which has been fixed in HEAD/CURRENT (where the limit
 has been increased to 512GB).  See the Kernel section below; you'll
 see the applicable item.

 http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues

 Your only solution may be to run HEAD/CURRENT.

I am not running ZFS.  My file systems are ufs.

This feels like some sort of memory leak in the kernel.  Giving it
more and more memory just seems to delay the crash.  Are you saying
the crash is fixed in HEAD/CURRENT?

I'm running 6.3 by the way.

I have put your changes into my loader.conf; we'll see how long it
goes this time.  I'm not quite in a position to update everything to 7.x
at the moment.

Michael Grant


Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-09-11 Thread Jeremy Chadwick
On Thu, Sep 11, 2008 at 12:08:47PM +0200, Michael Grant wrote:
 I am not running ZFS.  My file systems are ufs.
 
 This feels like some sort of memory leak in the kernel.  Giving it
 more and more memory just seems to delay the crash.  Are you saying
 the crash is fixed in HEAD/CURRENT?

It's an intentional panic, not a case of a program trying to access NULL
and crashing the machine.  The kernel wants more memory to accomplish a
certain task, and it's not available.  kris@ can explain this in better
terms than I can.

First and foremost, it would be good to find out what all you are
running on this machine (process-wise).  A process could be tickling
something in the kernel which requires a large amount of memory.  I
can imagine something like MySQL doing this.

Ideally what needs to happen is to debug the kernel or get a full map
of kmem to find out what's using what.  I believe vmstat -m or vmstat -z
output might help.

Obviously since the machine panics, you won't be able to run those
commands after the fact.  I would recommend you set up a cronjob that
runs every 1-2 minutes and logs the output of both of those commands
to a file.  When the panic happens, restart the system and look at
the logfile to see if you can figure out if anything suddenly starts
taking up a large amount of memory, or if it's a gradual thing
(indicating a memory leak).
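The suggested cronjob might look something like the following minimal Python sketch. The log path, function names, and timestamp format are assumptions, not anything from this thread. The sketch appends a timestamped copy of `vmstat -m` and `vmstat -z` output to a file. It then calls fsync() so that the last entry is more likely to be on disk when the panic hits.

```python
# Sketch: append timestamped `vmstat -m` / `vmstat -z` snapshots to a log file.
import os
import subprocess
import time
from typing import Callable, Sequence

LOGFILE = "/var/log/kmem-snapshots.log"  # hypothetical path

Runner = Callable[[Sequence[str]], str]


def run_command(argv: Sequence[str]) -> str:
    """Run a command and return its standard output as text."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def snapshot(run: Runner = run_command) -> str:
    """Build one log entry: a timestamp line, then the output of each vmstat flag."""
    parts = [time.strftime("%a %b %d %H:%M:%S %Z %Y")]
    for flag in ("-m", "-z"):
        parts.append(f"### vmstat {flag}")
        parts.append(run(["vmstat", flag]).rstrip("\n"))
    return "\n".join(parts) + "\n\n"


def append_snapshot(path: str = LOGFILE, run: Runner = run_command) -> None:
    """Append one snapshot and force it to disk before a panic can lose it."""
    entry = snapshot(run)
    with open(path, "a") as log:
        log.write(entry)
        log.flush()
        os.fsync(log.fileno())
```

To match the suggestion, add a crontab entry that calls append_snapshot() every one or two minutes. The `run` parameter exists only so the formatting can be exercised without a FreeBSD machine.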

If you can figure out what might be tickling the problem, you can
ultimately figure out if increasing kmem is the right thing to do, or if
there's a greater problem here.

 I'm running 6.3 by the way.
 
 I have put your changes into my loader.conf, we'll see how long it
 goes this time.  I'm not quite in a position to update everything to 7.x
 at the moment.

Our production webservers run RELENG_6 and RELENG_7, and we don't
encounter this kind of problem.  I'm not saying what you're experiencing
is indicative of hardware issues or something like that -- I'm simply
saying I have loaded systems which don't ever hit that condition.  So
figuring out what's causing it in your case would be good.



RE: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-24 Thread John Sullivan
 
 Removing KDB_UNATTENDED from your kernel will allow you 
 to interact with the debugger and obtain backtraces etc, 
 which is useful when dumps are not being saved.
 
 Easier said than done, this caused a few panics - no dumps 
 though ...g!!
 
 Still the same result ... the system seems to panic twice 
 then hang.  I will keep trying unless you have some other ideas??

Right: after trying for a number of days, the system still just hung
without letting me get either a dump or interactively debug in the failed
state.  So I reverted to the GENERIC kernel and removed half the memory (2
of the 4 1GB sticks), and the system became stable.  I inserted 1 of the 2
removed sticks and all was fine.  I swapped that stick with the remaining
stick and all was fine.  I put them both back in and started to see the
crashes again.  The first of these gave me this dump:

server251# kgdb /boot/kernel/kernel /var/crash/vmcore.1
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xb0
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0x8068d4bd
stack pointer           = 0x10:0xb20738e0
frame pointer           = 0x10:0x0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 72836 (objdump)
trap number             = 12
panic: page fault
cpuid = 1
Uptime: 28m4s
Physical memory: 4082 MB
Dumping 518 MB: 503 487 471 455 439 423 407 391 375 359 343 327 311 295 279 263 247 231 215 199 183 167 151 135 119 103 87 71 55 39 23 7

#0  doadump () at pcpu.h:194
194     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) backtrace
#0  doadump () at pcpu.h:194
#1  0x0004 in ?? ()
#2  0x80477699 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:409
#3  0x80477a9d in panic (fmt=0x104 <Address 0x104 out of bounds>)
    at /usr/src/sys/kern/kern_shutdown.c:563
#4  0x8072ed44 in trap_fatal (frame=0xff003c39c000,
    eva=18446742974629017808) at /usr/src/sys/amd64/amd64/trap.c:724
#5  0x8072f115 in trap_pfault (frame=0xb2073830, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:641
#6  0x8072fa58 in trap (frame=0xb2073830)
    at /usr/src/sys/amd64/amd64/trap.c:410
#7  0x807156be in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:169
#8  0x8068d4bd in vm_page_cache_remove (m=0xff00da9ec3b8)
    at /usr/src/sys/vm/vm_page.c:896
#9  0x8068e1b5 in vm_page_alloc (object=0xff00374ffc30, pindex=14,
    req=64) at /usr/src/sys/vm/vm_page.c:1080
#10 0x8067fa77 in vm_fault (map=0xff0005f23d00, vaddr=34365804544,
    fault_type=1 '\001', fault_flags=0) at /usr/src/sys/vm/vm_fault.c:432
#11 0x8072efaf in trap_pfault (frame=0xb2073c70, usermode=1)
    at /usr/src/sys/amd64/amd64/trap.c:618
#12 0x8072fbf8 in trap (frame=0xb2073c70)
    at /usr/src/sys/amd64/amd64/trap.c:309
#13 0x807156be in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:169
#14 0x00080059c54f in ?? ()
Previous frame inner to this frame (corrupt stack?)

So, to answer your question of whether the backtraces are always the
same: no, they are not.  But I am still confused as to what this means.

I would appreciate any further insight anyone can give.

Thanks

John





Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-24 Thread Kris Kennaway

John Sullivan wrote:
 
So, to answer your question of whether the backtraces are always the
same: no, they are not.  But I am still confused as to what this means.

I would appreciate any further insight anyone can give.


That's another corrupted backtrace that doesn't point to an actual 
software problem.  Still sounds like bad RAM, or bad hardware.


Kris


Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-24 Thread Chuck Swiger

On Jul 24, 2008, at 9:15 AM, John Sullivan wrote:
Right, after trying for a number of days the system still just hung  
without letting me get either a dump or to interactively debug
in the failed state, I reverted back to the Generic kernel, removed  
half the memory (2 of the 4 1GB sticks) and the system became
stable.  I inserted 1 of the 2 removed sticks and all was fine.  I  
swapped that stick with the remaining stick and all was fine.  I
put them both back in and I started to see the crashes again - the  
first of which, gave me this dump --


You might want to double-check the detailed documentation about your  
motherboard.


There are a fair number of consumer-grade motherboards that can't  
reliably handle 4 double-sided DIMMs at full speed.  Some of them  
require you to downgrade the memory clock from, say, PC3200 (aka  
200MHz DDR) down to PC2700 speed (aka 166MHz DDR); others may work,  
but only if you install the more expensive buffered type of RAM (which  
also tend to include ECC) rather than generic unbuffered RAM.
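The PC ratings above are peak-bandwidth figures, not clock speeds. A short worked check in Python makes the relationship explicit. It assumes a 64-bit (8-byte) DIMM data bus and two transfers per clock. On those assumptions, 200MHz gives 3200 MB/s (PC3200). The nominal 166MHz (more precisely 166.67MHz) gives about 2667 MB/s, which is sold as PC2700.

```python
# Peak bandwidth of a DDR SDRAM module from its memory clock.
# Assumptions: 64-bit (8-byte) data bus, two transfers per clock cycle.
BUS_BYTES = 8
TRANSFERS_PER_CLOCK = 2


def peak_mb_per_s(clock_mhz: float) -> float:
    """Return the peak transfer rate in MB/s for a given memory clock in MHz."""
    return clock_mhz * TRANSFERS_PER_CLOCK * BUS_BYTES


print(peak_mb_per_s(200))                # 3200 -> PC3200 (DDR400)
print(round(peak_mb_per_s(500 / 3), 1))  # 2666.7 -> marketed as PC2700 (DDR333)
```

By this arithmetic, dropping from PC3200 to PC2700 speed costs about a sixth of the peak bandwidth.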


Regards,
--
-Chuck



Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-24 Thread Michael Grant
I have been having what seems like similar panics.  I too cannot
manage to get a crash dump, neither classic style nor minidump.  Nor
can I get it to work with DDB; there seems to be a problem with DDB
and my GEOM mirror.

Kris recommended I up kmem_size which I have done (twice now) and
since the last time I upped it, the machine has not crashed again
(yet?).  For the moment, I'm hoping things are stable.

In /boot/loader.conf, I currently have the following:

vm.kmem_size=1G
vm.kmem_size_max=1G
vm.kmem_size_scale=2

and in my kernel conf file I have:

options KVA_PAGES=512

Here's what top says currently:

last pid: 57367;  load averages:  0.56,  0.54,  0.61
up 2+10:16:57  15:50:55
407 processes: 6 running, 378 sleeping, 2 zombie, 21 waiting
CPU states:  0.1% user,  0.0% nice,  2.3% system,  0.7% interrupt, 97.0% idle
Mem: 1309M Active, 1291M Inact, 497M Wired, 155M Cache, 199M Buf, 7408K Free
Swap: 9541M Total, 1628K Used, 9540M Free
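One way to read the Mem: line above is to split it into categories. Below is a minimal Python sketch of that split (the function name is hypothetical). It rests on two assumptions. The first is that Inact and Cache pages can be reclaimed on demand, as is usual for FreeBSD's VM. The second is that Buf overlaps the other categories rather than adding to them. On those assumptions, it gives 1806 MB Active+Wired against about 1453 MB Inact+Cache+Free.

```python
import re
from typing import Dict

# Megabytes per unit suffix used by top(1).
UNIT_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_mem_line(line: str) -> Dict[str, float]:
    """Parse a top(1) 'Mem:' line into {category: megabytes}."""
    body = line.split(":", 1)[1]
    return {name: int(amount) * UNIT_MB[unit]
            for amount, unit, name in re.findall(r"(\d+)([KMG])\s+(\w+)", body)}


line = "Mem: 1309M Active, 1291M Inact, 497M Wired, 155M Cache, 199M Buf, 7408K Free"
mem = parse_mem_line(line)

# Assumption: Active and Wired pages are held; Inact and Cache can be reclaimed.
# Buf is left out because it is assumed to overlap the other categories.
held = mem["Active"] + mem["Wired"]
reclaimable = mem["Inact"] + mem["Cache"] + mem["Free"]
print(f"held: {held:.0f} MB, reclaimable or free: {reclaimable:.0f} MB")
```

This prints "held: 1806 MB, reclaimable or free: 1453 MB". If the assumptions hold, a large part of the apparent usage is cache that the VM can give back.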

Is this a heavily loaded machine?  It's using a lot of memory, but
it's mostly idle.

I have 2 sticks of double-sided memory (4gig total) in the box.  The
SuperMicro documentation recommends using single-sided sticks when 6 or
more sticks are installed.

I feel for you John, I've lost many nights' sleep in the last couple
weeks trying to understand why this production box was crashing.  I
was really surprised to see this start happening; normally my FreeBSD
boxes have uptimes in terms of years, not hours.

Michael Grant


Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-24 Thread Kris Kennaway

Michael Grant wrote:

I have been having what seems like similar panics.  I too cannot
manage to get a crash dump, neither classic style nor minidump.  Nor
can I get it to work with DDB; there seems to be a problem with DDB
and my GEOM mirror.


They're not at all similar, please don't confuse the issue :)

Kris



Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-24 Thread john



   I feel for you John, I've lost many nights' sleep in the last couple
weeks trying to understand why this production box was crashing.  I
was really surprised to see this start happening; normally my FreeBSD
boxes have uptimes in terms of years, not hours.


  Thanks for the sentiment, at last I have been able to smile about  
this problem - maybe we should start a support group ... I'll start  
... Hi, I'm John and I'm a failing sys admin, I haven't had a panic  
for 2 hours now and I'm taking it just 1 tick at a time ;-)


Just to share with the group: I had an email from Kris off-list that
made a lot of sense.  I'm beginning to agree with him that it is
probably a hardware issue.  I'll go quiet now and spend some money on
different hardware.  For anyone who finds this thread on Google, I can  
only echo Michael's comments - the thing that makes these panics so  
infuriating is that even with dodgy old hardware FreeBSD has always  
proven to be a very stable OS for me and as you can see, the community  
is always willing to help.


Thanks to all that have spent time on this issue for me.

  John




RE: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-16 Thread John Sullivan

 Could be memory, but I'd also suggest looking at 
 temperatures. I've had overheating systems produce lots of 
 such errors.

Temperature is fine - it never gets that hot here in the UK ;-)  Seriously, I
put my hand in the box and touched a few heat sinks; it is not running hot
enough to cause a problem.  The BIOS reports that all is well, with the
temperature inside the box at just over 30 degrees C.

John




RE: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-16 Thread John Sullivan
 
 John, a question, how is swap set up on your system?  I was 
 swapping to a file (a memory disk device /dev/md0).  I was 
 doing this because for some reason lost in ancient history, 
 this machine was not set up with a real swap partition.  
 Hence, no crash dump.

Swap is a partition on the 1st disk.

 Last night I repartitioned a second disk, set up a real swap 
 partition and now I'm currently waiting for this to happen 
 again so I can get a crash dump.

I will try creating a swap partition on my second drive to see if that
improves things.  I am able to cause a panic on demand, but a crash dump is
rarely written (presumably because the system believes the device is not
accessible?).  I must have crashed it 10-20 times now, with various
corruptions of the panic screen; once it had blue text with trap 12 trap
12 all over the screen.  I liked that one ;-).

I did manage to complete a make index while the background fsck was
running; once it had finished, performing the same task caused a panic,
locking the machine up again with no crash dump.

John




Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-16 Thread Kris Kennaway

John Sullivan wrote:
 
I did manage to complete a make index while the background fsck was
running; once it had finished, performing the same task caused a panic,
locking the machine up again with no crash dump.


OK, the first thing to do is disable bg fsck, then force a full fsck of 
all filesystems.  bg fsck does a poor job of fixing arbitrary filesystem 
corruption (it's not designed to do so, in fact), and you can get into a 
situation where corrupted filesystems cause further panics.


Removing KDB_UNATTENDED from your kernel will allow you to interact with 
the debugger and obtain backtraces etc, which is useful when dumps are 
not being saved.


Kris


Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-16 Thread Michael Grant
On Wed, Jul 16, 2008 at 10:38 AM, John Sullivan [EMAIL PROTECTED] wrote:

 Could be memory, but I'd also suggest looking at
 temperatures. I've had overheating systems produce lots of
 such errors.

 Temperature is fine - it never gets that hot here in the UK ;-)  Seriously,
 I put my hand in the box and touched a few heat sinks; it
 is not running hot enough to cause a problem.  The BIOS reports that all is
 well with the temperature inside the box of just over 30
 degrees C.

 John


This looks like the same panic I reported yesterday but I'm running
6.3 patch 2.  I have seen these crashes on my box since 6.3
pre-release, randomly, but under load.  My box is based on a
SuperMicro motherboard running Intel Xeon processors.  The only
commonality is that we're both using SATA drives.

John, a question, how is swap set up on your system?  I was swapping
to a file (a memory disk device /dev/md0).  I was doing this because
for some reason lost in ancient history, this machine was not set up
with a real swap partition.  Hence, no crash dump.

Last night I repartitioned a second disk, set up a real swap partition
and now I'm currently waiting for this to happen again so I can get a
crash dump.

Michael Grant


Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-16 Thread Kris Kennaway

Michael Grant wrote:

On Wed, Jul 16, 2008 at 10:38 AM, John Sullivan [EMAIL PROTECTED] wrote:

Could be memory, but I'd also suggest looking at
temperatures. I've had overheating systems produce lots of
such errors.

Temperature is fine - it never gets that hot here in the UK ;-)  Seriously, I
put my hand in the box, touched a few heat sinks, it
is not running hot enough to cause a problem.  The BIOS reports that all is 
well with the temperature inside the box of just over 30
degrees C.

John



This looks like the same panic I reported yesterday but I'm running
6.3 patch 2.


Unless you have information you haven't yet shared, no it doesn't :) 
Fatal trap 12 is an effect, not a cause.  We still need your backtrace 
to make progress understanding the cause of your panic.


Kris



Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-16 Thread Kevin Oberman
 From: John Sullivan [EMAIL PROTECTED]
 Date: Wed, 16 Jul 2008 09:38:26 +0100
 
 
  Could be memory, but I'd also suggest looking at 
  temperatures. I've had overheating systems produce lots of 
  such errors.
 
 Temperature is fine - it never gets that hot here in the UK ;-)
 Seriously, I put my hand in the box, touched a few heat sinks, it is
 not running hot enough to cause a problem.  The BIOS reports that all
 is well with the temperature inside the box of just over 30 degrees C.

It's not the heat sink temperature that I am concerned with. It is the
temperature of the CPU and (if it's not AMD) the north bridge. I have
encountered several cases of improper heat sink installation which
resulted in poor transfer from the chip to the heat sink. Cleaning and
properly applying heat transfer grease made a huge difference.

You say that BIOS is reporting a 30C temperature. If this is the CPU
temperature when the CPU is busy, I don't believe it. I have a system
where the BIOS (via ACPI) reports the temperature as 35C, regardless of
how long the system has been under power or what it is doing.

I'm not at all sure that the problem is thermal, but I don't think you
should dismiss the possibility too quickly.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]   Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751




Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-16 Thread john



   OK, the first thing to do is disable bg fsck, then force a full fsck of
all filesystems.  bg fsck does a poor job of fixing arbitrary
filesystem corruption (it's not designed to do so, in fact), and you
can get into a situation where corrupted filesystems cause further
panics.


  Done. Nothing much was found apart from a wrong size in the superblock, which it corrected.

   Removing KDB_UNATTENDED from your kernel will allow you to interact
with the debugger and obtain backtraces etc, which is useful when dumps
are not being saved.


  Easier said than done; this caused a few panics - no dumps though ...g!!

  Still the same result ... the system seems to panic twice, then
hang.  I will keep trying unless you have some other ideas??


  Thanks for your support

John




This message was sent using IMP, the Internet Messaging Program.



Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-15 Thread John Sullivan
I am experiencing 'random' reboots interspersed with panics whenever I put a 
newly installed system under load (make index in
/usr/ports is enough).  A sample panic is at the end of this email.
 
I have updated to 7.0-RELEASE-p2 using the GENERIC amd64 kernel and it is still 
the same.  The system is a Gigabyte GA-M56S-S3
motherboard with 4GB of RAM, an Athlon X2 6400+ and 3 x Maxtor SATA 750GB HDD's 
(only the first is currently in use).  The first
disk is all allocated to FreeBSD using UFS.  There is also a Linksys 
802.11a/b/g card installed.  I have flashed the BIOS to the
latest revision (F4e).  The onboard RAID is disabled.
 
At the moment there is no exotic software installed.
 
Although I have been using FreeBSD for a number of years this is the first time 
I have experienced regular panics and am at a
complete loss trying to work out what is wrong.  I would be grateful for any 
advice anyone is willing to give to help me
troubleshoot this issue.
 
Thanks in advance
 
John
 
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x80b0
fault code - supervisor write data, page not present
instruction pointer = 0x8:0x804db18c
stack pointer = 0x10:b1e92450
frame pointer = 0x10:ffec
code segment = base 0x0, limit 0xf, type 0x16, DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interupt enabled, resume, IOPL = 0
current processkernel trap 12 with interrupts disabled
 
#nm -n /boot/kernel/kernel | grep 804db
804dbac0 t flushbufqueues


Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-15 Thread Jeremy Chadwick
On Tue, Jul 15, 2008 at 10:58:19AM +0100, John Sullivan wrote:
 I am experiencing 'random' reboots interspersed with panics whenever I put a 
 newly installed system under load (make index in
 /usr/ports is enough).  A sample panic is at the end of this email.
  
 I have updated to 7.0-RELEASE-p2 using the GENERIC amd64 kernel and it is 
 still the same.  The system is a Gigabyte GA-M56S-S3
 motherboard with 4GB of RAM, an Athlon X2 6400+ and 3 x Maxtor SATA 750GB 
 HDD's (only the first is currently in use).  The first
 disk is all allocated to FreeBSD using UFS.  There is also a Linksys 
 802.11a/b/g card installed.  I have flashed the BIOS to the
 latest revision (F4e).  The onboard RAID is disabled.
  
 At the moment there is no exotic software installed.
  
 Although I have been using FreeBSD for a number of years this is the first 
 time I have experienced regular panics and am at a
 complete loss trying to work out what is wrong.  I would be grateful for any 
 advice anyone is willing to give to help me
 troubleshoot this issue.

Can the system in question run memtest86+ successfully (no errors)
for an hour?  It would help diminish (but not entirely rule out)
hardware (memory or chipset) issues.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



RE: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-15 Thread John Sullivan
 
 Can the system in question run memtest86+ successfully (no 
 errors) for an hour?  It would help diminish (but not 
 entirely rule out) hardware (memory or chipset) issues.

Sorry, forgot to mention: I ran memtest overnight without any problem
reported.  I ran Fedora 9 for a month without any issue -
FreeBSD 7.0 crashes within an hour.

John




Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-15 Thread Kris Kennaway

John Sullivan wrote:
 
Can the system in question run memtest86+ successfully (no 
errors) for an hour?  It would help diminish (but not 
entirely rule out) hardware (memory or chipset) issues.


Sorry, forgot to mention: I ran memtest overnight without any problem
reported.  I ran Fedora 9 for a month without any issue -
FreeBSD 7.0 crashes within an hour.


Well, that doesn't rule out hardware failure.  Different OSes may use 
different capabilities of the hardware, or just use it in a different 
way, and that can provoke failures from marginal hardware.


Please collect kgdb/ddb backtraces.

Kris


Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-15 Thread john



   Please collect kgdb/ddb backtraces.

  kgdb backtrace:

  server251# kgdb -c /var/crash/vmcore.0
kgdb: couldn't find a suitable kernel image
server251# kgdb /boot/kernel/kernel /var/crash/vmcore.0
kgdb: kvm_read: invalid address (0xff00010e5468)
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as amd64-marcel-freebsd.

  Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x64
fault code  = supervisor read instruction, page not present
instruction pointer = 0x8:0x64
stack pointer   = 0x10:0xb1d7f590
frame pointer   = 0x10:0xff0035d2dcc0
code segment    = base 0x0, limit 0xf, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process = 88622 (make)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 5h57m22s
Physical memory: 4082 MB
Dumping 444 MB: 429 413 397 381 365 349 333 317 301 285 269 253 237 221 205 189 173 157 141 125 109 93 77 61 45 29 13

  #0  doadump () at pcpu.h:194
194 pcpu.h: No such file or directory.
    in pcpu.h
(kgdb)
(kgdb) list *0x64
No source file for address 0x64.
(kgdb) backtrace
#0  doadump () at pcpu.h:194
#1  0xff0004742440 in ?? ()
#2  0x80477699 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:409
#3  0x80477a9d in panic (fmt=0x104 <Address 0x104 out of bounds>)
    at /usr/src/sys/kern/kern_shutdown.c:563
#4  0x8072ed44 in trap_fatal (frame=0xff00048ee000,
    eva=18446742974275512528) at /usr/src/sys/amd64/amd64/trap.c:724
#5  0x8072f115 in trap_pfault (frame=0xb1d7f4e0, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:641
#6  0x8072fa58 in trap (frame=0xb1d7f4e0)
    at /usr/src/sys/amd64/amd64/trap.c:410
#7  0x807156be in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:169
#8  0x0064 in ?? ()
#9  0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0, udata=0x0,
    flags=-256) at /usr/src/sys/vm/uma_core.c:1835
#10 0x80661ecf in ffs_vget (mp=0xff00047f4978, ino=47884512,
    flags=2, vpp=0xb1d7f728) at uma.h:277
#11 0x8066d010 in ufs_lookup (ap=0xb1d7f780)
    at /usr/src/sys/ufs/ufs/ufs_lookup.c:573
#12 0x804dfa89 in vfs_cache_lookup (ap=Variable ap is not available.
) at vnode_if.h:83
#13 0x8077235f in VOP_LOOKUP_APV (vop=0x809e7de0,
    a=0xb1d7f840) at vnode_if.c:99
---Type <return> to continue, or q <return> to quit---
#14 0x804e6394 in lookup (ndp=0xb1d7f950) at vnode_if.h:57
#15 0x804e7228 in namei (ndp=0xb1d7f950)
    at /usr/src/sys/kern/vfs_lookup.c:219
#16 0x804f4717 in kern_stat (td=0xff00048ee000,
    path=0x8006f7040 <Address 0x8006f7040 out of bounds>,
    pathseg=Variable pathseg is not available.
)
    at /usr/src/sys/kern/vfs_syscalls.c:2109
#17 0x804f4987 in stat (td=Variable td is not available.
) at /usr/src/sys/kern/vfs_syscalls.c:2093
#18 0x8072f397 in syscall (frame=0xb1d7fc70)
    at /usr/src/sys/amd64/amd64/trap.c:852
#19 0x807158cb in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:290
#20 0x0043127c in ?? ()
Previous frame inner to this frame (corrupt stack?)

  I really don't understand this; any advice you can give would
really be appreciated.


  John





Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-15 Thread Kris Kennaway

[EMAIL PROTECTED] wrote:


(kgdb) backtrace
#0  doadump () at pcpu.h:194
#1  0xff0004742440 in ?? ()
#2  0x80477699 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:409
#3  0x80477a9d in panic (fmt=0x104 <Address 0x104 out of bounds>)
at /usr/src/sys/kern/kern_shutdown.c:563
#4  0x8072ed44 in trap_fatal (frame=0xff00048ee000,
eva=18446742974275512528) at /usr/src/sys/amd64/amd64/trap.c:724
#5  0x8072f115 in trap_pfault (frame=0xb1d7f4e0, usermode=0)
at /usr/src/sys/amd64/amd64/trap.c:641
#6  0x8072fa58 in trap (frame=0xb1d7f4e0)
at /usr/src/sys/amd64/amd64/trap.c:410
#7  0x807156be in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:169
#8  0x0064 in ?? ()
#9  0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0, udata=0x0,
flags=-256) at /usr/src/sys/vm/uma_core.c:1835


OK, that is

if (zone->uz_ctor != NULL) {
if (zone->uz_ctor(item,
zone->uz_keg->uk_size,


uz_ctor is indeed not null, but it's got 3 bits set.  Not impossible 
that it's bad RAM still.  I didn't spot anything that could cause it 
otherwise but I don't know this code in detail.


Do all of the panics have the same backtrace?

Kris


#10 0x80661ecf in ffs_vget (mp=0xff00047f4978, ino=47884512,
flags=2, vpp=0xb1d7f728) at uma.h:277
#11 0x8066d010 in ufs_lookup (ap=0xb1d7f780)
at /usr/src/sys/ufs/ufs/ufs_lookup.c:573
#12 0x804dfa89 in vfs_cache_lookup (ap=Variable ap is not available.
) at vnode_if.h:83
#13 0x8077235f in VOP_LOOKUP_APV (vop=0x809e7de0,
a=0xb1d7f840) at vnode_if.c:99
---Type <return> to continue, or q <return> to quit---
#14 0x804e6394 in lookup (ndp=0xb1d7f950) at vnode_if.h:57
#15 0x804e7228 in namei (ndp=0xb1d7f950)
at /usr/src/sys/kern/vfs_lookup.c:219
#16 0x804f4717 in kern_stat (td=0xff00048ee000,
path=0x8006f7040 <Address 0x8006f7040 out of bounds>,
pathseg=Variable pathseg is not available.
)
at /usr/src/sys/kern/vfs_syscalls.c:2109
#17 0x804f4987 in stat (td=Variable td is not available.
) at /usr/src/sys/kern/vfs_syscalls.c:2093
#18 0x8072f397 in syscall (frame=0xb1d7fc70)
at /usr/src/sys/amd64/amd64/trap.c:852
#19 0x807158cb in Xfast_syscall ()
at /usr/src/sys/amd64/amd64/exception.S:290
#20 0x0043127c in ?? ()
Previous frame inner to this frame (corrupt stack?)

  I really don't understand this; any advice you can give would really
be appreciated.


  John




Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-15 Thread john



   #9  0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0,
udata=0x0, flags=-256) at /usr/src/sys/vm/uma_core.c:1835

From the frame #9, please do
p *zone
I am esp. interested in the value of the uz_ctor member.

It seems that it becomes corrupted, its value should be 0, as this seems
to be ffs inode zone.  I suspect that gdb would show 0x64 instead.


  I am afraid that you may need to spell out each step for me :-(

  (kgdb) p *zone
No symbol zone in current context.
(kgdb) list *0x8067d3ee
0x8067d3ee is in uma_zalloc_arg (/usr/src/sys/vm/uma_core.c:1835).
1830    ("uma_zalloc: Bucket pointer mangled."));
1831    cache->uc_allocs++;
1832    critical_exit();
1833    #ifdef INVARIANTS
1834    ZONE_LOCK(zone);
1835    uma_dbg_alloc(zone, NULL, item);
1836    ZONE_UNLOCK(zone);
1837    #endif
1838    if (zone->uz_ctor != NULL) {
1839    if (zone->uz_ctor(item, zone->uz_keg->uk_size,


  Is this what you were looking for?

  John




Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-15 Thread Kostik Belousov
On Tue, Jul 15, 2008 at 08:19:15PM +0100, [EMAIL PROTECTED] wrote:
 
 
Please collect kgdb/ddb backtraces.
 
   kgdb backtrace:
 
   server251# kgdb -c /var/crash/vmcore.0
 kgdb: couldn't find a suitable kernel image
 server251# kgdb /boot/kernel/kernel /var/crash/vmcore.0
 kgdb: kvm_read: invalid address (0xff00010e5468)
 [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain 
 conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as amd64-marcel-freebsd.
 
   Unread portion of the kernel message buffer:
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address   = 0x64
 fault code  = supervisor read instruction, page not present
 instruction pointer = 0x8:0x64
 stack pointer   = 0x10:0xb1d7f590
 frame pointer   = 0x10:0xff0035d2dcc0
 code segment    = base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags    = interrupt enabled, resume, IOPL = 0
 current process = 88622 (make)
 trap number = 12
 panic: page fault
 cpuid = 0
 Uptime: 5h57m22s
 Physical memory: 4082 MB
 Dumping 444 MB: 429 413 397 381 365 349 333 317 301 285 269 253 237 221 205 189 173 157 141 125 109 93 77 61 45 29 13
 
   #0  doadump () at pcpu.h:194
 194 pcpu.h: No such file or directory.
 in pcpu.h
 (kgdb)
 (kgdb) list *0x64
 No source file for address 0x64.
 (kgdb) backtrace
 #0  doadump () at pcpu.h:194
 #1  0xff0004742440 in ?? ()
 #2  0x80477699 in boot (howto=260)
 at /usr/src/sys/kern/kern_shutdown.c:409
 #3  0x80477a9d in panic (fmt=0x104 <Address 0x104 out of bounds>)
 at /usr/src/sys/kern/kern_shutdown.c:563
 #4  0x8072ed44 in trap_fatal (frame=0xff00048ee000,
 eva=18446742974275512528) at /usr/src/sys/amd64/amd64/trap.c:724
 #5  0x8072f115 in trap_pfault (frame=0xb1d7f4e0, usermode=0)
 at /usr/src/sys/amd64/amd64/trap.c:641
 #6  0x8072fa58 in trap (frame=0xb1d7f4e0)
 at /usr/src/sys/amd64/amd64/trap.c:410
 #7  0x807156be in calltrap ()
 at /usr/src/sys/amd64/amd64/exception.S:169
 #8  0x0064 in ?? ()
 #9  0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0, 
 udata=0x0,
 flags=-256) at /usr/src/sys/vm/uma_core.c:1835
From the frame #9, please do
p *zone
I am esp. interested in the value of the uz_ctor member.

It seems that it becomes corrupted, its value should be 0, as this seems
to be ffs inode zone.  I suspect that gdb would show 0x64 instead.

That may be kernel memory corruption, but it might be bad memory
as well (double bit inversion ?).

 #10 0x80661ecf in ffs_vget (mp=0xff00047f4978, ino=47884512,
 flags=2, vpp=0xb1d7f728) at uma.h:277
 #11 0x8066d010 in ufs_lookup (ap=0xb1d7f780)
 at /usr/src/sys/ufs/ufs/ufs_lookup.c:573
 #12 0x804dfa89 in vfs_cache_lookup (ap=Variable ap is not 
 available.
 ) at vnode_if.h:83
 #13 0x8077235f in VOP_LOOKUP_APV (vop=0x809e7de0,
 a=0xb1d7f840) at vnode_if.c:99
 ---Type <return> to continue, or q <return> to quit---
 #14 0x804e6394 in lookup (ndp=0xb1d7f950) at vnode_if.h:57
 #15 0x804e7228 in namei (ndp=0xb1d7f950)
 at /usr/src/sys/kern/vfs_lookup.c:219
 #16 0x804f4717 in kern_stat (td=0xff00048ee000,
 path=0x8006f7040 <Address 0x8006f7040 out of bounds>,
 pathseg=Variable pathseg is not available.
 )
 at /usr/src/sys/kern/vfs_syscalls.c:2109
 #17 0x804f4987 in stat (td=Variable td is not available.
 ) at /usr/src/sys/kern/vfs_syscalls.c:2093
 #18 0x8072f397 in syscall (frame=0xb1d7fc70)
 at /usr/src/sys/amd64/amd64/trap.c:852
 #19 0x807158cb in Xfast_syscall ()
 at /usr/src/sys/amd64/amd64/exception.S:290
 #20 0x0043127c in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 
   I really don't understand this; any advice you can give would
 really be appreciated.




Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-15 Thread Kostik Belousov
On Tue, Jul 15, 2008 at 08:47:03PM +0100, [EMAIL PROTECTED] wrote:
 
 
#9  0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0,
 udata=0x0,
 flags=-256) at /usr/src/sys/vm/uma_core.c:1835
 From the frame #9, please do
 p *zone
 I am esp. interested in the value of the uz_ctor member.
 
 It seems that it becomes corrupted, its value should be 0, as this seems
 to be ffs inode zone.  I suspect that gdb would show 0x64 instead.
 
   I am afraid that you may need to spell out each step for me :-(
 
   (kgdb) p *zone
 No symbol zone in current context.
Do "frame 9" before "p *zone".

 (kgdb) list *0x8067d3ee
 0x8067d3ee is in uma_zalloc_arg (/usr/src/sys/vm/uma_core.c:1835).
 1830    ("uma_zalloc: Bucket pointer mangled."));
 1831    cache->uc_allocs++;
 1832    critical_exit();
 1833    #ifdef INVARIANTS
 1834    ZONE_LOCK(zone);
 1835    uma_dbg_alloc(zone, NULL, item);
 1836    ZONE_UNLOCK(zone);
 1837    #endif
 1838    if (zone->uz_ctor != NULL) {
 1839    if (zone->uz_ctor(item, zone->uz_keg->uk_size,
 
   Is this what you were looking for?
No, see above.




Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-15 Thread john



   Do "frame 9" before "p *zone".

  It's obvious now you say it ;-)

  You are indeed right:

  (kgdb) frame 9
#9  0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0, udata=0x0,
    flags=-256) at /usr/src/sys/vm/uma_core.c:1835
1835    uma_dbg_alloc(zone, NULL, item);
(kgdb) p *zone
$1 = {uz_name = 0x808084cd "FFS inode", uz_lock = 0xff00bfecf7f0,
  uz_keg = 0xff00bfecf7e0, uz_link = {le_next = 0x0,
    le_prev = 0xff00bfecf830}, uz_full_bucket = {
    lh_first = 0xffe01a74c830}, uz_free_bucket = {
    lh_first = 0xff00469bf830}, uz_ctor = 0x64, uz_dtor = 0,
  uz_init = 0x9a, uz_fini = 0, uz_allocs = 17180460407,
  uz_frees = 504673, uz_fails = 0, uz_fills = 0, uz_count = 128, uz_cpu = {{
  uc_freebucket = 0xff000e5d6830, uc_allocbucket = 0xff003a5f7000,
  uc_allocs = 97, uc_frees = 0}}}

  Now what does that mean??

  I just experienced another panic, but it failed to write to disk
:-(.  I will force another one and check that the details are the same.


  John




Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-15 Thread Kris Kennaway

[EMAIL PROTECTED] wrote:



   #9  0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0,
udata=0x0, flags=-256) at /usr/src/sys/vm/uma_core.c:1835

From the frame #9, please do
p *zone
I am esp. interested in the value of the uz_ctor member.

It seems that it becomes corrupted, its value should be 0, as this seems
to be ffs inode zone.  I suspect that gdb would show 0x64 instead.


  I am afraid that you may need to spell out each step for me :-(

  (kgdb) p *zone
No symbol zone in current context.
(kgdb) list *0x8067d3ee
0x8067d3ee is in uma_zalloc_arg (/usr/src/sys/vm/uma_core.c:1835).
1830    ("uma_zalloc: Bucket pointer mangled."));
1831    cache->uc_allocs++;
1832    critical_exit();
1833    #ifdef INVARIANTS
1834    ZONE_LOCK(zone);
1835    uma_dbg_alloc(zone, NULL, item);
1836    ZONE_UNLOCK(zone);
1837    #endif
1838    if (zone->uz_ctor != NULL) {
1839    if (zone->uz_ctor(item, zone->uz_keg->uk_size,


  Is this what you were looking for?


Are you sure that is the same source tree you are running?  The 
7.0-RELEASE source has the zone->uz_ctor on line 1835, which is
consistent with your backtrace.


Kris



Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

2008-07-15 Thread Kevin Oberman
 From: John Sullivan [EMAIL PROTECTED]
 Date: Tue, 15 Jul 2008 10:58:19 +0100
 Sender: [EMAIL PROTECTED]
 
 I am experiencing 'random' reboots interspersed with panics whenever I put a 
 newly installed system under load (make index in
 /usr/ports is enough).  A sample panic is at the end of this email.
  
 I have updated to 7.0-RELEASE-p2 using the GENERIC amd64 kernel and it is 
 still the same.  The system is a Gigabyte GA-M56S-S3
 motherboard with 4GB of RAM, an Athlon X2 6400+ and 3 x Maxtor SATA 750GB 
 HDD's (only the first is currently in use).  The first
 disk is all allocated to FreeBSD using UFS.  There is also a Linksys 
 802.11a/b/g card installed.  I have flashed the BIOS to the
 latest revision (F4e).  The onboard RAID is disabled.
  
 At the moment there is no exotic software installed.
  
 Although I have been using FreeBSD for a number of years this is the first 
 time I have experienced regular panics and am at a
 complete loss trying to work out what is wrong.  I would be grateful for any 
 advice anyone is willing to give to help me
 troubleshoot this issue.
  
 Thanks in advance
  
 John
  
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address = 0x80b0
 fault code - supervisor write data, page not present
 instruction pointer = 0x8:0x804db18c
 stack pointer = 0x10:b1e92450
 frame pointer = 0x10:ffec
 code segment = base 0x0, limit 0xf, type 0x16, DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags = interupt enabled, resume, IOPL = 0
 current processkernel trap 12 with interrupts disabled
  
 #nm -n /boot/kernel/kernel | grep 804db
 804dbac0 t flushbufqueues

Could be memory, but I'd also suggest looking at temperatures. I've had
overheating systems produce lots of such errors.