Re: FreeBSD 6.2 Repeating Crash - Sleeping thread; Fatal trap 12: page fault; warning: 'T2' might be used uninitialized - SOLUTION

2007-07-06 Thread Worth Bishop
We finally determined the root of this problem. One of the system's memory
modules was apparently going bad. When it failed permanently, the system
crashed and would not reboot. We swapped out the memory (all Registered
memory) and have not had problems since.


Thanks to the list for the efforts!

WB

(Below is a reply drafted a long while back - included mostly to thank
Beto. The rest of it is no longer relevant.)


Thank you very much for your reply, Beto.

I appreciate your point re: drive age. I only mentioned it because I had
stated the age of the server at 5 years and hoped to forestall suggestions
that an older drive might be likely to have issues. However, I did follow
your suggestion and smartctl reports the drive to be in good health.

I misspoke - we did not upgrade, really, but did a fresh install of 6.1 on
the new drive and manually copied all user files, databases, PERL scripts,
etc. to the new drive. We had been running 4.7 and, since there was not a
direct route for upgrading, we did it the hard way.

Your advice re: copying and renaming GENERIC is well taken - that is, in fact,
exactly what we did. Further, as advised in the manual, we moved it from
/usr/src to a different directory and created a symlink to avoid
inadvertently overwriting it. We did not change the ident line, but it seems
unlikely that that oversight would prevent the kernel from building.

Were any of the errors described familiar?





___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: FreeBSD 6.2 Repeating Crash - Sleeping thread; Fatal trap 12: page fault; warning: 'T2' might be used uninitialized

2007-06-13 Thread Norberto Meijome
On Tue, 12 Jun 2007 11:38:19 -0400
Worth Bishop [EMAIL PROTECTED] wrote:

 Addendum: For what it's worth, the 250Gb Samsung drive was added when the 
 system was upgraded - it's only 3-4 months old.

Worth,
that doesn't mean much - drives can (and do) fail anyway. I suggest you use
smartctl (sysutils/smartmontools) to run tests on your drive and make sure you
don't have any actual problems with it.
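
As a rough illustration of the smartctl suggestion above, the sketch below prints a typical check sequence instead of running it. The device name /dev/ad4 and the DRY_RUN switch are assumptions made for illustration only; substitute the disk actually in use.

```shell
#!/bin/sh
# Sketch only: /dev/ad4 is an assumed device name, not taken from the thread.
DISK="${DISK:-/dev/ad4}"
DRY_RUN="${DRY_RUN:-1}"      # 1 = print the commands, 0 = actually run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

smart_check() {
    run smartctl -H "$1"            # overall health self-assessment
    run smartctl -a "$1"            # full SMART report, including the error log
    run smartctl -t long "$1"       # start an extended self-test
    run smartctl -l selftest "$1"   # read self-test results once it has finished
}

smart_check "$DISK"
```

The extended self-test runs in the background on the drive, so the last command is only meaningful some time after the third.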

btw, you don't mention which version you upgraded from to get to 6.1. Did you
do a full world upgrade as well as the kernel?


From your previous email, you ended up having some kernel build problems.
1) It is good practice to rename your kernel config file (and the ident line
inside it) from GENERIC once you've modified it. That makes it obvious whether
you are truly running the same GENERIC as everyone else.

2) Make sure you have the latest and proper code for the line of src you need
(e.g., -STABLE or RELEASE-p5). You should use cvsup for this. If you need
them, the default config files are in /usr/share/examples/cvsup/.
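
For point 2, a supfile modelled on the examples in /usr/share/examples/cvsup/ might look like the sketch below. The mirror host is left as a placeholder, the output path is hypothetical, and the RELENG_6_2 tag is only one possible choice (it tracks the 6.2 release branch); this is an assumption-laden sketch, not a tested recipe for this machine.

```shell
#!/bin/sh
# Sketch only: the mirror name and the output path are placeholders.
SUPFILE="${SUPFILE:-${TMPDIR:-/tmp}/releng62-supfile}"

cat > "$SUPFILE" <<'EOF'
*default host=CHANGE_THIS.FreeBSD.org
*default base=/var/db
*default prefix=/usr
*default release=cvs tag=RELENG_6_2
*default delete use-rel-suffix
*default compress
src-all
EOF

# The update itself would then be run as root, for example:
echo "cvsup -g -L 2 $SUPFILE"
```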

B
_
{Beto|Norberto|Numard} Meijome

Never offend people with style when you can offend them with substance.
  Sam Brown

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: FreeBSD 6.2 Repeating Crash - Sleeping thread; Fatal trap 12: page fault; warning: 'T2' might be used uninitialized

2007-06-13 Thread Brian A. Seklecki
Read:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html


Also, is your /usr/src tagged RELENG_6_2? If you remove DEBUG=-g, does the
build problem go away?

You didn't try to update your src tree to STABLE or CURRENT?
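
For the vmcore files mentioned in the original post, the handbook chapter linked above comes down to pointing kgdb at a kernel built with debug symbols plus one dump file. The helper below is a minimal sketch of that step: it only prints the command for the highest-numbered dump. Both default paths are assumptions pieced together from this thread, and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: both default paths are assumptions based on this thread.
CRASHDIR="${CRASHDIR:-/usr/crash}"
KERNEL="${KERNEL:-/usr/obj/usr/src/sys/KERNEL.DEBUG/kernel.debug}"

# Print the kgdb command for the highest-numbered vmcore.N in a directory.
latest_dump_cmd() {
    dir="$1"
    newest=""
    max=-1
    for f in "$dir"/vmcore.*; do
        [ -f "$f" ] || continue
        n="${f##*.}"
        case "$n" in
            ''|*[!0-9]*) continue ;;    # skip anything without a numeric suffix
        esac
        if [ "$n" -gt "$max" ]; then
            max="$n"
            newest="$f"
        fi
    done
    if [ -z "$newest" ]; then
        echo "no vmcore.N files in $dir" >&2
        return 1
    fi
    echo "kgdb $KERNEL $newest"
}

latest_dump_cmd "$CRASHDIR" || true
```

Once kgdb has loaded the dump, `bt` at its prompt prints the backtrace of the thread that panicked, which is usually the first thing worth posting to the list.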

~~BAS

On Tue, 2007-06-12 at 11:33 -0400, Worth Bishop wrote:
 [...] copied GENERIC and edited it, noting that 'options ddb' was
 already enabled. We added 'makeoptions DEBUG=-g # Build kernel with
 gdb(1) debug symbols' as suggested and tried to make buildkernel, which
 errored out stating that KDB must be enabled to use DDB. We edited
 KERNEL.DEBUG to add 'options KDB # Enable kernel debugger' and
 attempted to make buildkernel again. This time, the process stopped
 again with the message:

 THIRD ERROR EVENT

 [snip]
 inline-unit-growth=100 --param large-function-growth=1000
 -mno-align-long-strings -mpreferred-stack-bounda [...]
-- 
Brian A. Seklecki [EMAIL PROTECTED]
Collaborative Fusion, Inc.




IMPORTANT: This message contains confidential information and is intended only 
for the individual named. If the reader of this message is not an intended 
recipient (or the individual responsible for the delivery of this message to an 
intended recipient), please be advised that any re-use, dissemination, 
distribution or copying of this message is prohibited.  Please notify the 
sender immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system.




Re: FreeBSD 6.2 Repeating Crash - Sleeping thread; Fatal trap 12: page fault; warning: 'T2' might be used uninitialized

2007-06-13 Thread Brian A. Seklecki
Hardware can be ruled out as the cause by running the memtest86 bootable ISOs
from the web site. A bad sector test on the drives would be less ambiguous
(kernel messages preceding a panic). SMART can be helpful.

Overheating CPUs and underpowered or overheated power supplies can cause
problems too, but they would normally show up as memtest86+ failures.

www.memtest.org/

-- 
Brian A. Seklecki [EMAIL PROTECTED]
Collaborative Fusion, Inc.






FreeBSD 6.2 Repeating Crash - Sleeping thread; Fatal trap 12: page fault; warning: 'T2' might be used uninitialized

2007-06-12 Thread Worth Bishop

Please help if you can...

BACKGROUND

This crash is occurring on a dual-AMD 1.6 GHz CPU white-box system with 1 GB
RAM and 250 GB storage, running the GENERIC kernel. The system has been in
production use as a web server for nearly five years.


About 3 - 4 months ago, the system was upgraded from an earlier FreeBSD
version to 6.1. At the same time, all supporting applications (Apache
webserver, Perl, PostgreSQL, PHP, countless other applications and libraries)
were upgraded to the current releases. The system was stable up until a
couple of weeks ago.


FIRST ERROR EVENT

The system crashed during normal usage. The following message was displayed
on the console, which was not responsive to keyboard input:


Sleeping thread (tid 100122, pid 11099)
 owns a non-sleepable lock

panic:  sleeping thread
cpuid=1

The system was restarted, an fsck run was completed (answering yes to
all the "Do you want to salvage" type questions) and the server ran fine.
For about a week. It then crashed again several times, at intervals varying
from a few minutes of uptime to a few days.


SECOND ERROR EVENT

After some crashes, a message similar to that above was displayed. However, 
at other times a message similar to this was displayed:


kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 01
fault virtual address   = 0x100
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc066c731
stack pointer           = 0x28:0xe432ebf0
frame pointer           = 0x28:0xe432ebfc
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 36 (syncer)
trap number             = 12

panic: page fault
cpuid=0
uptime:  3d10h11m44s
Dumping 1535 Mb (2 chunks)  [NOTE:  the system had 1.5Gb memory at that 
time. Memory was removed, reseated, swapped, etc., now 1Gb]

  chunk 0:1Mb (159 pages)

CORRECTIONS ATTEMPTED

Somewhere during this ordeal, a Google search revealed a number of other
people experiencing the "Sleeping thread" problem. One of these was
apparently experienced in a FreeBSD 6.x development version stress test. No
definitive solution was identified in anything we saw, except a single
reference to the problem being a kernel bug fixed in FreeBSD 6.2.


Accordingly, we upgraded from 6.1 to 6.2 but have still experienced the
problem.


We reviewed the 'messages' file and found references to several things which 
led us to check FreeBSD 6.2 ERRATA 
(http://www.freebsd.org/releases/6.2R/errata.html). This suggested adding 
'kern.ipc.nmbclusters=0' to the /boot/loader.conf file which might avoid a 
known issue. We tried this, but saw no relief.


We also found a reference in the manual that suggested the issue might be a 
problem with the APIC in 6.x. This recommended adding 
'hint.apic.0.disabled=1' to loader.conf. Tried this; no help.
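
For reference, the two loader tunables tried so far would sit in /boot/loader.conf roughly as in the sketch below. It writes to a scratch file (a hypothetical path) so that nothing on a live system is touched; the quoting follows the usual loader.conf convention.

```shell
#!/bin/sh
# Sketch only: writes to a scratch copy, not the live /boot/loader.conf.
LOADER_CONF="${LOADER_CONF:-${TMPDIR:-/tmp}/loader.conf.sample}"

cat > "$LOADER_CONF" <<'EOF'
# Workaround taken from the 6.2 errata, as described in this message
kern.ipc.nmbclusters="0"
# Disable the APIC, as described in this message
hint.apic.0.disabled="1"
EOF

cat "$LOADER_CONF"
```

Both settings are read by the boot loader, so they only take effect after a reboot.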


In order to try to get more information about the system dumps we added: 
dumpdev=AUTO and dumpdir=/usr/crash [to get more storage space than 
available in /var/] and have generated several vmcore.# files of ~1 Gb each 
(all identical size).
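
The message does not say where these two settings were added; on FreeBSD they are normally rc.conf variables, so the sketch below assumes /etc/rc.conf and uses a scratch file (a hypothetical path) in its place.

```shell
#!/bin/sh
# Sketch only: a scratch file stands in for /etc/rc.conf.
RC_CONF="${RC_CONF:-${TMPDIR:-/tmp}/rc.conf.sample}"

cat > "$RC_CONF" <<'EOF'
dumpdev="AUTO"          # let the system pick a swap device for crash dumps
dumpdir="/usr/crash"    # where savecore(8) writes vmcore.N at the next boot
EOF

# rc.conf is plain sh syntax, so the fragment can be sanity-checked by sourcing it.
. "$RC_CONF"
echo "dumps go to $dumpdir via $dumpdev"
```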


We attempted to use DDB to analyze the dumps (struggling now, unfamiliar 
with kernel debugging process) with no success. Research suggested we needed 
to create a debug version of the kernel (i.e., KERNEL.DEBUG) with debugging 
options enabled.
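
The debug options this message goes on to describe would sit in the copied config roughly as sketched below. The function only prints the lines to change in a copy of GENERIC and edits nothing; the function name is made up for illustration, and the build commands at the end are the usual buildkernel and installkernel targets, shown rather than executed.

```shell
#!/bin/sh
# Sketch only: prints the lines to change in a copy of GENERIC; it edits nothing.
KERNCONF="${KERNCONF:-KERNEL.DEBUG}"

debug_fragment() {
    cat <<EOF
ident        $1
makeoptions  DEBUG=-g    # Build kernel with gdb(1) debug symbols
options      KDB         # Enable kernel debugger support
options      DDB         # In-kernel debugger; depends on KDB
EOF
}

debug_fragment "$KERNCONF"

# Typical build and install steps afterwards (run as root):
echo "cd /usr/src && make buildkernel KERNCONF=$KERNCONF && make installkernel KERNCONF=$KERNCONF"
```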


We duly copied GENERIC and edited it, noting that 'options ddb' was
already enabled. We added 'makeoptions DEBUG=-g # Build kernel with gdb(1)
debug symbols' as suggested and tried to make buildkernel, which errored out
stating that KDB must be enabled to use DDB. We edited KERNEL.DEBUG to add
'options KDB # Enable kernel debugger' and attempted to make buildkernel
again. This time, the process stopped again with the message:


THIRD ERROR EVENT

[snip]
inline-unit-growth=100 --param large-function-growth=1000
-mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow
-mno-sse -mno-sse2 -ffreestanding -Werror /usr/src/sys/crypto/sha2/sha2.c

/usr/src/sys/crypto/sha2/sha2.c: In function `SHA512_Transform':
/usr/src/sys/crypto/sha2/sha2.c:753: warning: 'T2' might be used uninitialized in this function

*** Error code 1

Stop in /usr/obj/usr/src/sys/KERNEL.DEBUG.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.
www:/usr/src#

With this, we are stumped.

HELP PLEASE!

Can anyone:

-  lead us to a solution based on these error messages?
-  help us understand why the GENERIC kernel with only the debugging options
   added failed to make?
-  help us understand what '/usr/src/sys/crypto/sha2/sha2.c' has to do with
   anything?
-  help us understand what we need to do to extract useful information from
   the vmcore.# files?
-  offer any other suggestions?

Thanks in advance!










Fw: FreeBSD 6.2 Repeating Crash - Sleeping thread; Fatal trap 12: page fault; warning: 'T2' might be used uninitialized

2007-06-12 Thread Worth Bishop
Addendum: For what it's worth, the 250Gb Samsung drive was added when the 
system was upgraded - it's only 3-4 months old.

