Re: How to diagnose hardware problem?

2009-04-14 Thread Graham Bentley

The first two utils I run if I suspect hardware issues,
both independent of the resident OS:
http://www.memtest.org/
http://www.hitachigst.com/hdd/technolo/dft/dft.htm
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


How to diagnose hardware problem?

2009-04-13 Thread John Almberg
I have what looks like a hardware problem with an Intel 1U server,  
which I am using mainly as a mysql database server for some of my  
bigger website clients.


The server went down last week with a badly corrupted file system.

After spending a day trying to fix the file system, we gave up and  
did a fresh install of FreeBSD, PF, and mysql, using our daily  
backups to restore the database. It all seemed to work fine until I  
switched the websites from the temporary database server that I had  
been using, onto the restored server.


The database ran well for about 2 minutes, then the server crashed  
again. The filesystem was again corrupted so badly that we could not  
even log in to look at the logs.


We've reinstalled FreeBSD again, just to be able to SSH into the box.  
It looks like there is probably a hardware problem, like a bad power  
supply or overheating CPU that fails when the load of the database is  
applied.


Problem is, I have no idea how to determine which bits are failing.  
Can anyone suggest a favorite book or website that focuses on how to  
troubleshoot hardware issues?


Thanks: John



Re: How to diagnose hardware problem?

2009-04-13 Thread John Almberg


> First things first; if the machine is still in warranty, don't mess with
> it but send it back to the manufacturer and demand a replacement.

It is in warranty and I am following their process. I'm hoping to
short-circuit that process by finding the problem on my own, if
possible. Plus, I've never really had to deal with a hardware failure
before, so it's a good learning process.

> If the machine is out of warranty, you might consider replacing it
> altogether. My employer's IT department ditches PCs and servers at the first
> failure after the warranty runs out. According to them it's cheaper than
> repairing them.
>
> But if you want to have a go, this might help:
> http://www.daileyint.com/hmdpc/manual.htm
>
> Basically, it's just a problem of elimination.
>
> First check if your machine is the only one having problems at the
> hosting site. Maybe they have unstable electrical power.
>
> Then make sure that all expansion cards and RAM are well-seated, and
> that all connectors are OK. Also check that there is no dust build-up on
> e.g. fans and heatsinks. If necessary, clean carefully with (dry, oil
> free) compressed air. Dust can lead to short circuits or reduced
> cooling. Next, look on the motherboard for capacitors that have leaked
> fluid or have bulging metal end plates; those are dead or dying. They
> are a leading cause of motherboard failure. It is possible to
> replace them, but you'll need the right equipment:
> http://www.tomshardware.com/reviews/fixing-motherboard,1606.html
>
> Install a monitoring program like mbmon or healthd, and have it log to
> another machine or a USB stick mounted synchronously. Monitor CPU
> temperature, fan speeds and the different voltages. Not all power
> supplies are created equal. See the articles at tom's hardware:
>   http://www.tomshardware.com/reviews/Components,1/Power-Supplies,6/
>
> If you've found nothing so far, it's time to start swapping out
> components, starting with the power supply.


This is all good stuff to try. Thanks.

-- John



Fwd: How to diagnose hardware problem?

2009-04-13 Thread John Almberg

On Apr 13, 2009, at 2:32 PM, Wojciech Puchar wrote:



>> The database ran well for about 2 minutes, then the server crashed
>> again. The filesystem was again corrupted so badly that we could
>> not even log in to look at the logs.
>
> Did you run memtest? It looks like it's fine until you stress your hardware.


I didn't, but I just installed it and am running it at the moment. So  
far, so good.


The machine has 1 GB of memory, but I cannot get an mlock unless I
request 100 MB or less. That is, I need to run something like:


# memtest 100

Does this sound right? If I run with 125 Meg, I get the following:

# memtest 125
memtester version 4.0.8 (64-bit)
Copyright (C) 2007 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xf000
want 125MB (131072000 bytes)
got  125MB (131072000 bytes), trying mlock ...failed for unknown reason.
Continuing with unlocked memory; testing will be slower and less reliable.

Loop 1:
  Stuck Address   : ok
  Random Value    : ok
  Compare XOR     : ok
  Compare SUB     : ok
  Compare MUL     : ok
etc...
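
For long runs it can be easier to let a script read the output than to watch it scroll by. The following is a minimal Python sketch, assuming the output has been saved as plain text in the same shape as the paste above (a "Loop N:" header followed by "name : status" lines). The function name `summarize` is hypothetical, and treating every status other than "ok" as suspicious is an assumption, since the thread shows no failing run.

```python
import re

# Matches result lines such as "  Stuck Address   : ok".
RESULT_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9 ]*?)\s*:\s*(\S.*)$")


def summarize(output):
    """Return (passed, suspicious, mlock_failed) for memtester-style output."""
    passed, suspicious = [], []
    mlock_failed = False
    in_loop = False
    for line in output.splitlines():
        # Remember whether the run fell back to unlocked memory.
        if "trying mlock" in line and "failed" in line:
            mlock_failed = True
        if line.startswith("Loop "):
            in_loop = True  # result lines only follow a "Loop N:" header
            continue
        if not in_loop:
            continue
        match = RESULT_LINE.match(line)
        if match is None:
            continue
        name, status = match.group(1), match.group(2).strip()
        if status == "ok":
            passed.append(name)
        else:
            # Assumption: any status other than "ok" deserves a closer look.
            suspicious.append((name, status))
    return passed, suspicious, mlock_failed


SAMPLE = """\
want 125MB (131072000 bytes)
got  125MB (131072000 bytes), trying mlock ...failed for unknown reason.
Loop 1:
  Stuck Address   : ok
  Random Value: ok
  Compare XOR : ok
"""

if __name__ == "__main__":
    passed, suspicious, mlock_failed = summarize(SAMPLE)
    print("passed:", passed)
    print("suspicious:", suspicious)
    print("mlock failed:", mlock_failed)
```

A run that reports the mlock failure is still worth reading, but as the tool itself warns, the results are less reliable when the memory is not locked.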


-- John


Re: How to diagnose hardware problem?

2009-04-13 Thread Wojciech Puchar


> The database ran well for about 2 minutes, then the server crashed again. The
> filesystem was again corrupted so badly that we could not even log in to look
> at the logs.

Did you run memtest? It looks like it's fine until you stress your hardware.

> We've reinstalled FreeBSD again, just to be able to SSH into the box. It
> looks like there is probably a hardware problem, like a bad power supply or
> overheating CPU that fails when the load of the database is applied.
>
> Problem is, I have no idea how to determine which bits are failing. Can
> anyone suggest a favorite book or website that focuses on how to troubleshoot
> hardware issues?
>
> Thanks: John



Re: How to diagnose hardware problem?

2009-04-13 Thread Roland Smith
On Mon, Apr 13, 2009 at 12:07:25PM -0400, John Almberg wrote:
> I have what looks like a hardware problem with an Intel 1U server,
> which I am using mainly as a mysql database server for some of my
> bigger website clients.
>
> The server went down last week with a badly corrupted file system.
>
> After spending a day trying to fix the file system, we gave up and
> did a fresh install of FreeBSD, PF, and mysql, using our daily
> backups to restore the database. It all seemed to work fine until I
> switched the websites from the temporary database server that I had
> been using, onto the restored server.
>
> The database ran well for about 2 minutes, then the server crashed
> again. The filesystem was again corrupted so badly that we could not
> even log in to look at the logs.
>
> We've reinstalled FreeBSD again, just to be able to SSH into the box.
> It looks like there is probably a hardware problem, like a bad power
> supply or overheating CPU that fails when the load of the database is
> applied.
>
> Problem is, I have no idea how to determine which bits are failing.
> Can anyone suggest a favorite book or website that focuses on how to
> troubleshoot hardware issues?

First things first; if the machine is still in warranty, don't mess with
it but send it back to the manufacturer and demand a replacement.

If the machine is out of warranty, you might consider replacing it
altogether. My employer's IT department ditches PCs and servers at the first
failure after the warranty runs out. According to them it's cheaper than
repairing them.


But if you want to have a go, this might help:
http://www.daileyint.com/hmdpc/manual.htm 

Basically, it's just a problem of elimination.

First check if your machine is the only one having problems at the
hosting site. Maybe they have unstable electrical power.

Then make sure that all expansion cards and RAM are well-seated, and
that all connectors are OK. Also check that there is no dust build-up on
e.g. fans and heatsinks. If necessary, clean carefully with (dry, oil
free) compressed air. Dust can lead to short circuits or reduced
cooling. Next, look on the motherboard for capacitors that have leaked
fluid or have bulging metal end plates; those are dead or dying. They
are a leading cause of motherboard failure. It is possible to
replace them, but you'll need the right equipment:
http://www.tomshardware.com/reviews/fixing-motherboard,1606.html

Install a monitoring program like mbmon or healthd, and have it log to
another machine or a USB stick mounted synchronously. Monitor CPU
temperature, fan speeds and the different voltages. Not all power
supplies are created equal. See the articles at tom's hardware:
  http://www.tomshardware.com/reviews/Components,1/Power-Supplies,6/ 

If you've found nothing so far, it's time to start swapping out
components, starting with the power supply.
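
Once such a log has survived a crash, a small script can pick out the readings worth a second look. The following is a minimal Python sketch, not something mbmon or healthd provides. It assumes a hypothetical log format of one whitespace-separated reading per line (`timestamp cpu_temp_c fan_rpm v12`), and the limits are illustrative placeholders rather than vendor figures; adjust both to whatever your monitoring program really writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Illustrative thresholds (assumptions, not vendor specifications)."""
    max_cpu_temp_c: float = 70.0
    min_fan_rpm: int = 1000
    v12_tolerance: float = 0.05  # allowed deviation from 12 V, as a fraction


def check_line(line, limits=Limits()):
    """Return (timestamp, problems) for one hypothetical log line."""
    timestamp, temp, rpm, v12 = line.split()
    problems = []
    if float(temp) > limits.max_cpu_temp_c:
        problems.append(f"CPU temperature {temp} C")
    if int(rpm) < limits.min_fan_rpm:
        problems.append(f"fan speed {rpm} rpm")
    if abs(float(v12) - 12.0) / 12.0 > limits.v12_tolerance:
        problems.append(f"12 V rail at {v12} V")
    return timestamp, problems


def scan(lines, limits=Limits()):
    """Return only the readings that exceeded at least one limit."""
    flagged = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        timestamp, problems = check_line(line, limits)
        if problems:
            flagged.append((timestamp, problems))
    return flagged


if __name__ == "__main__":
    # Made-up sample readings, for illustration only.
    sample = [
        "# timestamp cpu_temp_c fan_rpm v12",
        "12:00:00 45.0 5200 12.10",
        "12:00:10 83.5 5100 11.20",
    ]
    for timestamp, problems in scan(sample):
        print(timestamp, "; ".join(problems))
```

The last few readings before a crash are usually the interesting ones, which is why the log needs to be written somewhere that survives the crash.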


Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)

