Re: Dell PE4600 RAID5 server failing

2007-11-14 Thread Derek Ragona

At 09:00 AM 11/14/2007, Barnaby Scott wrote:
I suspect I already know the answer to this, which is that the trouble I 
am having is nothing to do with the OS at all, but I have to ask, because 
I am otherwise up against a total brick wall!


I bought a second-hand Dell PowerEdge 4600 and installed FreeBSD 6.2
earlier this year. I had it set up with RAID5 using its PERC3/DC 
controller, with 7 x 73GB disks (+ 1 hot spare). So far so good, and it 
worked faultlessly as a Samba server for several months.


At the beginning of October, it went down, reporting a mismatch between 
the configuration on the NVRAM and the disks. With help from Dell support, 
I managed to recreate the RAID array and it worked again for a month.


In early November it happened again, and has kept happening since. At one 
point it appeared that the backplane was faulty, so I replaced that, but I 
cannot keep the server up for more than a day or so without this 
'mismatch' problem.


What about diagnostics on the hardware, you may ask? I have run all the
diagnostic tools that Dell can supply - several times - and the server 
declares itself to be totally fault-free.


My specific questions therefore:

Is there any way at all that FreeBSD could be involved with this problem?
(I did notice for example that the Dell PERC3/DC controller was not in the 
list of supported hardware - but then again, why did it work for several 
months?)


Can I use FreeBSD to tell me anything about the fault that Dell's 
diagnostic tools haven't found?


(I do hope someone might be able to help - Dell are trying to get me to 
switch to a 'supported' OS!)



Thanks

Barnaby Scott


It doesn't sound like an OS issue, as you set up the RAID outside the
OS.  It may be a bad drive or drives.  Most RAID controllers write their
configuration information to the drives, and if this becomes unreadable you
will have RAID faults.


Another likely culprit is heat.  Overheating drives often fail.  Are you
sure the temperatures in the drive enclosure are OK?


If you can, run diagnostics on the drives; this usually requires running
them with the drives taken out of the RAID array, though.


-Derek

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
MailScanner thanks transtec Computers for their support.

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Dell PE4600 RAID5 server failing

2007-11-14 Thread Barnaby Scott

Derek Ragona wrote:

It doesn't sound like an OS issue, as you set up the RAID outside the
OS.  It may be a bad drive or drives.  Most RAID controllers write their
configuration information to the drives, and if this becomes unreadable you
will have RAID faults.


Another likely culprit is heat.  Overheating drives often fail.  Are you
sure the temperatures in the drive enclosure are OK?


If you can, run diagnostics on the drives; this usually requires running
them with the drives taken out of the RAID array, though.


-Derek



Thanks for replying - as I said, this is a long shot trying to see if 
there is any OS involvement.


The drives are fine - I have used two different tools to analyse them 
while the computer is booted from a live CD and the RAID configuration 
cleared on the controller. Besides, you would expect one drive to fail 
at a time, and if this happened, the hot spare would surely be pressed 
into service. Nothing like this has happened though - the controller is 
reporting several drives (not always the same ones) failed 
simultaneously, but when the array is re-created from the disks, 
everything works fine. Problem is, it goes down again a day or so later.
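The reasoning here rests on how RAID5 works: each stripe carries one parity block, so the array can rebuild any single missing member (which is what the hot spare is for) but not two at once. The Python sketch that follows is a toy model of that XOR parity, not how the PERC3/DC firmware actually implements it; the function names and the 4-byte blocks are hypothetical. It also shows the capacity arithmetic for the 7 x 73GB array (the hot spare holds no data).

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column (RAID5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by their parity block.

    Parity sits on the last member here for simplicity; real RAID5
    rotates the parity block across the disks from stripe to stripe.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe):
    """Rebuild a stripe whose missing members are None.

    XOR of the surviving members recovers exactly one missing member;
    with two or more missing, the stripe cannot be rebuilt.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if not missing:
        return list(stripe)
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} members missing; RAID5 tolerates only one")
    rebuilt = list(stripe)
    rebuilt[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return rebuilt


def usable_capacity_gb(array_disks, disk_gb):
    """RAID5 spends one disk's worth of space on parity; spares hold no data."""
    return (array_disks - 1) * disk_gb


if __name__ == "__main__":
    data = [bytes([i] * 4) for i in range(1, 7)]   # six 4-byte data blocks
    stripe = make_stripe(data)                      # seven members, as in a 7-disk array

    one_lost = list(stripe)
    one_lost[2] = None
    print(reconstruct(one_lost) == stripe)          # True

    two_lost = list(stripe)
    two_lost[2] = two_lost[5] = None
    try:
        reconstruct(two_lost)
    except ValueError as err:
        print(err)                                  # 2 members missing; RAID5 tolerates only one

    print(usable_capacity_gb(7, 73))                # 438
```

A genuine single-drive failure would leave the array running degraded while the spare rebuilds; several members dropping out at once, as the controller reports here, is more than RAID5 can tolerate.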


As for heat, there is nothing being reported there and the fans that 
cool that area are working.


Any other ideas gratefully received!

Barnaby Scott


RE: Dell PE4600 RAID5 server failing

2007-11-14 Thread Tamouh H.
 
 Thanks for replying - as I said, this is a long shot trying 
 to see if there is any OS involvement.
 
 The drives are fine - I have used two different tools to 
 analyse them while the computer is booted from a live CD and 
 the RAID configuration cleared on the controller. Besides, 
 you would expect one drive to fail at a time, and if this 
 happened, the hot spare would surely be pressed into service. 
 Nothing like this has happened though - the controller is 
 reporting several drives (not always the same ones) failed 
 simultaneously, but when the array is re-created from the 
 disks, everything works fine. Problem is, it goes down again 
 a day or so later.
 
 As for heat, there is nothing being reported there and the 
 fans that cool that area are working.
 
 Any other ideas gratefully received!
 
 Barnaby Scott

This is very unlikely to be OS-related. But here are a few pointers:

1) Check the make/model of the drives. Certain makes and models of SCSI drives
had a glitch a while ago with a certain firmware version that made them
disconnect from a RAID array. I had personal experience with these (Seagate U320).

2) What happened in October? Did anything occur, hardware-, software-, or
power-wise?

3) Given the NVRAM/disk mismatch, I'd check the controller. Is the backup
battery present but weak?

4) It's unlikely to be the source, but test your physical RAM using
Memtest86+, and check that the power supply is sufficient and working properly.

 




RE: Dell PE4600 RAID5 server failing

2007-11-14 Thread Derek Ragona

At 12:12 PM 11/14/2007, Tamouh H. wrote:



This is very unlikely to be OS-related. But here are a few pointers:

1) Check the make/model of the drives. Certain makes and models of SCSI
drives had a glitch a while ago with a certain firmware version that made
them disconnect from a RAID array. I had personal experience with these
(Seagate U320).


2) What happened in October? Did anything occur, hardware-, software-, or
power-wise?


3) Given the NVRAM/disk mismatch, I'd check the controller. Is the backup
battery present but weak?


4) It's unlikely to be the source, but test your physical RAM using
Memtest86+, and check that the power supply is sufficient and working properly.





I've had some RAID drives disconnect and go missing; it all cleared, and the
array was rebuilt, on a full power-off reboot.  I believe this is due to power
issues in my area.  Specifically, my line power from the utility was running
high, over 127 volts, making over-voltage spikes prevalent.  On a couple of
spikes I saw the drives disconnect.


So it could be power-related.

On temperature, I would put in a temperature probe and check the temperature
with that external probe rather than relying on the internal sensors.  Some
remote KVM solutions now include temperature probes.
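Both suggestions, watching the line voltage and reading temperatures from an independent probe, come down to logging readings and flagging the out-of-range ones. A minimal Python sketch, assuming the readings can be exported from some external probe or meter as (timestamp, volts, °C) records; the `Reading` type, the 126 V limit (120 V nominal plus 5%) and the 45 °C bay limit are illustrative assumptions, not Dell or utility specifications.

```python
from dataclasses import dataclass

# Illustrative limits (assumptions, not Dell or utility specifications).
MAX_VOLTS = 126.0        # 120 V nominal + 5%
MAX_BAY_CELSIUS = 45.0   # hypothetical; check the drive datasheet


@dataclass
class Reading:
    """One logged sample from a hypothetical external probe and line meter."""
    timestamp: str
    volts: float
    bay_celsius: float


def flag_readings(readings):
    """Return (timestamp, reason) pairs for samples outside the limits."""
    alerts = []
    for r in readings:
        if r.volts > MAX_VOLTS:
            alerts.append((r.timestamp, f"over-voltage: {r.volts:.1f} V"))
        if r.bay_celsius > MAX_BAY_CELSIUS:
            alerts.append((r.timestamp, f"drive bay hot: {r.bay_celsius:.1f} C"))
    return alerts


if __name__ == "__main__":
    # Hypothetical sample log, for illustration only.
    log = [
        Reading("09:00", 121.5, 31.0),
        Reading("13:00", 127.8, 33.5),
    ]
    for when, reason in flag_readings(log):
        print(when, reason)  # prints: 13:00 over-voltage: 127.8 V
```

Lining up the flagged timestamps against the times the controller reported the mismatch would show whether the drive drop-outs track the spikes.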


-Derek
