Re: Complete hang during boot at boot2 prompt

2007-07-11 Thread Feargal Reilly
On Wed, 11 Jul 2007 07:34:02 -1000
NetOpsCenter <[EMAIL PROTECTED]> wrote:

> Feargal Reilly wrote:
> 
> >Hi,
> >
> >There, I yanked a memory module, and it booted fine, albeit
> >complaining about the degraded RAID array. However, when I
> >reinserted the memory, it continued to boot. I didn't have the
> >foresight to try it before I fiddled with the disks, but I
> >can't imagine that it had been seated incorrectly as the
> >server had been up for two months without problem. Also, the
> >BIOS tests passed, although I know they aren't too in depth.
> >I'll run sysutils/memtest anyway, and see what that throws up.
> >
> >Any other suggestions as to what caused the failure? I know
> >I've changed the conditions and may never be able to
> >reproduce it (nor do I want to), but if I've failing
> >hardware, I'd like a best guess as to where it is.
> >
>
> I have had memory chips walk out of the slots on several
> occasions. Sometimes it's vibration, or in Hawaii we have
> humidity issues occasionally that tend to cause this too.
> I have learned to spray the sockets and card connections with
> contact cleaner about every 6 months to avoid this problem,
> especially in areas where servers are not in a cool
> environment.
> 

It was operating in a climate-controlled server room, so in
theory the environment was fine. The mounting was sub-optimal,
though, being stacked with three other servers, all without
rails, so I guess vibration is one plausible explanation.

-fr.

-- 
Feargal Reilly, Chief Techie, FBI.
PGP Key: 0xBD252C01 (expires: 2006-11-30)
Web: http://www.fbi.ie/ | Tel: +353.14988588 | Fax: +353.14988489
Communications House, 11 Sallymount Avenue, Ranelagh, Dublin 6.
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Complete hang during boot at boot2 prompt

2007-07-11 Thread NetOpsCenter

Aloha,

I have had memory chips walk out of the slots on several occasions.
Sometimes it's vibration, or in Hawaii we have humidity issues
occasionally that tend to cause this too.
I have learned to spray the sockets and card connections with contact
cleaner about every 6 months to avoid this problem, especially in areas
where servers are not in a cool environment.




~Al Plant - Honolulu, Hawaii -  Phone:  808-284-2740
 + http://hawaiidakine.com + http://freebsdinfo.org + [EMAIL PROTECTED] +
 + http://internetohana.org   - Supporting - FreeBSD 6.* - 7.* +
"All that's really worth doing is what we do for others." - Lewis Carroll




Complete hang during boot at boot2 prompt

2007-07-11 Thread Feargal Reilly

Hi,

I have a server which went down overnight, and
would not subsequently boot. A reboot was performed by
facilities staff before I got to look at it so I don't know what
was showing on the console. The reason for the outage is
unknown, and nothing showed in /var/log/messages, other than
routine ntpd time sync messages.

The server in question is an Intel SR1425BK1 server running
FreeBSD 6.2 amd64 GENERIC with a SATA RAID-1 array
provided by an onboard LSILogic MegaRAID controller.

When booted, it would pass the various BIOS screens without
problem, the RAID utility would say that the array was optimal,
and then FreeBSD would start to boot, but it couldn't get past
boot2:

>> FreeBSD/amd64 BOOT
Default: 0:ad(0,a)/boot/loader
boot:

At this point, the server emitted a single continuous beep, and
nothing else happened. Keyboard input did nothing, although
Ctrl-Alt-Del still worked, and at one point a heart symbol
appeared after I hit keys randomly for a while.

My question is, what could have caused this failure? 

My initial guesses were either a memory failure or a really
badly corrupted boot sector, but I'm not convinced by either
explanation, for reasons outlined below.

I urgently needed the data to be online again, so I yanked one 
disk out of the machine and inserted it into another host, and
took the server back to the office.

There, I yanked a memory module, and it booted fine, albeit
complaining about the degraded RAID array. However, when I
reinserted the memory, it continued to boot. I didn't have the
foresight to try it before I fiddled with the disks, but I can't
imagine that it had been seated incorrectly as the server had
been up for two months without problem. Also, the BIOS tests
passed, although I know they aren't too in depth. I'll run
sysutils/memtest anyway, and see what that throws up.

Meanwhile, I inserted a replacement disk and rebuilt the RAID-1
array, and it is still booting fine, so my best guess now is a
corrupted boot sector. The disk that I removed to insert into
another host was ad4, which I'm guessing is the disk that it
would have been trying to boot from in the first place. So a
bad sector could be responsible, but it would seem to be very
convenient, as there does not appear to be any other data
corruption on the disk.

Also, I've run a short SMART test, and everything is okay as far
as it is concerned. I'm in the process of running a long test,
but that won't finish before I leave the office. If it were a
corrupted sector, would it be able to get to boot2?

Any other suggestions as to what caused the failure? I know I've
changed the conditions and may never be able to reproduce it
(nor do I want to), but if I've failing hardware, I'd like a
best guess as to where it is.

Thanks for your time,

-fr.

-- 
Feargal Reilly, Chief Techie, FBI.
PGP Key: 0xBD252C01 (expires: 2006-11-30)
Web: http://www.fbi.ie/ | Tel: +353.14988588 | Fax: +353.14988489
Communications House, 11 Sallymount Avenue, Ranelagh, Dublin 6.