I recently had to deal with a brand new PowerEdge 1950 (dual Core2 Xeons) which began to reboot itself at random, and eventually could not get all the way through the boot sequence before rebooting again.

It worked OK from an SL boot CD, and passed all the Dell diags fine. However, no 'operational' kernel would work. It turned out one of the CPUs was bad and had to be replaced. The Dell diags clearly do not test the CPU thoroughly in all cases - my guess is that some specific sequence of instructions triggered the error in this case.
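If the machine stays up long enough to run anything, a rough per-core check is one way to narrow down a suspect CPU. The following is only a sketch under assumptions that are not part of the setup above: a Linux system with Python 3.3 or later (for `os.sched_setaffinity`), and helper names (`burn`, `check_cores`, `mismatched`) made up for illustration. It pins the same deterministic workload to each core in turn and compares the results. A faulty CPU is more likely to hang or reboot the box than to return a wrong answer, so the core that was pinned when the machine died is at least as informative as any mismatch.

```python
import hashlib
import os


def burn(rounds=20000):
    """Deterministic CPU-bound work: repeatedly hash a small buffer."""
    data = b"sl-users"
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()


def check_cores(rounds=20000):
    """Run the same work pinned to each allowed core; return {cpu: digest}."""
    original = os.sched_getaffinity(0)  # cores this process may use
    results = {}
    try:
        for cpu in sorted(original):
            os.sched_setaffinity(0, {cpu})  # pin to a single core
            results[cpu] = burn(rounds)
    finally:
        os.sched_setaffinity(0, original)  # restore the original mask
    return results


def mismatched(results):
    """Return the cores whose digest differs from the most common one."""
    digests = list(results.values())
    reference = max(set(digests), key=digests.count)
    return [cpu for cpu, d in results.items() if d != reference]


if __name__ == "__main__":
    res = check_cores()
    bad = mismatched(res)
    print("cores tested:", sorted(res))
    print("suspect cores:", bad if bad else "none")
```

A clean run proves very little, for the same reason the Dell diags passed: the workload may simply not hit the instruction sequence that triggers the fault.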

If you have a similar (ideally identical) machine and can swap the hard drives you should be able to tell if this is a software problem by booting from the 'known good' disk. If that doesn't solve it then there is a hardware problem. The usual things to try would be to swap the RAM and PSU, remove all cards but the graphics one and try swapping it, etc. If that doesn't help then it's either the mainboard or CPU, and at that point it's time to call Dell and tell them to send someone out with spare parts. I've found them to be perfectly open to doing that sort of thing provided the simpler possibilities have been eliminated.

Robert


On Thu, Mar 29, 2007 at 06:14:24PM -0700, Michael Hannon wrote:
Greetings. One of the profs here has got a Dell Optiplex 620 running SL
4.4.  It has an Intel Pentium D chip that is dual-core-capable, and it
has the capability enabled.

From time to time the owner has had problems with the system hanging.
He usually solves the problem with some stupid computer trick, such as
cycling the power, etc.  But yesterday he had one of the usual hangs,
except that it was one from which he could not recover.

The problem is very similar to one that was reported on the SL-users
list not too long ago.  In more detail, the system either has a kernel
panic during the boot sequence, or it boots all the way and allows a
login, but almost immediately has a "hard freeze" that requires a power
cycle to thaw.

We've run the Dell diagnostic utilities to test processor, memory, and
video, but we didn't find any problems.

The system is running the latest kernel, but it will not boot reliably
with any of the four SMP kernels currently installed on it.

We've tried all of the voodoo that I saw mentioned in the previous
discussion (run-level 3, no "rhgb quiet" on the command line), but the
only thing that seems to work reliably is to boot with the uni-processor
kernel.

This is probably an acceptable work-around for the time being, and my
hope is that when we do a fresh install with SL 5, we'll all be happy
again.  But I wonder if any of y'all can provide any further insight
into this.

Could the machine be overheating, by any chance?  I've had similar
difficulties with a couple of Dell Optiplex 620 Small Form Factor
machines running SL305.  They would get into a state where they would
only boot with the uniprocessor kernel, but if you left them off
overnight to cool down, they would boot the SMP kernel again.  Several
other identical machines kept on working fine, though.
Eva.
