Re: random hangs/reboots with Dell servers

2007-04-21 Thread Dimitris Zilaskos


Thanks to everyone for your replies.

A colleague has given me his handwritten notes of an older crash screen.
They read as follows (however, I can't guarantee they are accurate, since
they are handwritten):


Fatal trap 12: page fault while in kernel mode
cpuid=0; apicid=00
fault virtual address=0xac
fault code=supervisor write,page not present
instruction pointer=0x20:0x
current process 79962
trap numbers : 12
panic: pagefault
cpuid=1
uptime=6d7423m55


	I do not believe the problems are related to environment or
power, since we switched data centers during the period in which the
problems occurred, and besides the Dell systems there are 150 more nodes
from various vendors (mostly HP, but also IBM, Supermicro, Sun, and various
assembled towers), none of which has shown similar behaviour. We don't run
FreeBSD on them, though. We have a Dell 2850 running Windows 2003 that has
been rock solid for at least a year. And the 1750, which under FreeBSD 5
would sometimes crash even with no load, has been pushing 60 Mbps of FTP
data 24/7 with ease under RHEL 4 for the last year without any problems.


	Disabling everything unneeded in the BIOS was one of our first
moves, though we haven't disabled USB since we sometimes need to connect a
keyboard. And no, IPMI is not running on a public interface :)


	Apart from all the nodes being SMP and Dell, I cannot think of
anything else they have in common. Some are SCSI, some are SATA. All have a
number of jails. The 1750 has 2 GB of memory; the others have 4 GB.


	I have also asked Dell for help; although they told me FreeBSD
is not certified by Dell, they will try to look into it.



--


Dimitris Zilaskos

Department of Physics @ Aristotle University of Thessaloniki , Greece
PGP key : http://tassadar.physics.auth.gr/~dzila/pgp_public_key.asc
  http://egnatia.ee.auth.gr/~dzila/pgp_public_key.asc
MD5sum  : de2bd8f73d545f0e4caf3096894ad83f  pgp_public_key.asc


___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: random hangs/reboots with Dell servers

2007-04-19 Thread Derek Ragona

At 05:54 AM 4/19/2007, Dimitris Zilaskos wrote:


> Dear all,
>
> I am trying to understand some long-standing issues we have with FreeBSD
> and Dell servers.
>
> Over the last 3 years we have installed FreeBSD 5.x and 6.x, with the
> currently deployed version being 6.1, on a variety of Dell rack-mounted
> systems.
>
> The Dell systems used so far are PowerEdge 1750, 2950 (both SCSI), and
> SC1425 (SATA). All of them are dual-CPU Xeon systems.
>
> All these systems serve as mail/web servers, with 2 to 15 jails.
>
> Installation has always proceeded normally without problems. However,
> after a few months of operation, all of these systems, purchased at
> different times during the last 3 years, begin rebooting randomly or
> freezing completely.
>
> These reboots/freezes at first occur once every 6 months, then gradually
> increase to once per month, and normally stabilize at around once per
> week; in the case of the 1750, it once even happened twice in one day.
>
> Load does not seem to matter, since random reboots still occurred even
> after shutting down all services on the servers.
>
> So far we have tried various tricks dug up from the archives, like
> disabling ACPI and HT, but nothing changed.
>
> We have migrated some of the systems that had these issues to a
> RHEL-compatible OS, and they run rock solid under heavy load.
>
> Right now I have enabled kernel crash dumps and I am waiting for the next
> crash. But I understand a lot of people use FreeBSD with Dell servers, and
> I would like to hear how to tackle the situation we are facing.


First, make sure you are up to date on the FreeBSD version you are running,
and that it is still a supported release.  If not, update your src and
rebuild everything.


For the hardware, I'd run Dell's complete diagnostics on one of these
servers, along with any available stress tests.  If the hardware all checks
out OK, I would look for an environmental cause such as heat, which can
cause hardware problems that wouldn't show up otherwise.  If neither of
these looks like the cause, then you may need to swap out the system board
or RAM, as it must be a hardware issue.


-Derek



Re: random hangs/reboots with Dell servers

2007-04-19 Thread Bill Moran
In response to Derek Ragona [EMAIL PROTECTED]:


Sorry, I missed the original post on this.

We run a variety of Dell hardware where I work: lots of 1850 and 2850
units, some 1950s and 2950s, and a few 850s and the like.

We're not having any problems.  We routinely see uptimes that span from
one maintenance window to the next without any unplanned reboots.

One thing we've "had fun" with is that Dell has issued a LOT of firmware
upgrades over the last year, and those are a pain to apply to remote
systems.  However, I don't recall any stability problems prior to the
upgrades.

I know this isn't answering your question, but I thought I'd point out
that your experience is not typical.  Somewhere, you are having a problem
that _can_ be solved.

The units you describe come in various configurations; I wonder if you're
picking a specific hardware combination that FreeBSD has trouble with.

Otherwise, you're on the right path with the crash dumps.  Once you have
more details, post them to this or the -hackers list to see if you can
get the problem narrowed down.  However, if these systems are spontaneously
rebooting without a panic, crash dumps might not help.

You don't have IPMI enabled on a public interface, do you?

-- 
Bill Moran
http://www.potentialtech.com


Re: random hangs/reboots with Dell servers

2007-04-19 Thread Chuck Swiger

On Apr 19, 2007, at 3:54 AM, Dimitris Zilaskos wrote:
> Over the last 3 years we have installed FreeBSD 5.x and 6.x, with the
> currently deployed version being 6.1, on a variety of Dell rack-mounted
> systems.
>
> The Dell systems used so far are PowerEdge 1750, 2950 (both SCSI),
> and SC1425 (SATA). All of them are dual-CPU Xeon systems.


I've got a large number of Dell PowerEdge 1750s, 1850s, 2900s, and 2950s
deployed in various production environments, while some other clients
are using HP ProLiant 360/370 boxen.  Both seem to be rock solid under
either 5.4/5.5 or 6.1/6.2.  I've even got a pair of firewall boxes
running nothing but NAT and sshd, which are at 600+ days of uptime:


FreeBSD 5.4-STABLE (FW) #0: Tue Jul 12 11:10:14 EDT 2005

Welcome to FreeBSD!
12:24PM  up 636 days, 19:26, 3 users, load averages: 0.25, 0.14, 0.04

(Machines running more services get OS- or service-related updates more
frequently, typically every one to three months, but I don't like to
make changes to a running machine unless I expect the change to bring an
improvement that justifies the disruption.  For a non-SMP firewall,
where updating would mean losing external network connectivity, nothing
in 6.x is worth the cost of updating to as yet, IMHO.)



> All these systems serve as mail/web servers, with 2 to 15 jails.
>
> Installation has always proceeded normally without problems.
> However, after a few months of operation, all of these systems,
> purchased at different times during the last 3 years, begin
> rebooting randomly or freezing completely.
>
> These reboots/freezes at first occur once every 6 months, then
> gradually increase to once per month, and normally stabilize at
> around once per week; in the case of the 1750, it once even
> happened twice in one day.
>
> Load does not seem to matter, since random reboots still occurred
> even after shutting down all services on the servers.


This sounds like something hardware-related, such as a power-supply
problem, if the interval between failures is gradually getting shorter
and is not correlated with load at all.


> So far we have tried various tricks dug up from the archives, like
> disabling ACPI and HT, but nothing changed.
>
> We have migrated some of the systems that had these issues to a
> RHEL-compatible OS, and they run rock solid under heavy load.


Hmm.  Well, you might have to wait a few weeks or months to get a
reasonable comparison of longer-term stability, but this at least
implies that something like cooling or a failed fan isn't a likely
cause.


> Right now I have enabled kernel crash dumps and I am waiting for
> the next crash. But I understand a lot of people use FreeBSD with
> Dell servers, and I would like to hear how to tackle the situation
> we are facing.


Try to get a crash dump.  You might also find that reviewing the BIOS
options and disabling everything that is not needed, hopefully
including USB, helps.


--
-Chuck
