FreeBSD and ECC memory?

2008-07-25 Thread Nejc Škoberne

Hello,

I am buying hardware for a FreeBSD server, and my friend and I are arguing about
whether or not to buy ECC RAM for it. It is an HP ProLiant ML110 G4
machine and it currently has 2 x 512MB of HP DDR2 ECC memory.

My friend says buying ECC memory is not wise, because we would not benefit from
it since this server will not need very high availability (but we'd still like
it to be a solid server). He also says that ECC memory slows down memory
operations by 2-3% overall. In addition, we would benefit from buying non-ECC
memory because we already have 2 x 1GB of non-ECC memory, so if we:

 - buy an extra 2 x 1GB of non-ECC memory, we'll have 4GB in total (4 x 1GB)
 - buy an extra 2 x 1GB of ECC memory, we'll have 3GB in total (2 x 512MB + 2 x 1GB)

1. So, what would you base your decision on? Is getting ECC worth giving up
   the extra 1GB we would have with non-ECC memory?
2. What are your experiences with ECC?
3. Has anyone here ever had a machine halt itself because of a memory error
   (with ECC memory installed)?
4. If non-ECC memory is installed, how does FreeBSD recognize (correct?)
   memory errors?

Thanks,
Nejc
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: FreeBSD and ECC memory?

2008-07-25 Thread Kris Kennaway

Nejc Škoberne wrote:

4. If there is non-ECC memory installed, how does FreeBSD recognize (correct?)
   memory errors?


By crashing or corrupting data, of course.  Not doing this is what ECC 
is for :)


Kris


Re: FreeBSD and ECC memory?

2008-07-25 Thread Michael Powell
Nejc Škoberne wrote:

 Hello,
 
 I am buying hardware for a FreeBSD server and me and my friend argue about
 whether or not to buy ECC RAM for the server. It is a HP ProLiant ML110 G4
 machine and currently it has 2 x 512 HP DDR2 ECC memory.
 
 My friend says buying ECC memory is not wise, because we would not profit
 from it since this server will not need very high availability (but still
 we'd like to make it a solid server). And also that ECC memory slows down
 memory operations by 2-3% all together. Also, we would profit from buying
 non-ECC memory because we already have 2 x 1GB non-ECC memory and if we:
 
   - buy extra 2 x 1GB non-ECC memory we'll have 4GB all together (4 x 1GB)
   - buy extra 2 x 1GB ECC memory we'll have 3GB all together (2 x 512MB +
   2 x 1GB)
 
 1. So, what would you base your decision on? Is getting ECC worth losing
 1GB of non-ECC memory?

My decision would be based upon what the server was going to be used for.
Home use, or non mission critical I'd say non-ECC is just fine. At work
for mission critical database, mail, etc I stick with ECC. Especially
when it comes to Windows, as Windows has a nasty habit of trying to mask
what's going on behind the scene. No way I'd run a large SQL database or
Exchange server without ECC.

I'd be more concerned with trying to buy all the memory at the same time so
the sticks were all identical, especially with regard to timing and speed
ratings. You can create a problem when you have stick(s) from one
manufacturer then add in different ones later. IMHO, in this particular
situation, my gut feeling from your description would be to go with the
4GB of non-ECC as it sounds like the scenario doesn't match the criteria I
use for justifying ECC as a must have.

 2. What are your experiences with ECC?
 3. Did self-halt because of a memory error (having ECC memory) ever happen
 to someone here?

If it does you have defective hardware that is in need of replacement. Yes,
I have had bad RAM; whether it's ECC or non-ECC isn't the issue when it is
simply defective.

 4. If there is non-ECC memory installed, how does FreeBSD recognize
 (correct?) memory errors?
 

Generally speaking this occurs more at the hardware level. Non-ECC RAM can
correct single bit errors while ECC is capable of fixing multi-bit errors.
However, should I become aware that ECC was fixing too many errors too
often I would consider there to be defective hardware present.

The purpose of these schemes is to compensate for the fact that in every so
many (some large number) of memory transactions there may be a bit that
gets flipped. If this is happening more often than (some large number) then
there is a defect present. ECC just buys you uptime in the event there
are more errors than there should be. 

In either case these bit flips should only happen extremely infrequently, if
ever at all. Consider that these schemes are sort of a fallback to an
extreme what if situation and really shouldn't come into play during most
nominal operations. I would go with ECC for something that just had to
stay up even in the face of errors. In either case I'd still replace the
defective component(s), regardless of whether they were ECC or not. I've
seen thousands of machines with non-ECC RAM over the last 15 years that
worked just fine.

Just my $.02 here. YMMV and all other standard disclaimers apply.  :-)
-Mike






Re: FreeBSD and ECC memory?

2008-07-25 Thread Wojciech Puchar
4. If there is non-ECC memory installed, how does FreeBSD recognize (correct?)
   memory errors?


It's not the OS's job, but the hardware's.


Re: FreeBSD and ECC memory?

2008-07-25 Thread Erik Trulsson
On Fri, Jul 25, 2008 at 08:42:54AM -0400, Michael Powell wrote:
 Nejc Škoberne wrote:
 
  Hello,
  
  I am buying hardware for a FreeBSD server and me and my friend argue about
  whether or not to buy ECC RAM for the server. It is a HP ProLiant ML110 G4
  machine and currently it has 2 x 512 HP DDR2 ECC memory.
  
  My friend says buying ECC memory is not wise, because we would not profit
  from it since this server will not need very high availability (but still
  we'd like to make it a solid server). And also that ECC memory slows down
  memory operations by 2-3% all together. Also, we would profit from buying
  non-ECC memory because we already have 2 x 1GB non-ECC memory and if we:
  
- buy extra 2 x 1GB non-ECC memory we'll have 4GB all together (4 x 1GB)
- buy extra 2 x 1GB ECC memory we'll have 3GB all together (2 x 512MB +
2 x 1GB)
  
  1. So, what would you base your decision on? Is getting ECC worth losing
  1GB of non-ECC memory?
 
 My decision would be based upon what the server was going to be used for.
 Home use, or non mission critical I'd say non-ECC is just fine. At work
 for mission critical database, mail, etc I stick with ECC. Especially
 when it comes to Windows, as Windows has a nasty habit of trying to mask
 what's going on behind the scene. No way I'd run a large SQL database or
 Exchange server without ECC.
 
 I'd be more concerned with trying to buy all the memory at the same time so
 the sticks were all identical, especially with regard to timing and speed
 ratings. You can create a problem when you have stick(s) from one
 manufacturer then add in different ones later. IMHO, in this particular
 situation, my gut feeling from your description would be to go with the
 4GB of non-ECC as it sounds like the scenario doesn't match the criteria I
 use for justifying ECC as a must have.
 
  2. What are your experiences with ECC?
  3. Did self-halt because of a memory error (having ECC memory) ever happen
  to someone here?
 
 If it does you have defective hardware that is in need of replacement. Yes,
 I have had bad RAM; whether it's ECC or non-ECC isn't the issue when it is
 simply defective.
 
  4. If there is non-ECC memory installed, how does FreeBSD recognize
  (correct?) memory errors?
  
 
 Generally speaking this occurs more at the hardware level. Non-ECC RAM can
 correct single bit errors while ECC is capable of fixing multi-bit errors.

No, non-ECC RAM cannot detect or correct any errors at all. (Old parity-RAM
could detect, but not correct, single-bit errors.)
ECC is generally capable of detecting multi-bit errors and fixing single-bit
errors. (There are different ways of implementing ECC. Some of them might
well be able to fix multi-bit errors too.)
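
To make the detect-versus-correct distinction concrete, here is a minimal Python sketch of a SECDED (single-error-correct, double-error-detect) code. It is a toy extended Hamming code over 4 data bits, chosen purely for illustration; the names `encode` and `decode` are made up for this example, and real ECC memory does the equivalent in hardware, typically over 64 data bits with 8 check bits.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds an overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions, with check bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # overall parity over the other 7 bits
    return code


def decode(code):
    """Return (status, nibble); nibble is None if the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions holding a 1
    overall = sum(code) % 2                 # 0 when overall parity is consistent

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single-bit error. The syndrome is
        # the position of the bad bit (0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome but consistent overall parity: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
print(decode(word))                         # no error

one_flip = list(word)
one_flip[5] ^= 1
print(decode(one_flip))                     # single-bit error, corrected

two_flips = list(word)
two_flips[2] ^= 1
two_flips[6] ^= 1
print(decode(two_flips))                    # double-bit error, detected only
```

In this sketch a single flipped bit is repaired transparently, while two flipped bits are reported but cannot be repaired, which matches the behaviour described above.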

 However, should I become aware that ECC was fixing too many errors too
 often I would consider there to be defective hardware present.
 
 The purpose of these schemes is to compensate for the fact that in every so
 many (some large number) of memory transactions there may be a bit that
 gets flipped. If this is happening more often than (some large number) then
 there is a defect present. ECC just buys you uptime in the event there
 are more errors than there should be. 

Note that random, spontaneous bit flips can happen (infrequently) even in
perfectly good RAM. (Due to cosmic rays, radioactive decay in surrounding
material, and similar stuff. (No, I am not joking.))  ECC will handle
such errors just fine, and that is the main reason why I would want ECC.

You can also get defective memory modules, but such can usually be detected
by running memtest86 or similar.  ECC can usually handle memory modules that
have some bits more or less permanently wrong, but such modules should be
replaced as soon as possible.

 
 In either case these bit flips should only happen extremely infrequently, if
 ever at all. Consider that these schemes are sort of a fallback to an
 extreme what if situation and really shouldn't come into play during most
 nominal operations. I would go with ECC for something that just had to
 stay up even in the face of errors. In either case I'd still replace the
 defective component(s), regardless of whether they were ECC or not. I've
 seen thousands of machines with non-ECC RAM over the last 15 years that
 worked just fine.




-- 
Insert your favourite quote here.
Erik Trulsson
[EMAIL PROTECTED]


Re: FreeBSD and ECC memory?

2008-07-25 Thread Michael Powell
Michael Powell wrote:

[snip]
 
 1. So, what would you base your decision on? Is getting ECC worth losing
 1GB of non-ECC memory?

Oh - and the other criterion I forgot to mention. If the box in question is
only being used by 1 or 2 people and can have downtime to fix defects
whenever you want, non-ECC is a consideration.

That being said, if it is a box depended upon by many people and expected to
be reliable I'd spend the money on 4GB of ECC from the outset. 

The difference is that I need to put up a box and move on to other things.
Having to return and deal with complaints is a counterproductive waste of
time that could be better spent on new projects.

[snip]
 
 Just my $.02 here. YMMV and all other standard disclaimers apply.  :-)
 -Mike
 





Re: FreeBSD and ECC memory?

2008-07-25 Thread Michael Powell
Erik Trulsson wrote:
[snip]
 
 No, non-ECC RAM cannot detect or correct any errors at all. (Old
 parity-RAM could detect, but not correct, single-bit errors.)

Actually quite true. The old parity bit functionality that was removed from
RAM and then called non-ECC actually migrated to the memory controller.
So yes, it isn't the RAM that does it. Poor choice of wording on my part.

 ECC is generally capable of detecting multi-bit errors and fixing
 single-bit errors. (There are different ways of implementing ECC. Some of
 them might well be able to fix multi-bit errors too.)

These cost lots of money. Common on Big Iron. In fact, non-ECC as an
option isn't even offered on B.I.
 
[snip] 
 The purpose of these schemes is to compensate for the fact that in every
 so many (some large number) of memory transactions there may be a bit
 that gets flipped. If this is happening more often than (some large
 number) then there is a defect present. ECC just buys you uptime in the
 event there are more errors than there should be.
 
 Note that random, spontaneous bit flips can happen (infrequently) even in
 perfectly good RAM. (Due to cosmic rays, radioactive decay in surrounding
 material, and similar stuff. (No, I am not joking.))  ECC will handle
 such errors just fine, and that is the main reason why I would want ECC.

Especially true in satellites. The RAM in a satellite or other spacecraft
must be radiation hardened to be usable at all. And yes, what you say is no
joke but the truth.

For me the dividing line is this: when lots of people depend on a box 24/7,
it must be ECC. A storage server in someone's basement doesn't necessarily
fit into this category.
 
 You can also get defective memory modules, but such can usually be
 detected
 by running memtest86 or similar.  ECC can usually handle memory modules
 that have some bits more or less permanently wrong, but such modules
 should be replaced as soon as possible.


I agree - I was kind of harping on the defective idea. If it's defective
the manufacturer owes me a replacement, as in yesterday. 
 
[snip] 




Re: FreeBSD and ECC memory?

2008-07-25 Thread Chad Leigh -- Shire.Net LLC




If you can afford it, always buy the ECC.  Saves your bacon more often  
than not in the long run.


My Mac Pro personal desktop has it.  It developed an issue in one of  
the sticks.  The system detected that many errors were getting  
corrected, and disabled the whole stick.  Sure I lost 2GB but the  
system did not go down.  I can shut it down and replace the memory at  
my leisure.


A Solaris 10 server I run has a memory stick creating many errors.   
System is still up and I can replace the stick when I can without a  
hard crash.


ECC cannot necessarily protect you from every memory issue but it can  
protect you from many sorts of memory issues and can keep you from  
having hard crashes and allow you to fix problems on your schedule  
instead of in a panic.  First time you have a hard crash due to memory  
issues you will wish you had ECC.  (And a motherboard that supports  
ChipKill)


Chad


---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net





Re: FreeBSD and ECC memory?

2008-07-25 Thread Erik Trulsson
On Fri, Jul 25, 2008 at 09:28:11AM -0400, Michael Powell wrote:
 Erik Trulsson wrote:
 [snip]
  
  No, non-ECC RAM cannot detect or correct any errors at all. (Old
  parity-RAM could detect, but not correct, single-bit errors.)
 
 Actually quite true. The old parity bit functionality that was removed from
 RAM and then called non-ECC actually migrated to the memory controller.
 So yes, it isn't the RAM that does it. Poor choice of wording on my part.

Not quite.
Old parity-RAM usually had an extra parity bit for every 8 data bits.
By computing the parity (odd or even number of 1s) in the data bits
and comparing it with the value of the parity bit (which got set when you
wrote to memory) you could see if any single bit had been flipped.
(ECC also uses these extra bits, but uses them in a smarter way.)
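
As a rough illustration of the parity scheme just described, here is a minimal Python sketch. The function names (`parity_bit`, `store`, `check`) are invented for this example and even parity is an assumption; real parity RAM did this in hardware, one parity bit per 8 data bits.

```python
def parity_bit(byte):
    """Even-parity bit: 1 if the byte contains an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte):
    """Model a memory write: keep the 8 data bits plus the computed parity bit."""
    return (byte & 0xFF, parity_bit(byte))


def check(word):
    """Model a memory read: recompute parity and compare with the stored bit."""
    data, stored_parity = word
    return parity_bit(data) == stored_parity


stored = store(0b10110010)
print(check(stored))                        # intact data passes the check

one_flip = (stored[0] ^ 0b00001000, stored[1])
print(check(one_flip))                      # a single flipped bit is detected

two_flips = (stored[0] ^ 0b00011000, stored[1])
print(check(two_flips))                     # two flipped bits cancel out and go unnoticed
```

Note that the check only says that something is wrong; it cannot tell which bit flipped, so parity alone cannot correct anything.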

Non-ECC RAM (as well as older non-parity RAM) does not have these extra bits
and therefore you cannot detect any spontaneous bit-flips inside the RAM,
since you have nothing to compare the data read against.

(The reason non-ECC RAM is more common than ECC RAM is simply that these
extra bits require extra chips on the memory module and therefore cost more
money - money which most people are not prepared to pay.)
(If you count the number of chips on a non-ECC memory module you will find
that the number of chips on it is usually a multiple of 8, while on ECC- or
parity-RAM it is usually a multiple of 9.)
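
The "multiple of 9" observation can be checked with a little arithmetic. The sketch below assumes a 64-bit data path and memory chips that are each 8 bits wide; both figures are assumptions for illustration, and `hamming_check_bits` is a made-up helper name. It uses the standard Hamming bound (the smallest r with 2^r >= m + r + 1 check bits for m data bits) plus one extra overall parity bit for double-error detection.

```python
def hamming_check_bits(data_bits):
    """Smallest r such that 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


DATA_BITS = 64      # assumed width of the data path
CHIP_WIDTH = 8      # assumed number of bits contributed by each chip

# Plain parity: one extra bit per 8 data bits.
parity_total = DATA_BITS + DATA_BITS // 8

# SECDED ECC: Hamming check bits plus one overall parity bit.
ecc_total = DATA_BITS + hamming_check_bits(DATA_BITS) + 1

print("non-ECC chips:", DATA_BITS // CHIP_WIDTH)
print("parity chips: ", parity_total // CHIP_WIDTH)
print("ECC chips:    ", ecc_total // CHIP_WIDTH)
```

Under these assumptions both parity and ECC need 72 bits per 64 data bits, so the same nine chips can serve either purpose; ECC simply gets more out of the extra eight bits.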


Many modern memory controllers do have parity checking (or even ECC) on the
buses between controller and RAM and between controller and CPU.  This lets
you detect (or even fix) any errors that may happen as data is transferred
from RAM to CPU.  It does not let you detect random errors inside the RAM
itself, which parity RAM or ECC RAM does let you do.


 
  ECC is generally capable of detecting multi-bit errors and fixing
  single-bit errors. (There are different ways of implementing ECC. Some of
  them might well be able to fix multi-bit errors too.)
 
 These cost lots of money. Common on Big Iron. In fact, non-ECC as an
 option isn't even offered on B.I.
  
 [snip] 
  The purpose of these schemes is to compensate for the fact that in every
  so many (some large number) of memory transactions there may be a bit
  that gets flipped. If this is happening more often than (some large
  number) then there is a defect present. ECC just buys you uptime in the
  event there are more errors than there should be.
  
  Note that random, spontaneous bit flips can happen (infrequently) even in
  perfectly good RAM. (Due to cosmic rays, radioactive decay in surrounding
  material, and similar stuff. (No, I am not joking.))  ECC will handle
  such errors just fine, and that is the main reason why I would want ECC.
 
 Especially true in satellites. The RAM in a satellite, or other spacecraft
 must be radiation hardened to be usable at all. And yes, it is no joke but
 the truth what you say.
 
 For me the dividing line is when lots of people depend on a box 24/7 it must
 be ECC. A storage server in someone's basement doesn't necessarily fit into
 this category.

It depends also on what kind of data is stored on the server.  One of the
really nasty problems that can occur with random bit-flips in non-ECC RAM is
that important data can get silently corrupted.  You can get an error in
your database or spreadsheet or payroll data or whatever without noticing
until it is too late (by which time all your backups will probably have this
wrong data too.)  Depending on the data this can be VERY bad, even if it is
a system that is only used occasionally by a few people.
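
To show how quiet such corruption can be, here is a small Python sketch that flips one bit in an in-memory record. The record contents and the `flip_bit` helper are hypothetical, made up for this example; the point is only that nothing in the program notices the change.

```python
def flip_bit(data, bit_index):
    """Return a copy of `data` with one bit inverted, as a stray bit flip would."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


record = b"salary=52000"                    # hypothetical payroll record

# Flip the lowest bit of the byte at offset 7 (the character '5').
corrupted = flip_bit(record, 7 * 8)

print(record)
print(corrupted)                            # still a well-formed, plausible record
```

No exception is raised and the corrupted record still parses as a sensible value, so without ECC (or some checksum at a higher level) the error is carried straight into storage and backups.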

Memory errors which cause the computer to crash can be quite disruptive, but
they are at least easily noticed, and can then be handled.

  
  You can also get defective memory modules, but such can usually be
  detected
  by running memtest86 or similar.  ECC can usually handle memory modules
  that have some bits more or less permanently wrong, but such modules
  should be replaced as soon as possible.
 
 
 I agree - I was kind of harping on the defective idea. If it's defective
 the manufacturer owes me a replacement, as in yesterday. 

Yes, and in the (luckily fairly uncommon) case that one of the chips on a
memory module suddenly decides to stop working, then ECC can serve the same
purpose as RAID does for disks - it allows the system to keep going until
you have time to replace the broken part. (Which should be done ASAP since
if you get random bit-flips in addition to a broken chip, ECC will not be
able to correct those bits.)


-- 
Insert your favourite quote here.
Erik Trulsson
[EMAIL PROTECTED]