FreeBSD and ECC memory?
Hello,

I am buying hardware for a FreeBSD server, and my friend and I are arguing about whether or not to buy ECC RAM for it. It is an HP ProLiant ML110 G4 machine and it currently has 2 x 512MB of HP DDR2 ECC memory.

My friend says buying ECC memory is not wise, because we would not benefit from it: this server will not need very high availability (though we would still like it to be a solid server). My friend also says that ECC memory slows down memory operations by 2-3% overall.

We would also gain capacity by buying non-ECC memory, because we already have 2 x 1GB of non-ECC memory. If we:

- buy an extra 2 x 1GB of non-ECC memory, we'll have 4GB altogether (4 x 1GB)
- buy an extra 2 x 1GB of ECC memory, we'll have 3GB altogether (2 x 512MB + 2 x 1GB)

1. So, what would you base your decision on? Is getting ECC worth losing 1GB of non-ECC memory?
2. What are your experiences with ECC?
3. Has anyone here ever had a machine halt itself because of a memory error (with ECC memory installed)?
4. If there is non-ECC memory installed, how does FreeBSD recognize (or correct?) memory errors?

Thanks,
Nejc

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: FreeBSD and ECC memory?
Nejc Škoberne wrote:
> 4. If there is non-ECC memory installed, how does FreeBSD recognize (or correct?) memory errors?

By crashing or corrupting data, of course. Not doing this is what ECC is for :)

Kris
Re: FreeBSD and ECC memory?
Nejc Škoberne wrote:
> Hello,
>
> I am buying hardware for a FreeBSD server, and my friend and I are arguing about whether or not to buy ECC RAM for it. It is an HP ProLiant ML110 G4 machine and it currently has 2 x 512MB of HP DDR2 ECC memory.
>
> My friend says buying ECC memory is not wise, because we would not benefit from it: this server will not need very high availability (though we would still like it to be a solid server). My friend also says that ECC memory slows down memory operations by 2-3% overall.
>
> We would also gain capacity by buying non-ECC memory, because we already have 2 x 1GB of non-ECC memory. If we:
>
> - buy an extra 2 x 1GB of non-ECC memory, we'll have 4GB altogether (4 x 1GB)
> - buy an extra 2 x 1GB of ECC memory, we'll have 3GB altogether (2 x 512MB + 2 x 1GB)
>
> 1. So, what would you base your decision on? Is getting ECC worth losing 1GB of non-ECC memory?

My decision would be based upon what the server was going to be used for. For home use, or anything that is not mission critical, I'd say non-ECC is just fine. At work, for mission-critical database, mail, etc., I stick with ECC. Especially when it comes to Windows, as Windows has a nasty habit of trying to mask what's going on behind the scenes. No way I'd run a large SQL database or Exchange server without ECC.

I'd be more concerned with trying to buy all the memory at the same time so the sticks are all identical, especially with regard to timing and speed ratings. You can create a problem when you have stick(s) from one manufacturer and then add different ones later.

IMHO, in this particular situation, my gut feeling from your description would be to go with the 4GB of non-ECC, as it sounds like the scenario doesn't match the criteria I use for justifying ECC as a must-have.

> 2. What are your experiences with ECC?
> 3. Has anyone here ever had a machine halt itself because of a memory error (with ECC memory installed)?

If it does, you have defective hardware that is in need of replacement. Yes, I have had bad RAM; whether it's ECC or non-ECC isn't the issue when it is simply defective.

> 4. If there is non-ECC memory installed, how does FreeBSD recognize (or correct?) memory errors?

Generally speaking this occurs more at the hardware level. Non-ECC RAM can correct single bit errors while ECC is capable of fixing multi-bit errors. However, should I become aware that ECC was fixing too many errors too often, I would consider there to be defective hardware present.

The purpose of these schemes is to compensate for the fact that in every so many (some large number) of memory transactions there may be a bit that gets flipped. If this is happening more often than (some large number) then there is a defect present. ECC just buys you uptime in the event there are more errors than there should be.

In either case these bit flips should only happen extremely infrequently, if ever at all. Consider that these schemes are sort of a fallback for an extreme what-if situation and really shouldn't come into play during most nominal operations.

I would go with ECC for something that just had to stay up even in the face of errors. In either case I'd still replace the defective component(s), regardless of whether they were ECC or not. I've seen thousands of machines with non-ECC RAM over the last 15 years that worked just fine.

Just my $.02 here. YMMV and all other standard disclaimers apply. :-)

-Mike
Re: FreeBSD and ECC memory?
> 4. If there is non-ECC memory installed, how does FreeBSD recognize (or correct?) memory errors?

It's not the OS's job; it's the hardware's.
Re: FreeBSD and ECC memory?
On Fri, Jul 25, 2008 at 08:42:54AM -0400, Michael Powell wrote:
> Nejc Škoberne wrote:
> > Hello,
> >
> > I am buying hardware for a FreeBSD server, and my friend and I are arguing about whether or not to buy ECC RAM for it. It is an HP ProLiant ML110 G4 machine and it currently has 2 x 512MB of HP DDR2 ECC memory.
> >
> > My friend says buying ECC memory is not wise, because we would not benefit from it: this server will not need very high availability (though we would still like it to be a solid server). My friend also says that ECC memory slows down memory operations by 2-3% overall.
> >
> > We would also gain capacity by buying non-ECC memory, because we already have 2 x 1GB of non-ECC memory. If we:
> >
> > - buy an extra 2 x 1GB of non-ECC memory, we'll have 4GB altogether (4 x 1GB)
> > - buy an extra 2 x 1GB of ECC memory, we'll have 3GB altogether (2 x 512MB + 2 x 1GB)
> >
> > 1. So, what would you base your decision on? Is getting ECC worth losing 1GB of non-ECC memory?
>
> My decision would be based upon what the server was going to be used for. For home use, or anything that is not mission critical, I'd say non-ECC is just fine. At work, for mission-critical database, mail, etc., I stick with ECC. Especially when it comes to Windows, as Windows has a nasty habit of trying to mask what's going on behind the scenes. No way I'd run a large SQL database or Exchange server without ECC.
>
> I'd be more concerned with trying to buy all the memory at the same time so the sticks are all identical, especially with regard to timing and speed ratings. You can create a problem when you have stick(s) from one manufacturer and then add different ones later.
>
> IMHO, in this particular situation, my gut feeling from your description would be to go with the 4GB of non-ECC, as it sounds like the scenario doesn't match the criteria I use for justifying ECC as a must-have.
>
> > 2. What are your experiences with ECC?
> > 3. Has anyone here ever had a machine halt itself because of a memory error (with ECC memory installed)?
>
> If it does, you have defective hardware that is in need of replacement. Yes, I have had bad RAM; whether it's ECC or non-ECC isn't the issue when it is simply defective.
>
> > 4. If there is non-ECC memory installed, how does FreeBSD recognize (or correct?) memory errors?
>
> Generally speaking this occurs more at the hardware level. Non-ECC RAM can correct single bit errors while ECC is capable of fixing multi-bit errors.

No, non-ECC RAM cannot detect or correct any errors at all. (Old parity RAM could detect, but not correct, single-bit errors.)

ECC is generally capable of detecting multi-bit errors and fixing single-bit errors. (There are different ways of implementing ECC. Some of them might well be able to fix multi-bit errors too.)

> However, should I become aware that ECC was fixing too many errors too often, I would consider there to be defective hardware present.
>
> The purpose of these schemes is to compensate for the fact that in every so many (some large number) of memory transactions there may be a bit that gets flipped. If this is happening more often than (some large number) then there is a defect present. ECC just buys you uptime in the event there are more errors than there should be.

Note that random, spontaneous bit flips can happen (infrequently) even in perfectly good RAM, due to cosmic rays, radioactive decay in surrounding material, and similar causes. (No, I am not joking.) ECC will handle such errors just fine, and that is the main reason why I would want ECC.

You can also get defective memory modules, but these can usually be detected by running memtest86 or similar. ECC can usually handle memory modules that have some bits more or less permanently wrong, but such modules should be replaced as soon as possible.

> In either case these bit flips should only happen extremely infrequently, if ever at all. Consider that these schemes are sort of a fallback for an extreme what-if situation and really shouldn't come into play during most nominal operations.
>
> I would go with ECC for something that just had to stay up even in the face of errors. In either case I'd still replace the defective component(s), regardless of whether they were ECC or not. I've seen thousands of machines with non-ECC RAM over the last 15 years that worked just fine.

--
Insert your favourite quote here.
Erik Trulsson [EMAIL PROTECTED]
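To make the "detect multi-bit errors, fix single-bit errors" behaviour described in the reply above concrete, here is a minimal Python sketch of a SECDED (single-error-correct, double-error-detect) code. It is a toy under stated assumptions: it protects only 4 data bits with an extended Hamming(8,4) code, whereas ECC DIMMs commonly store 8 check bits per 64 data bits, and a real memory controller does this in hardware and may use a different code altogether. The function names `encode` and `decode` are made up for this illustration.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    code = [0] * 8  # index 0: overall parity; indices 1..7: Hamming(7,4) positions
    # Data bits live at the positions that are not powers of two.
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    # Each check bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Overall parity makes the total number of 1s even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # XOR of the indices of all set bits is 0 for a valid codeword,
    # and equals the index of the flipped bit after a single flip.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall_even = sum(code) % 2 == 0

    if syndrome == 0 and overall_even:
        status = "ok"
    elif not overall_even:
        # Odd number of flips: treat it as a single-bit error and repair it.
        # (A syndrome of 0 here means the overall parity bit itself flipped.)
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)

single = list(word)
single[5] ^= 1                 # one bit flips

double = list(word)
double[5] ^= 1                 # two bits flip
double[6] ^= 1

print(decode(word))            # (11, 'ok')
print(decode(single))          # (11, 'corrected')
print(decode(double))          # (None, 'uncorrectable')
```

The single flip is repaired transparently, while the double flip is reported instead of being returned as good data. With three or more flips in one word a code like this can be fooled, which is one reason a module that keeps producing corrected errors should be replaced rather than trusted.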
Re: FreeBSD and ECC memory?
Michael Powell wrote:
[snip]
> > 1. So, what would you base your decision on? Is getting ECC worth losing 1GB of non-ECC memory?

Oh - and the other criterion I forgot to mention. If the box in question is only being used by 1 or 2 people and can have downtime to fix defects whenever you want, non-ECC is worth considering. That being said, if it is a box depended upon by many people and expected to be reliable, I'd spend the money on 4GB of ECC from the outset.

The difference is that I need to put up a box and move on to other things. Having to return and muck with complaints is a counterproductive waste of time that could be better spent on new projects.

[snip]

Just my $.02 here. YMMV and all other standard disclaimers apply. :-)

-Mike
Re: FreeBSD and ECC memory?
Erik Trulsson wrote:
[snip]
> No, non-ECC RAM cannot detect or correct any errors at all. (Old parity RAM could detect, but not correct, single-bit errors.)

Actually quite true. The old parity bit functionality that was removed from RAM and then called non-ECC actually migrated to the memory controller. So yes, it isn't the RAM that does it. Poor choice of wording on my part.

> ECC is generally capable of detecting multi-bit errors and fixing single-bit errors. (There are different ways of implementing ECC. Some of them might well be able to fix multi-bit errors too.)

These cost lots of money. Common on Big Iron. In fact, non-ECC as an option isn't even offered on B.I.

[snip]
> > The purpose of these schemes is to compensate for the fact that in every so many (some large number) of memory transactions there may be a bit that gets flipped. If this is happening more often than (some large number) then there is a defect present. ECC just buys you uptime in the event there are more errors than there should be.
>
> Note that random, spontaneous bit flips can happen (infrequently) even in perfectly good RAM, due to cosmic rays, radioactive decay in surrounding material, and similar causes. (No, I am not joking.) ECC will handle such errors just fine, and that is the main reason why I would want ECC.

Especially true in satellites. The RAM in a satellite or other spacecraft must be radiation hardened to be usable at all. And yes, what you say is no joke but the truth.

For me the dividing line is this: when lots of people depend on a box 24/7, it must be ECC. A storage server in someone's basement doesn't necessarily fit into this category.

> You can also get defective memory modules, but these can usually be detected by running memtest86 or similar. ECC can usually handle memory modules that have some bits more or less permanently wrong, but such modules should be replaced as soon as possible.

I agree - I was kind of harping on the defective idea. If it's defective the manufacturer owes me a replacement, as in yesterday.

[snip]
Re: FreeBSD and ECC memory?
If you can afford it, always buy the ECC. Saves your bacon more often than not in the long run.

My Mac Pro personal desktop has it. It developed an issue in one of the sticks. The system detected that many errors were getting corrected, and disabled the whole stick. Sure I lost 2GB but the system did not go down. I can shut it down and replace the memory at my leisure.

A Solaris 10 server I run has a memory stick creating many errors. System is still up and I can replace the stick when I can without a hard crash.

ECC cannot necessarily protect you from every memory issue but it can protect you from many sorts of memory issues and can keep you from having hard crashes and allow you to fix problems on your schedule instead of in a panic.

First time you have a hard crash due to memory issues you will wish you had ECC. (And a motherboard that supports ChipKill)

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
Re: FreeBSD and ECC memory?
On Fri, Jul 25, 2008 at 09:28:11AM -0400, Michael Powell wrote:
> Erik Trulsson wrote:
> [snip]
> > No, non-ECC RAM cannot detect or correct any errors at all. (Old parity RAM could detect, but not correct, single-bit errors.)
>
> Actually quite true. The old parity bit functionality that was removed from RAM and then called non-ECC actually migrated to the memory controller. So yes, it isn't the RAM that does it. Poor choice of wording on my part.

Not quite. Old parity RAM usually had an extra parity bit for every 8 data bits. By computing the parity (odd or even number of 1s) of the data bits and comparing it with the value of the parity bit (which got set when you wrote to memory) you could see if any single bit had been flipped. (ECC also uses these extra bits, but uses them in a smarter way.)

Non-ECC RAM (as well as older non-parity RAM) does not have these extra bits, and therefore you cannot detect any spontaneous bit flips inside the RAM, since you have nothing to compare the data read against. (The reason non-ECC RAM is more common than ECC RAM is simply that these extra bits require extra chips on the memory module and therefore cost more money - money which most people are not prepared to pay.)

(If you count the chips on a non-ECC memory module you will find that their number is usually a multiple of 8, while on ECC or parity RAM it is usually a multiple of 9.)

Many modern memory controllers do have parity checking (or even ECC) on the buses between controller and RAM and between controller and CPU. This lets you detect (or even fix) any errors that may happen as data is transferred from RAM to CPU. It does not let you detect random errors inside the RAM, which parity or ECC memory can.

> > ECC is generally capable of detecting multi-bit errors and fixing single-bit errors. (There are different ways of implementing ECC. Some of them might well be able to fix multi-bit errors too.)
>
> These cost lots of money. Common on Big Iron. In fact, non-ECC as an option isn't even offered on B.I.
>
> [snip]
> > > The purpose of these schemes is to compensate for the fact that in every so many (some large number) of memory transactions there may be a bit that gets flipped. If this is happening more often than (some large number) then there is a defect present. ECC just buys you uptime in the event there are more errors than there should be.
> >
> > Note that random, spontaneous bit flips can happen (infrequently) even in perfectly good RAM, due to cosmic rays, radioactive decay in surrounding material, and similar causes. (No, I am not joking.) ECC will handle such errors just fine, and that is the main reason why I would want ECC.
>
> Especially true in satellites. The RAM in a satellite or other spacecraft must be radiation hardened to be usable at all. And yes, what you say is no joke but the truth.
>
> For me the dividing line is this: when lots of people depend on a box 24/7, it must be ECC. A storage server in someone's basement doesn't necessarily fit into this category.

It also depends on what kind of data is stored on the server. One of the really nasty problems that can occur with random bit flips in non-ECC RAM is that important data can get silently corrupted. You can get an error in your database or spreadsheet or payroll data or whatever without noticing until it is too late (by which time all your backups will probably have this wrong data too). Depending on the data this can be VERY bad, even if it is a system that is only used occasionally by a few people.

Memory errors which cause the computer to crash can be quite disruptive, but they are at least easily noticed, and can then be handled.

> > You can also get defective memory modules, but these can usually be detected by running memtest86 or similar. ECC can usually handle memory modules that have some bits more or less permanently wrong, but such modules should be replaced as soon as possible.
>
> I agree - I was kind of harping on the defective idea. If it's defective the manufacturer owes me a replacement, as in yesterday.

Yes, and in the (luckily fairly uncommon) case that one of the chips on a memory module suddenly decides to stop working, ECC can serve the same purpose as RAID does for disks - it allows the system to keep going until you have time to replace the broken part. (Which should be done ASAP, since if you get random bit flips in addition to a broken chip, ECC will not be able to correct those bits.)

--
Insert your favourite quote here.
Erik Trulsson [EMAIL PROTECTED]
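The parity check described in the message above (one extra bit per 8 data bits, set on write and compared on read) can be sketched in a few lines of Python. This is an illustration only, assuming even parity; the helper names `parity_bit`, `store` and `check` are hypothetical, and real parity RAM does all of this in hardware.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the number of 1 bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte):
    """Simulate a write: keep the data byte together with its parity bit."""
    return byte & 0xFF, parity_bit(byte)


def check(byte, stored_parity):
    """Simulate a read: True if the data still agrees with the stored parity bit."""
    return parity_bit(byte) == stored_parity


data, p = store(0b10110010)

one_flip = data ^ 0b00000100    # a single bit flips while sitting in RAM
two_flips = data ^ 0b00010100   # two bits flip in the same byte

print(check(data, p))        # True: unchanged data passes
print(check(one_flip, p))    # False: the flip is detected, but not located
print(check(two_flips, p))   # True: a double flip goes unnoticed
```

Parity can say that something in the byte is wrong but not which bit, so it cannot repair anything, and an even number of flips cancels out. Without the stored parity bit there is nothing to compare against at all, which is the non-ECC situation described above.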