Re: Wow! Is memory ever cheap!

2001-05-10 Thread H. Peter Anvin

Followup to:  <[EMAIL PROTECTED]>
By author:Edgar Toernig <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel
> 
> I think you have the wrong idea about why ECC is there.  ECC deals with
> the inherent shortcomings of DRAM.
> 
> DRAMs are not perfect.  They have some probability of losing a bit.
> Normally this probability is low enough to live with.  Let's say
> you have a system with 1 MByte and let's say the rate of
> single-bit errors is around 1 error in 100 years.  Good enough.
> Now put 1 GByte in the system.  You'll get a rate of about 10 errors
> per year.  Maybe good enough for a Windows box but not acceptable
> for your server.  So you put in ECC to bring this rate back
> down to reasonable numbers.  ECC can correct the single-bit errors.
> You only have to deal with double-bit errors.  The chance of those is
> much, much lower.
> 

Yes, ECC, unlike parity, is a technique for reducing the error rate,
with the side benefit of intercepting an error when it happens.
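
To make the parity/ECC distinction concrete, here is a minimal sketch in
Python.  It is an illustration only: the function names are made up for this
example, and the toy Hamming(7,4) code stands in for the wider codes that
real ECC memory uses.  A single parity bit can tell you that a bit flipped
but not which one; the Hamming code can locate the flipped bit and repair it.

```python
def parity_bit(bits):
    """Even parity: the extra bit makes the total number of 1s even."""
    return sum(bits) % 2


def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]

# Parity: store the data plus one parity bit, then flip one bit "in memory".
stored = data + [parity_bit(data)]
stored[2] ^= 1
parity_error_detected = sum(stored) % 2 != 0  # we know something broke, not where

# Hamming(7,4): same kind of fault, but the decoder repairs it.
codeword = hamming74_encode(data)
codeword[4] ^= 1
recovered = hamming74_decode(codeword)

print("parity detected an error:", parity_error_detected)
print("Hamming recovered the data:", recovered == data)
```

Two flipped bits in one codeword defeat this toy code (the decoder would
"correct" the wrong bit), which is why the double-bit error rate is the
figure that matters once ECC is in place.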

I am not disagreeing with Larry that integrity checks are a Good
Thing[TM], and in general are a hallmark of good engineering.
However, they are not a replacement for ECC for the purpose of driving
the failure rate down into an acceptable probability range.

It is of course a very nice thing that DRAM prices have come down into
the range where buying DRAM in gigabyte quantities is reasonable :)

-hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Wow! Is memory ever cheap!

2001-05-09 Thread Edgar Toernig

Larry McVoy wrote:
> 
> Let's review:  ECC is nice, but it doesn't solve all data corruption
> problems.  Applications which do their own end to end data integrity
> checks will catch many more error cases than what ECC catches.

I think you have the wrong idea about why ECC is there.  ECC deals with
the inherent shortcomings of DRAM.

DRAMs are not perfect.  They have some probability of losing a bit.
Normally this probability is low enough to live with.  Let's say
you have a system with 1 MByte and let's say the rate of
single-bit errors is around 1 error in 100 years.  Good enough.
Now put 1 GByte in the system.  You'll get a rate of about 10 errors
per year.  Maybe good enough for a Windows box but not acceptable
for your server.  So you put in ECC to bring this rate back
down to reasonable numbers.  ECC can correct the single-bit errors.
You only have to deal with double-bit errors.  The chance of those is
much, much lower.
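
The arithmetic behind those figures can be sketched in a few lines of Python.
The numbers are the illustrative ones from the paragraph above, not measured
data, and the sketch assumes the error rate scales linearly with the amount
of memory; the names are made up for this example.

```python
# Illustrative rate from the text: 1 single-bit error per 100 years per MByte.
ERRORS_PER_YEAR_PER_MBYTE = 1 / 100


def expected_errors_per_year(mbytes):
    """Expected single-bit errors per year, assuming linear scaling with size."""
    return mbytes * ERRORS_PER_YEAR_PER_MBYTE


one_mbyte = expected_errors_per_year(1)
one_gbyte = expected_errors_per_year(1024)  # 1 GByte = 1024 MByte

print(f"1 MByte: {one_mbyte:.2f} errors/year")
print(f"1 GByte: {one_gbyte:.2f} errors/year")
```

With 1024 times the memory you get 1024 times the errors, which is where
"about 10 per year" comes from.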

Sure, it doesn't solve all data corruption problems - only simple
errors in DRAMs.  But it keeps systems with huge amounts of RAM up
and running much longer.  And btw, your integrity checks over data will
not protect against a corrupted kernel or application...

Ciao, ET.

PS: Just let your app run long enough.  I'm sure it will detect a
checksum error some day ;-)




Re: Wow! Is memory ever cheap!

2001-05-09 Thread Matthew Jacob



On Wed, 9 May 2001, Gérard Roudier wrote:

> 
> 
> On Tue, 8 May 2001, Dan Hollis wrote:
> 
> > On Tue, 8 May 2001, Larry McVoy wrote:
> > > which is a text version of the paper I mentioned before.  The basic
> > > message of the paper is that it really doesn't help much to have things
> > > like ECC unless you can be sure that 100% of the rest of your system
> > > has similar checks.
> > 
> > UDMA has crc, scsi has parity, pci has (i think) parity, tcpip has crc,
> > your cpu l1 and l2 have ecc...
> 
> SCSI Ultra-160 has CRC.
> 
> PCI has parity (btw, you think right), but only a few drivers make sure
> PCI parity checking is enabled. On the other hand, a PCI parity error

Suns panic if they get SERR or PERR.

> should be considered extremely serious, and the system should be
> stopped when one happens.
> 
> Btw, it seems (from what I read on the PCI list) that the original PCI
> didn't have parity. After all, PCI was designed for PC machines... :)
> 
> > Looks like similar checks are already there.
> > 
> > -Dan
> > 
>   Gérard.



Re: Wow! Is memory ever cheap!

2001-05-09 Thread Gérard Roudier



On Tue, 8 May 2001, Dan Hollis wrote:

> On Tue, 8 May 2001, Larry McVoy wrote:
> > which is a text version of the paper I mentioned before.  The basic
> > message of the paper is that it really doesn't help much to have things
> > like ECC unless you can be sure that 100% of the rest of your system
> > has similar checks.
> 
> UDMA has crc, scsi has parity, pci has (i think) parity, tcpip has crc,
> your cpu l1 and l2 have ecc...

SCSI Ultra-160 has CRC.

PCI has parity (btw, you think right), but only a few drivers make sure
PCI parity checking is enabled. On the other hand, a PCI parity error
should be considered extremely serious, and the system should be
stopped when one happens.

Btw, it seems (from what I read on the PCI list) that the original PCI
didn't have parity. After all, PCI was designed for PC machines... :)

> Looks like similar checks are already there.
> 
> -Dan
> 
  Gérard.




Re: Wow! Is memory ever cheap!

2001-05-09 Thread Malcolm Beattie

Larry McVoy writes:
> On Wed, May 09, 2001 at 12:24:25AM -0400, Marty Leisner wrote:
> > My understanding is suns big machines stopped using ecc and they
> 
> The SUN problem was a cache problem and there is no way that I believe
> that SUN would turn off ECC in the cache.  There are good reasons for
> not doing so.  If you think through the end to end argument, you will
> see that you have no way to do checks on the data path into/out of the
> processor.  If that part of the datapath is not checked then no amount
> of checking elsewhere does any good, the processor can be corrupting
> your data and never know it.  If SUN was so stupid as to remove this,
> then it is a dramatically different place.  I heard that there was a
> bug in the cache controller, I never heard that they had removed ECC.

There are issues with error detection/correction/recovery with
different designs of L1 and L2 caches. There's a good paper:

IBM S/390 storage hierarchy - G5 and G6 performance considerations
IBM Journal of Research and Development
Vol 43 No. 5/6
available at
http://www.research.ibm.com/journal/rd/435/jackson.html

which covers IBM's choice of L1 and L2 design for S/390. The section on
"S/390 reliability and performance implications" is relevant here. In
particular, they use a solution which isn't best from the performance
point of view but ensures you don't discover "too late" about an error.
I heard a rumour (now I get to the unsubstantiated part :-) that Sun
chose a higher-performing design for their cache subsystem, but one with
a nastier failure mode in the case of cache errors.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services



Re: Wow! Is memory ever cheap!

2001-05-08 Thread John Alvord

On Tue, 8 May 2001 22:22:10 -0700, Larry McVoy <[EMAIL PROTECTED]>
wrote:
>
>Just to make sure you understand:  I think ECC is a fine thing.  If I'm
>running systems with no other integrity checks, I'll take ECC and like it.
>However, having ECC does not mean that I trust that my data is safe,
>that is most certainly not a true statement.  The bus, the disks, the
>disk controller, the disk driver, the buffer cache, etc, can all corrupt
>the data.   Oh, yeah, let's not forget NFS.  I have seen each and every
>one of those things corrupt data.

This is an interesting observation of a truth that was well known in
the second-generation computers of the 1950s and 1960s. I first worked
at John Hancock... they had a bunch of 7074 machines. All those
systems made use of programmed checksums in each tape block and in
each full file. The reason was that those machines did not have ECC...
they did have parity checking if I remember right. With IBM's
third-generation computers (S/360s), and probably those of other
manufacturers, ECC became a standard feature. Parity checking was added
on various data paths such as channel memory, buffer memory, etc. There
was so much protection added that the programmed checksums became
superfluous.
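
As a rough sketch of what "programmed checksums in each tape block and in
each full file" can look like in software, here is a small Python example.
It is not how the 7074 programs did it: CRC-32 from Python's zlib module is
swapped in as the checksum, and the block size and function names are
invented for the illustration.

```python
import zlib

BLOCK_SIZE = 16  # hypothetical block size, kept small so the demo is readable


def write_blocks(data):
    """Split data into blocks, each stored with its own CRC-32,
    and return the blocks plus a CRC-32 over the whole file."""
    blocks = []
    for i in range(0, len(data), BLOCK_SIZE):
        payload = data[i:i + BLOCK_SIZE]
        blocks.append((payload, zlib.crc32(payload)))
    return blocks, zlib.crc32(data)


def read_blocks(blocks, file_crc):
    """Verify every block, then the whole file; raise ValueError on a mismatch."""
    out = bytearray()
    for n, (payload, crc) in enumerate(blocks):
        if zlib.crc32(payload) != crc:
            raise ValueError(f"checksum mismatch in block {n}")
        out += payload
    if zlib.crc32(bytes(out)) != file_crc:
        raise ValueError("checksum mismatch over the whole file")
    return bytes(out)


record = b"policy records, stand-in data for the example" * 2
blocks, file_crc = write_blocks(record)

# A clean read returns the original data.
clean = read_blocks(blocks, file_crc)

# Flip one bit in block 1 after it was "written", then try to read it back.
payload, crc = blocks[1]
blocks[1] = (bytes([payload[0] ^ 0x01]) + payload[1:], crc)
try:
    read_blocks(blocks, file_crc)
    caught = False
except ValueError as err:
    caught = True
    print("detected:", err)
```

The per-block check localizes the damage; the whole-file check catches
problems such as a missing or reordered block that per-block checks alone
would miss.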

There were still odd moments. I remember working on an Amdahl computer
problem involving an internal data path, one where the contents of a
register moved to an internal storage area, and the path did not
have parity. There was a machine fault... the path was electrically
open, so the contents of the register always became zero. But since it
wasn't parity checked, there was no machine check. I remember another
problem on the IBM 3033. Cosmic rays (really) caused one-bit errors in
channel memory. That memory had parity but not ECC, so you got a weird
channel check. Back at the diagnosis ranch, the board looked good. It was
only when someone noticed that the rate of such problems was proportional
to the height above sea level that the light bulb went on.

The lesson is that when paths are not checked, hardware or software,
data being held or transformed can change. Old lesson but a good one
to know.

john alvord



Re: Wow! Is memory ever cheap!

2001-05-08 Thread Dan Hollis

On Tue, 8 May 2001, Larry McVoy wrote:
> which is a text version of the paper I mentioned before.  The basic
> message of the paper is that it really doesn't help much to have things
> like ECC unless you can be sure that 100% of the rest of your system
> has similar checks.

UDMA has CRC, SCSI has parity, PCI has (I think) parity, TCP/IP has checksums,
your CPU L1 and L2 have ECC...

Looks like similar checks are already there.

-Dan




Re: Wow! Is memory ever cheap!

2001-05-08 Thread Larry McVoy

On Wed, May 09, 2001 at 12:24:25AM -0400, Marty Leisner wrote:
> I'm confused by the "let's not use ECC and use bk" talk.

I'll take a pass at unconfusing you, I can see how you might be.  I wish
I had never mentioned BK, that was never the point.  End to end was the
point, BK was just an example and now I'm getting accused of bringing
up the whole thread as a BK advertisement.  Which completely misses
the point.  Please go read

http://www.google.com/search?q=cache:web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf+clark+end+to+end=en

which is a text version of the paper I mentioned before.  The basic
message of the paper is that it really doesn't help much to have things
like ECC unless you can be sure that 100% of the rest of your system
has similar checks.

The point was made again, but apparently missed here, when I pointed
out that Linux's disk subsystem passes up bad data when it knows there
may be a problem.  ECC will not help you in this case, the data was bad
before it hit memory.  So now you have carefully error corrected BAD DATA.
See the point?  ECC doesn't help unless every other component is equally
careful; those components include software and hardware.  You can fix
that chunk of software and then I'll go find a rogue disk controller
that breaks the datapath, there are plenty to choose from.

Just to make sure you understand:  I think ECC is a fine thing.  If I'm
running systems with no other integrity checks, I'll take ECC and like it.
However, having ECC does not mean that I trust that my data is safe,
that is most certainly not a true statement.  The bus, the disks, the
disk controller, the disk driver, the buffer cache, etc, can all corrupt
the data.   Oh, yeah, let's not forget NFS.  I have seen each and every
one of those things corrupt data.

As to the BitKeeper stuff, those of you who think this is a BitKeeper
discussion are off whacking in the weeds.  The point isn't that BitKeeper
is good because it has integrity checks, the point is that integrity
checks are a good thing.  Period.   BitKeeper was just an example.
If there was a Linux filesystem that had built-in integrity checks (and
I knew about it, for all I know there is one), then I would have used
that as the example.  I used BitKeeper as an example because I know it
and I can point to numerous cases where it exposed problems that ECC
would not have caught.  Ask Dave Miller about the mmap/read SPARC Linux
cache aliasing bug that BK exposed, that one was nasty.

Let's review:  ECC is nice, but it doesn't solve all data corruption
problems.  Applications which do their own end to end data integrity
checks will catch many more error cases than what ECC catches.  My efforts
in this thread had nothing to do with BitKeeper, they were trying to
get people to realize that end to end is good, and ECC isn't end to end.

Examples of end to end applications, which I should have thought of
sooner, are the md5sums on ftp.kernel.org, the integrity checks in RPMs,
and the CRCs in cpio.  I'm sure you can think of lots of others, this is
an old problem.
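
A minimal sketch of such an end to end check, in Python, might look like the
following.  The data and the function names are stand-ins invented for the
example; the only real API used is hashlib.md5 from the standard library.
The producer publishes a digest alongside the data, and the consumer
recomputes it after the data has crossed every disk, bus, driver and network
in between.

```python
import hashlib


def digest(data):
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def verify(data, expected):
    """True if the data still matches the digest published by the producer."""
    return digest(data) == expected


original = b"stand-in for the contents of a kernel tarball"
published = digest(original)  # computed once, at the source

# Simulate a single flipped bit somewhere along the path to the consumer.
damaged = bytearray(original)
damaged[3] ^= 0x01
damaged = bytes(damaged)

intact_ok = verify(original, published)
damaged_ok = verify(damaged, published)

print("intact copy verifies:", intact_ok)
print("damaged copy verifies:", damaged_ok)
```

The check does not care where the bit was flipped, which is the whole point:
it covers the path from producer to consumer rather than any one component.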

> My understanding is Sun's big machines stopped using ECC and they

The SUN problem was a cache problem and there is no way that I believe
that SUN would turn off ECC in the cache.  There are good reasons for
not doing so.  If you think through the end to end argument, you will
see that you have no way to do checks on the data path into/out of the
processor.  If that part of the datapath is not checked then no amount
of checking elsewhere does any good, the processor can be corrupting
your data and never know it.  If SUN was so stupid as to remove this,
then it is a dramatically different place.  I heard that there was a
bug in the cache controller, I never heard that they had removed ECC.
If you really want to know I can ask, I know at least one of the guys
who works on that stuff there.
-- 
---
Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm 



Re: Wow! Is memory ever cheap!

2001-05-08 Thread Marty Leisner


I'm confused by the "let's not use ECC and use bk" talk.

My understanding is Sun's big machines stopped using ECC and they
started to have "random" problems running big-iron applications
that took them a while to figure out (and a lot of bad press) and can
only be rectified in the big cycle (this was last year so it's probably
solved now).

I thought one of the primary reasons to have ECC is to catch
weird things before they become catastrophic...and at least
know WHY weirdness is happening...


marty




Re: Wow! Is memory ever cheap!

2001-05-08 Thread Marty Leisner


I'm confused by the lets not use ECC and use bk talk.

My understanding is suns big machines stopped using ecc and they
started to have random problems running big-iron applications
that took them a while to figure out (and a lot of bad press) and can
only be rectified in the big cycle (this was last year so its probably solved 
now).

I thought one of the primary reasons to have ecc is to catch
wierd things before they become catostrophic...and at least
know WHY weirdness is happening...


marty

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Wow! Is memory ever cheap!

2001-05-08 Thread Larry McVoy

On Wed, May 09, 2001 at 12:24:25AM -0400, Marty Leisner wrote:
 I'm confused by the lets not use ECC and use bk talk.

I'll take a pass at unconfusing you, I can see how you might be.  I wish
I had never mentioned BK, that was never the point.  End to end was the
point, BK was just an example and now I'm getting accused of bringing
up the whole thread as a BK advertisement.  Which completely misses
the point.  Please go read

http://www.google.com/search?q=cache:web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf+clark+end+to+endhl=en

which is a text version of the paper I mentioned before.  The basic
message of the paper is that it really doesn't help much to have things
like ECC unless you can be sure that 100% of the rest of your system
has similar checks.

The point was made again, but apparently missed here, when I pointed
out that Linux's disk subsystem passes up bad data when it knows there
may be a problem.  ECC will not help you in this case, the data was bad
before it hit memory.  So now you have carefully error corrected BAD DATA.
See the point?  ECC doesn't help unless every other component is equally
careful; those components include software and hardware.  You can fix
that chunk of software and then I'll go find a rogue disk controller
that breaks the datapath, there are plenty to choose from.

Just to make sure you understand:  I think ECC is a fine thing.  If I'm
running systems with no other integrity checks, I'll take ECC and like it.
However, having ECC does not mean that I trust that my data is safe,
that is most certainly not a true statement.  The bus, the disks, the
disk controller, the disk driver, the buffer cache, etc, can all corrupt
the data.   Oh, yeah, let's not forget NFS.  I have seen each and every
one of those things corrupt data.

As to the BitKeeper stuff, those of you who think this is a BitKeeper
discussion are off wacking in the weeds.  The point isn't that BitKeeper
is good because it has integrity checks, the point is that integrity
checks are a good thing.  Period.   BitKeeper was just an example.
If there was a Linux filesystem that had built in integrity checks (and
I knew about it, for all I know there is one), then I would have used
that as the example.  I used BitKeeper as an example because I know it
and I can point to numerous cases where it exposed problems that ECC
would not have caught.  Ask Dave Miller about the mmap/read sparc linux
cache aliasing bug that BK exposed, that one was nasty.

Let's review:  ECC is nice, but it doesn't solve all data corruption
problems.  Applications which do their own end to end data integrity
checks will catch many more error cases than what ECC catches.  My efforts
in this thread had nothing to do with BitKeeper, they were trying to
get people to realize that end to end is good, and ECC isn't end to end.

Examples of end to end applications, which I should have thought of
sooner, are the md5sums on ftp.kernel.org, the integrity checks in rpms,
crcs in cpio.  I'm sure you can think of lots of others, this is an
old problem.

 My understanding is suns big machines stopped using ecc and they

The SUN problem was a cache problem and there is no way that I believe
that SUN would turn of ECC in the cache.  There are good reasons for
not doing so.  If you think through the end to end argument, you will
see that you have no way to do checks on the data path into/out of the
processor.  If that part of the datapath is not checked then no amount
of checking elsewhere does any good, the processor can be corrupting
your data and never know it.  If SUN was so stupid as to remove this,
then it is a dramatically different place.  I heard that there was a
bug in the cache controller, I never heard that they had removed ECC.
If you really want to know I can ask, I know at least one of the guys
who works on that stuff there.
-- 
---
Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Wow! Is memory ever cheap!

2001-05-08 Thread Dan Hollis

On Tue, 8 May 2001, Larry McVoy wrote:
 which is a text version of the paper I mentioned before.  The basic
 message of the paper is that it really doesn't help much to have things
 like ECC unless you can be sure that 100% of the rest of your system
 has similar checks.

UDMA has crc, scsi has parity, pci has (i think) parity, tcpip has crc,
your cpu l1 and l2 have ecc...

Looks like similar checks are already there.

-Dan

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Wow! Is memory ever cheap!

2001-05-08 Thread John Alvord

On Tue, 8 May 2001 22:22:10 -0700, Larry McVoy [EMAIL PROTECTED]
wrote:

Just to make sure you understand:  I think ECC is a fine thing.  If I'm
running systems with no other integrity checks, I'll take ECC and like it.
However, having ECC does not mean that I trust that my data is safe,
that is most certainly not a true statement.  The bus, the disks, the
disk controller, the disk driver, the buffer cache, etc, can all corrupt
the data.   Oh, yeah, let's not forget NFS.  I have seen each and every
one of those things corrupt data.

This is an interesting observation of a truth that was well known in
the second generation computers of the 1950s and 1960s. I first worked
at John Hancock... they had a bunch of 7074 machines. All those
systems made use of programmed checksums in each tape block and in
each full file. The reason was that those machines did not have ECC...
they did have parity checking if I remember right. With IBM's third
generation computers (S/360s) and probably other manufacturers, ECC
became a standard feature. Parity checking was added through different
data paths such as channel memory, buffer memory, etc. There was so
much protection added that the programmed checksums became
superfluous.

There were still odd moments. I remember working on an Amdahl computer
problem involving an internal data path, where the contents of one
register moved to an internal storage area, and that path did not
have parity. There was a machine fault: the path was electrically
open, so the contents of the register always became zero. But since it
wasn't parity checked, there was no machine check. I remember another
problem on the IBM 3033. Cosmic rays (really) caused one-bit errors in
channel memory. That memory had parity but not ECC, so you got a weird
channel check. Back at the diagnosis ranch, the board looked good. It
was only when someone noticed that the rate of such problems was
proportional to the height above sea level that the light bulb went on.

The lesson is that when paths are not checked, hardware or software,
data being held or transformed can change. Old lesson but a good one
to know.
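
A minimal sketch of why a parity-only path behaves the way the 3033
story describes: a single flipped bit is detected (you get a check)
but cannot be located or corrected, and an even number of flips goes
unnoticed. The function names are illustrative, not taken from any
real machine's logic.

```python
def parity_bit(byte):
    """Even-parity bit for one byte: 1 if the byte has an odd number of set bits."""
    return bin(byte).count("1") & 1


def passes_parity(byte, stored_parity):
    """True if the byte still agrees with the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


if __name__ == "__main__":
    original = 0b10110010
    stored = parity_bit(original)

    one_flip = original ^ 0b00000100    # a single-bit upset
    two_flips = original ^ 0b00000110   # two bits upset at once

    print("single flip detected:", not passes_parity(one_flip, stored))
    print("double flip detected:", not passes_parity(two_flips, stored))
```

Parity only says "something changed"; it carries no information about
which bit, which is why a parity-protected memory can report a check
but cannot repair the word the way ECC can.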

john alvord



Re: Wow! Is memory ever cheap!

2001-05-07 Thread Ben Ford

H. Peter Anvin wrote:

>Larry McVoy wrote:
>
>>On Mon, May 07, 2001 at 12:33:57PM -0700, H. Peter Anvin wrote:
>>
>>>Larry McVoy wrote:
>>>
>>>>>Because your original post was "yeah, Bitkeeper is a memory hog but you
>>>>>can get really cheap non-ECC RAM so just stuff your system with crappy
>>>>>RAM and be happy."
>
>>>I wasn't the one who said it, you did.  I don't have any evidence either
>>>way.
>>>
>>Err, Peter, it's starting to sound like you have some ax to grind that I
>>don't know about.  So I'll bow out of this conversation.
>>
>
>The only axe I have to grind was the obvious application myopia of your
>original post... "my application is the only one that matters."  That's
>all.
>
>   -hpa
>


This is a 750MHz K7 system with 1.5GB of memory in 3 512MB DIMMs.  The
DIMMs are not ECC, but we use BitKeeper here and it tells us when we
have bad DIMMs.

Guess what the memory cost?  $396.58 shipped to my door, second day air,
with a lifetime warranty.  I got it at www.memory4less.com, which I found
using www.pricewatch.com.  I have no association with either of those
places other than being a customer (i.e., this isn't advertising spam).

I'm burning it in right now, I wrote a little program which fills it
with different test patterns and then reads them back to make sure they
don't lose any bits.  Seems to be working, it's done about 30 passes.
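
A minimal sketch of that kind of fill-and-read-back pattern test, in
Python for brevity. The original program is not shown in this thread,
so the pattern set, the function names, and the buffer size here are
all assumptions made for illustration.

```python
# Illustrative byte patterns: all zeros, all ones, and two alternating-bit
# patterns. The real test's patterns are not given in the thread.
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)


def fill(buf, pattern):
    """Overwrite every byte of buf with the given pattern byte."""
    buf[:] = bytes([pattern]) * len(buf)


def verify(buf, pattern):
    """Return the offsets of all bytes that no longer hold the pattern."""
    return [i for i, b in enumerate(buf) if b != pattern]


def burn_in(size, passes):
    """Run every pattern `passes` times over a buffer; return total mismatches."""
    buf = bytearray(size)
    mismatches = 0
    for _ in range(passes):
        for pattern in PATTERNS:
            fill(buf, pattern)
            mismatches += len(verify(buf, pattern))
    return mismatches


if __name__ == "__main__":
    print("mismatches:", burn_in(1 << 16, 2))
```

A user-space loop like this only exercises the pages the kernel hands
to the process, and reads may be satisfied from CPU cache, so it is a
rough screen rather than a full memory diagnostic.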

1.5GB for $400.  Amazing.  No more whining from you guys that BitKeeper
uses too much memory :-)

$ hinv
Main memory size: 1535.9375 Mbytes
1 AuthenticAMD  processor
1 1.44M floppy drive
1 vga+ graphics device
1 keyboard
IDE devices:
/dev/hda is a ST310211A, 9541MB w/512kB Cache, CHS=1216/255/63
SCSI devices:
/dev/sda is a 3ware disk, model 3w- 74.541 GB
PCI bus devices:
Host bridge: VIA Technologies VT 82C691 Apollo Pro (rev 2).
PCI bridge: VIA Technologies VT 82C598 Apollo MVP3 AGP (rev 0).
ISA bridge: VIA Technologies VT 82C686 Apollo Super (rev 34).
IDE interface: VIA Technologies VT 82C586 Apollo IDE (rev 16).
Host bridge: VIA Technologies VT 82C686 Apollo Super ACPI (rev 48).
Ethernet controller: 3Com 3C905B 100bTX (rev 48).
Ethernet controller: 3Com 3C905B 100bTX (rev 48).
Ethernet controller: 3Com 3C905B 100bTX (rev 48).
Ethernet controller: 3Com 3C905B 100bTX (rev 48).
RAID storage controller: Unknown vendor Unknown device (rev 18).
VGA compatible controller: Matrox Matrox G200 AGP (rev 1).
-- 
---
Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm



Let's move on now.

-- 
I'd rather listen to Newton than to Mundie [MS flunkie who made a speech on
the evil-ness of open source]. He may have been dead for almost three
hundred years, but despite that he stinks up the room less.

Linus





Re: Wow! Is memory ever cheap!

2001-05-07 Thread Rik van Riel

On Mon, 7 May 2001, Larry McVoy wrote:

> For the record, however, I never stated that BitKeeper is a
> memory hog, that's a conclusion you drew.

I read it that way in your message, but it's good to have
the situation clarified ;)

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/




Re: Wow! Is memory ever cheap!

2001-05-07 Thread H. Peter Anvin

Larry McVoy wrote:
> 
> On Mon, May 07, 2001 at 12:33:57PM -0700, H. Peter Anvin wrote:
> > Larry McVoy wrote:
> > > > Because your original post was "yeah, Bitkeeper is a memory hog but you
> > > > can get really cheap non-ECC RAM so just stuff your system with crappy
> > > > RAM and be happy."
> 
> > I wasn't the one who said it, you did.  I don't have any evidence either
> > way.
> 
> Err, Peter, it's starting to sound like you have some ax to grind that I
> don't know about.  So I'll bow out of this conversation.
> 

The only axe I have to grind was the obvious application myopia of your
original post... "my application is the only one that matters."  That's
all.

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Wow! Is memory ever cheap!

2001-05-07 Thread Larry McVoy

On Mon, May 07, 2001 at 12:33:57PM -0700, H. Peter Anvin wrote:
> Larry McVoy wrote:
> > > Because your original post was "yeah, Bitkeeper is a memory hog but you
> > > can get really cheap non-ECC RAM so just stuff your system with crappy
> > > RAM and be happy."  

> I wasn't the one who said it, you did.  I don't have any evidence either
> way.

Err, Peter, it's starting to sound like you have some ax to grind that I
don't know about.  So I'll bow out of this conversation.

For the record, however, I never stated that BitKeeper is a memory hog,
that's a conclusion you drew.  Somehow, if I had said "look, for very
little money you can fit all of the kernel source, revision history,
and objects in memory", would you have translated that into "BitKeeper
is a memory hog"?  It seems that way.  

BitKeeper has nothing to do with it, it's all about how big the data
set is that the application is working on.  BitKeeper is better than
most applications, it mmaps the data and uses the page cache so that it
doesn't cache it twice.  Contrast that with most other apps, you'll see
they have their own internal cache of data when they should just use mmap.
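
As a sketch of the mmap approach described above (not BitKeeper's
actual code, which is not shown in this thread), the following maps a
file read-only and scans it in place, so the only copy of the data is
the one in the page cache. The function name and the temporary demo
file are illustrative assumptions.

```python
import mmap
import os
import tempfile


def count_newlines_mmap(path):
    """Count newline bytes by scanning a read-only mapping of the file.

    The mapping is backed by the page cache, so no second private copy of
    the file contents is built up inside the process.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file cannot be mapped.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = 0
            pos = m.find(b"\n")
            while pos != -1:
                count += 1
                pos = m.find(b"\n", pos + 1)
            return count


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"one\ntwo\nthree\n")
    try:
        print(count_newlines_mmap(tmp.name))
    finally:
        os.unlink(tmp.name)
```

An application that instead read() the whole file into its own buffer
would hold the data twice: once in the page cache and once in its heap.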

Anyway, I think we've beaten this to death, so let's move on.
-- 
---
Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm 



Re: Wow! Is memory ever cheap!

2001-05-07 Thread H. Peter Anvin

Larry McVoy wrote:
> 
> On Mon, May 07, 2001 at 12:21:28PM -0700, H. Peter Anvin wrote:
> > Larry McVoy wrote:
> > > What does BitKeeper have to do with this conversation?
> >
> > Because your original post was "yeah, Bitkeeper is a memory hog but you
> > can get really cheap non-ECC RAM so just stuff your system with crappy
> > RAM and be happy."  Doing so dedicates my system to running a small set
> > of applications, which I am utterly uninterested in.
> 
> .. BitKeeper isn't a memory hog, the kernel is bloated.  Over 100MB of
>    source last I checked.  BitKeeper is incredibly good at _NOT_ being
>    a memory hog, it uses the page cache as its memory pool.  If things
>    fit in the cache, they go fast, if they don't, they don't.  BitKeeper
>    is just like diff in that respect.  If you think BitKeeper is a memory
>    hog, then you must hate diff too.  How about netscape?  Don't run that
>    either?  Give me a break.
> 

I wasn't the one who said it, you did.  I don't have any evidence either
way.

> .. It's great that you aren't interested in running that set of small
>    applications, I'm sure the entire kernel list is happy to learn that.

I believe the same is true for most people, with the major exceptions
being the embedded systems and server farm people.

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Wow! Is memory ever cheap!

2001-05-07 Thread Larry McVoy

On Mon, May 07, 2001 at 12:21:28PM -0700, H. Peter Anvin wrote:
> Larry McVoy wrote:
> > What does BitKeeper have to do with this conversation?
> 
> Because your original post was "yeah, Bitkeeper is a memory hog but you
> can get really cheap non-ECC RAM so just stuff your system with crappy
> RAM and be happy."  Doing so dedicates my system to running a small set
> of applications, which I am utterly uninterested in.

.. BitKeeper isn't a memory hog, the kernel is bloated.  Over 100MB of
   source last I checked.  BitKeeper is incredibly good at _NOT_ being
   a memory hog, it uses the page cache as its memory pool.  If things
   fit in the cache, they go fast, if they don't, they don't.  BitKeeper
   is just like diff in that respect.  If you think BitKeeper is a memory
   hog, then you must hate diff too.  How about netscape?  Don't run that
   either?  Give me a break.

.. It's great that you aren't interested in running that set of small 
   applications, I'm sure the entire kernel list is happy to learn that.

.. You can get really cheap ECC ram as well, even if it were 2x as expensive,
   that's still really cheap, less than 50 cents a megabyte, so what's your
   problem?  Go get some ECC memory and be happy.
-- 
---
Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm 



Re: Wow! Is memory ever cheap!

2001-05-07 Thread H. Peter Anvin

Larry McVoy wrote:
> > >
> > > A) Fast has nothing to do with it, ECC runs at the same speed as non-ECC;
> >
> > "It" meaning BitKeeper.
> 
> What does BitKeeper have to do with this conversation?
> 
> s/BitKeeper/any_app_which_has_integrity_checks/
> 
> Whether that app runs fast or not has nothing to do with ECC/non-ECC, right?
> And while whether that app runs fast or not may have something to do with
> other apps that you run along side of it, that's true for all apps, right?
> So why the focus on BitKeeper?  Am I missing something?
> 

Because your original post was "yeah, Bitkeeper is a memory hog but you
can get really cheap non-ECC RAM so just stuff your system with crappy
RAM and be happy."  Doing so dedicates my system to running a small set
of applications, which I am utterly uninterested in.

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Wow! Is memory ever cheap!

2001-05-07 Thread Larry McVoy

On Mon, May 07, 2001 at 12:01:50PM -0700, H. Peter Anvin wrote:
> Larry McVoy wrote:
> > > Isn't this pretty much saying "if you're willing to dedicate your
> > > system to running nothing but Bitkeeper, you can run it really fast?"
> > 
> > A) Fast has nothing to do with it, ECC runs at the same speed as non-ECC;
> 
> "It" meaning BitKeeper.

What does BitKeeper have to do with this conversation?  

s/BitKeeper/any_app_which_has_integrity_checks/

Whether that app runs fast or not has nothing to do with ECC/non-ECC, right?
And while whether that app runs fast or not may have something to do with
other apps that you run along side of it, that's true for all apps, right?
So why the focus on BitKeeper?  Am I missing something?

> This is not true in my experience.  YES, it will detect bad memory
> configurations, but with over 2^34 memory cells to worry about -- each of
> them carrying a charge of a few dozen electrons only -- you WILL have
> random failures, especially if the memory is allowed to remain stale for
> extended periods of time, as is very likely in a configuration like this
> (think disk cache.)

BitKeeper, at least, runs the integrity checks every time it accesses the
data.  So it doesn't matter if it is in the disk cache or not.  The same
could be true of any other application.
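
A sketch of what "check on every access" can look like in an
application: each record carries its own checksum, and the read path
refuses to return data that no longer matches. BitKeeper's real
on-disk format and checksum are not described in this thread; CRC-32
and the helper names below are stand-ins chosen for illustration.

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 as 4 big-endian bytes."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def open_sealed(record: bytes) -> bytes:
    """Verify the stored CRC-32 and return the payload.

    Raises ValueError if the payload no longer matches its checksum,
    wherever along the path (memory, bus, disk, network) it was damaged.
    """
    stored = int.from_bytes(record[:4], "big")
    payload = record[4:]
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: data corrupted somewhere in the path")
    return payload


if __name__ == "__main__":
    record = seal(b"revision history")
    print(open_sealed(record))

    damaged = bytearray(record)
    damaged[7] ^= 0x01  # simulate a single flipped bit in the payload
    try:
        open_sealed(bytes(damaged))
    except ValueError as err:
        print("detected:", err)
```

Because the check runs in the reader, it does not matter whether the
bytes came from the disk cache or from the platter; the check detects
the damage but, unlike ECC, does not repair it.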

> Bad memory configurations are bad.  However, good memory configurations
> are not necessarily perfect.

No, they most certainly aren't.  You can have all the ECC you want and if
the disk or the bus or some other part of the path corrupts the data then
you are hosed.

Dave Clark made the point that you _MUST_ have end to end checks if you 
care about your data.  He would argue, correctly, in my opinion, that 
it doesn't matter if you have ECC, something else can screw you.

And in fact, while we were having this discussion I was running a disk
scrubber to see if I had bad disks or not:

[root@disks /u1]# df -m .
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/hdg2                17099         0     17099   0% /u1
[root@disks /u1]# ~lm/tmp/a.out 17000
011
BAD @ 1045602304 3e50b000:3e52a000
[root@disks /u1]# 

and from dmesg:

hdh: irq timeout: status=0x50 { DriveReady SeekComplete }
hdg: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14

But my application was NEVER notified that the drive subsystem was hosed,
the operating system (this is 2.4.4, by the way, steaming hot bits), never
told the application that the data was probably bad.  And it isn't a memory
problem, I ran a memory scrubber and the system memory is just fine, it's
the disk subsystem that went out to lunch.  Without telling me, by the way.
If this happened inside of SUN the guy responsible would be getting a new
orifice courtesy of systems group.  Not cool to pass the data up when it
is bad.
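
A minimal sketch of a write-then-verify disk scrubber of the kind
described above. The source of the a.out used here is not shown, so
the block size, the per-block pattern, and the function names are
assumptions; this only illustrates the technique of writing known
data and reporting the offsets that read back wrong.

```python
import os
import tempfile

BLOCK = 4096  # illustrative block size in bytes


def block_pattern(index):
    """Deterministic contents for block `index`: its 8-byte index, repeated."""
    return index.to_bytes(8, "big") * (BLOCK // 8)


def write_blocks(path, nblocks):
    """Write nblocks of known data and push them out to the device."""
    with open(path, "wb") as f:
        for i in range(nblocks):
            f.write(block_pattern(i))
        f.flush()
        os.fsync(f.fileno())


def scrub(path, nblocks):
    """Read every block back; return byte offsets of blocks that differ."""
    bad = []
    with open(path, "rb") as f:
        for i in range(nblocks):
            if f.read(BLOCK) != block_pattern(i):
                bad.append(i * BLOCK)
    return bad


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "scrub.dat")
    write_blocks(path, 256)
    print("bad offsets:", scrub(path, 256))
    os.unlink(path)
    os.rmdir(workdir)
```

Because each block encodes its own index, a block that lands at the
wrong offset is caught as well as one with flipped bits. For the
read-back to exercise the disk rather than the page cache, the file
would need to be larger than RAM or the cache dropped between passes.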

So explain to me how ECC is enough?  It's clearly not.  People have made
compelling arguments for end to end checks for at least 25 years, the
internet works largely because of these end to end checks (turn off 
checksums and find out if I'm right or not, you'll see), ECC isn't end to
end, so what's the point?  Yeah, it's better than nothing but to argue
that it even remotely approaches enough is just flat out wrong, and it
is _inherently_ unable to be part of a solution which is correct, it's
simply one place that the data lands.  
-- 
---
Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm 



Re: Wow! Is memory ever cheap!

2001-05-07 Thread H. Peter Anvin

Larry McVoy wrote:
> > >
> > > On the other hand, if your apps don't have built in integrity checks then
> > > ECC is pretty much a requirement.
> >
> > Isn't this pretty much saying "if you're willing to dedicate your
> > system to running nothing but Bitkeeper, you can run it really fast?"
> 
> A) Fast has nothing to do with it, ECC runs at the same speed as non-ECC;

"It" meaning BitKeeper.

> B) As I said above, "if your apps don't have built in integrity checks then
>ECC is pretty much a requirement";
> C) As I said above, we use our systems for BK development, so this choice
>makes sense for us.
> 
> I think the point you are really missing is that it is not an either/or
> choice.  All you really need in practice is one application which is
> both heavily used and has integrity checks.  It could be BitKeeper or
> something else, all that matters is that it will detect memory problems.

This is not true in my experience.  YES, it will detect bad memory
configurations, but with over 2^34 memory cells to worry about -- each of
them carrying a charge of a few dozen electrons only -- you WILL have
random failures, especially if the memory is allowed to remain stale for
extended periods of time, as is very likely in a configuration like this
(think disk cache.)

Bad memory configurations are bad.  However, good memory configurations
are not necessarily perfect.

-hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Wow! Is memory ever cheap!

2001-05-07 Thread Larry McVoy

On Mon, May 07, 2001 at 11:47:34AM -0700, H. Peter Anvin wrote:
> Followup to:  <[EMAIL PROTECTED]>
> By author:Larry McVoy <[EMAIL PROTECTED]>
> In newsgroup: linux.dev.kernel
> >
> > On Sun, May 06, 2001 at 02:20:43PM +1200, Chris Wedgwood wrote:
> > > 1.5GB without ECC? Seems like a disaster waiting to happen? Is ECC
> > > memory much more expensive?
> > 
> > Almost twice as expensive for 512MB dimms.
> > 
> > I used to be a die hard ECC fan but that changed since what we do here is
> > BitKeeper and BitKeeper checksums everything.  It tells us right away when
> > we have problems (to date it has found bad memory dimms, NFS corruption,
> > and a SPARC/Linux cache aliasing bug).  So I've given up on ECC, we don't
> > need it.
> > 
> > On the other hand, if your apps don't have built in integrity checks then
> > ECC is pretty much a requirement.
> 
> Isn't this pretty much saying "if you're willing to dedicate your
> system to running nothing but Bitkeeper, you can run it really fast?"

A) Fast has nothing to do with it, ECC runs at the same speed as non-ECC;
B) As I said above, "if your apps don't have built in integrity checks then
   ECC is pretty much a requirement";
C) As I said above, we use our systems for BK development, so this choice
   makes sense for us.

I think the point you are really missing is that it is not an either/or
choice.  All you really need in practice is one application which is
both heavily used and has integrity checks.  It could be BitKeeper or
something else, all that matters is that it will detect memory problems.

That application will flush out your memory problems.  Yeah, you could
get burned before that app finds them and if you are worried about that,
then run ECC.  I think it's an interesting data point, however, that I
care deeply about data integrity and I've transitioned from insisting 
on ECC to not caring.  If my choice wasn't working for me, I would 
still be using ECC.  In other words, I'm a lot closer to your way of
thinking than you might expect.
-- 
---
Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm 



Re: Wow! Is memory ever cheap!

2001-05-07 Thread H. Peter Anvin

Followup to:  <[EMAIL PROTECTED]>
By author:Larry McVoy <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel
>
> On Sun, May 06, 2001 at 02:20:43PM +1200, Chris Wedgwood wrote:
> > 1.5GB without ECC? Seems like a disaster waiting to happen? Is ECC
> > memory much more expensive?
> 
> Almost twice as expensive for 512MB dimms.
> 
> I used to be a die hard ECC fan but that changed since what we do here is
> BitKeeper and BitKeeper checksums everything.  It tells us right away when
> we have problems (to date it has found bad memory dimms, NFS corruption,
> and a SPARC/Linux cache aliasing bug).  So I've given up on ECC, we don't
> need it.
> 
> On the other hand, if your apps don't have built in integrity checks then
> ECC is pretty much a requirement.
> 

Isn't this pretty much saying "if you're willing to dedicate your
system to running nothing but Bitkeeper, you can run it really fast?"

-hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt



Re: Wow! Is memory ever cheap!

2001-05-07 Thread Larry McVoy

On Mon, May 07, 2001 at 12:21:28PM -0700, H. Peter Anvin wrote:
> Larry McVoy wrote:
> > What does BitKeeper have to do with this conversation?
>
> Because your original post was "yeah, Bitkeeper is a memory hog but you
> can get really cheap non-ECC RAM so just stuff your system with crappy
> RAM and be happy."  Doing so dedicates my system to running a small set
> of applications, which I am utterly uninterested in.

.. BitKeeper isn't a memory hog, the kernel is bloated.  Over 100MB of
   source last I checked.  BitKeeper is incredibly good at _NOT_ being
   a memory hog, it uses the page cache as its memory pool.  If things
   fit in the cache, they go fast, if they don't, they don't.  BitKeeper
   is just like diff in that respect.  If you think BitKeeper is a memory
   hog, then you must hate diff too.  How about netscape?  Don't run that
   either?  Give me a break.

.. It's great that you aren't interested in running that set of small 
   applications, I'm sure the entire kernel list is happy to learn that.

.. You can get really cheap ECC ram as well, even if it were 2x as expensive,
   that's still really cheap, less than 50 cents a megabyte, so what's your
   problem?  Go get some ECC memory and be happy.
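
The per-megabyte figures above can be checked with a quick sketch. It assumes the $396.58 for three 512MB DIMMs quoted in the original post and takes "2x as expensive" literally; the function name is made up for illustration.

```python
def cost_per_mb(price_usd, size_mb):
    """Return the price per megabyte in US dollars."""
    return price_usd / size_mb


if __name__ == "__main__":
    non_ecc = cost_per_mb(396.58, 3 * 512)   # figures from the original post
    ecc = 2 * non_ecc                         # assumption: ECC costs exactly 2x
    print(f"non-ECC: ${non_ecc:.3f}/MB, ECC at 2x: ${ecc:.3f}/MB")
```

At exactly 2x the result lands a hair above 50 cents a megabyte; at "almost twice", as stated elsewhere in the thread, it would fall at or below it.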
-- 
---
Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm 



Re: Wow! Is memory ever cheap!

2001-05-07 Thread H. Peter Anvin

Larry McVoy wrote:
>
> On Mon, May 07, 2001 at 12:21:28PM -0700, H. Peter Anvin wrote:
> > Larry McVoy wrote:
> > > What does BitKeeper have to do with this conversation?
> >
> > Because your original post was "yeah, Bitkeeper is a memory hog but you
> > can get really cheap non-ECC RAM so just stuff your system with crappy
> > RAM and be happy."  Doing so dedicates my system to running a small set
> > of applications, which I am utterly uninterested in.
>
> .. BitKeeper isn't a memory hog, the kernel is bloated.  Over 100MB of
>    source last I checked.  BitKeeper is incredibly good at _NOT_ being
>    a memory hog, it uses the page cache as its memory pool.  If things
>    fit in the cache, they go fast, if they don't, they don't.  BitKeeper
>    is just like diff in that respect.  If you think BitKeeper is a memory
>    hog, then you must hate diff too.  How about netscape?  Don't run that
>    either?  Give me a break.
>

I wasn't the one who said it, you did.  I don't have any evidence either
way.

> .. It's great that you aren't interested in running that set of small
>    applications, I'm sure the entire kernel list is happy to learn that.

I believe the same is true for most people, with the major exceptions
being the embedded systems and server farm people.

-hpa

-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt



Re: Wow! Is memory ever cheap!

2001-05-07 Thread Larry McVoy

On Mon, May 07, 2001 at 12:33:57PM -0700, H. Peter Anvin wrote:
> Larry McVoy wrote:
> > > Because your original post was "yeah, Bitkeeper is a memory hog but you
> > > can get really cheap non-ECC RAM so just stuff your system with crappy
> > > RAM and be happy."
>
> I wasn't the one who said it, you did.  I don't have any evidence either
> way.

Err, Peter, it's starting to sound like you have some ax to grind that I
don't know about.  So I'll bow out of this conversation.

For the record, however, I never stated that BitKeeper is a memory hog,
that's a conclusion you drew.  Somehow, if I had said "look, for very
little money you can fit all of the kernel source, revision history,
and objects in memory", would you have translated that into "BitKeeper
is a memory hog"?  It seems that way.

BitKeeper has nothing to do with it, it's all about how big the data
set is that the application is working on.  BitKeeper is better than
most applications, it mmaps the data and uses the page cache so that it
doesn't cache it twice.  Contrast that with most other apps, you'll see
they have their own internal cache of data when they should just use mmap.
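
The contrast between a private copy and a mapping of the page cache can be sketched roughly as follows. This is only an illustration, not BitKeeper's code: the two function names are hypothetical, and it relies on the general behaviour that `read()` copies file data into the process's own buffer while a read-only `mmap` lets the process look at the kernel's cached pages directly.

```python
import mmap
import os
import tempfile


def find_with_private_copy(path, needle):
    """Read the whole file into a process-private buffer, then search it."""
    with open(path, "rb") as f:
        data = f.read()          # a second copy of the data, in our own heap
    return data.find(needle)


def find_with_mmap(path, needle):
    """Map the file read-only and search the mapping; no private copy is made."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return -1            # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m.find(needle)


if __name__ == "__main__":
    # Throwaway file just for the demonstration.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello page cache")
    try:
        print(find_with_private_copy(tmp.name, b"page"),
              find_with_mmap(tmp.name, b"page"))
    finally:
        os.remove(tmp.name)
```

Both functions return the same offset; the difference is only in how many copies of the file contents end up resident in memory while the search runs.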

Anyway, I think we've beaten this to death, so let's move on.
-- 
---
Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm 



Re: Wow! Is memory ever cheap!

2001-05-07 Thread H. Peter Anvin

Larry McVoy wrote:
>
> On Mon, May 07, 2001 at 12:33:57PM -0700, H. Peter Anvin wrote:
> > Larry McVoy wrote:
> > > > Because your original post was "yeah, Bitkeeper is a memory hog but you
> > > > can get really cheap non-ECC RAM so just stuff your system with crappy
> > > > RAM and be happy."
> >
> > I wasn't the one who said it, you did.  I don't have any evidence either
> > way.
>
> Err, Peter, it's starting to sound like you have some ax to grind that I
> don't know about.  So I'll bow out of this conversation.
>

The only axe I have to grind was the obvious application myopia of your
original post... "my application is the only one that matters."  That's
all.

-hpa

-- 
[EMAIL PROTECTED] at work, [EMAIL PROTECTED] in private!
Unix gives you enough rope to shoot yourself in the foot.
http://www.zytor.com/~hpa/puzzle.txt



Re: Wow! Is memory ever cheap!

2001-05-07 Thread Rik van Riel

On Mon, 7 May 2001, Larry McVoy wrote:

> For the record, however, I never stated that BitKeeper is a
> memory hog, that's a conclusion you drew.

I read it that way in your message, but it's good to have
the situation clarified ;)

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/




Re: Wow! Is memory ever cheap!

2001-05-07 Thread Ben Ford

H. Peter Anvin wrote:

> Larry McVoy wrote:
>
> > On Mon, May 07, 2001 at 12:33:57PM -0700, H. Peter Anvin wrote:
> >
> > > Larry McVoy wrote:
> > >
> > > > Because your original post was "yeah, Bitkeeper is a memory hog but you
> > > > can get really cheap non-ECC RAM so just stuff your system with crappy
> > > > RAM and be happy."
> > >
> > > I wasn't the one who said it, you did.  I don't have any evidence either
> > > way.
> >
> > Err, Peter, it's starting to sound like you have some ax to grind that I
> > don't know about.  So I'll bow out of this conversation.
>
> The only axe I have to grind was the obvious application myopia of your
> original post... "my application is the only one that matters."  That's
> all.
>
>    -hpa

quote

This is a 750Mhz K7 system with 1.5GB of memory in 3 512MB DIMMS.  The
DIMMS are not ECC, but we use BitKeeper here and it tells us when we
have bad DIMMS.

Guess what the memory cost?  $396.58 shipped to my door, second day air,
with a lifetime warranty.  I got it at www.memory4less.com which I found
using www.pricewatch.com.  I have no association with either of those
places other than being a customer (i.e., this isn't advertising spam).

I'm burning it in right now, I wrote a little program which fills it
with different test patterns and then reads them back to make sure they
don't lose any bits.  Seems to be working, it's done about 30 passes.

1.5GB for $400.  Amazing.  No more whining from you guys that BitKeeper
uses too much memory :-)

$ hinv
Main memory size: 1535.9375 Mbytes
1 AuthenticAMD  processor
1 1.44M floppy drive
1 vga+ graphics device
1 keyboard
IDE devices:
/dev/hda is a ST310211A, 9541MB w/512kB Cache, CHS=1216/255/63
SCSI devices:
/dev/sda is a 3ware disk, model 3w- 74.541 GB
PCI bus devices:
Host bridge: VIA Technologies VT 82C691 Apollo Pro (rev 2).
PCI bridge: VIA Technologies VT 82C598 Apollo MVP3 AGP (rev 0).
ISA bridge: VIA Technologies VT 82C686 Apollo Super (rev 34).
IDE interface: VIA Technologies VT 82C586 Apollo IDE (rev 16).
Host bridge: VIA Technologies VT 82C686 Apollo Super ACPI (rev 48).
Ethernet controller: 3Com 3C905B 100bTX (rev 48).
Ethernet controller: 3Com 3C905B 100bTX (rev 48).
Ethernet controller: 3Com 3C905B 100bTX (rev 48).
Ethernet controller: 3Com 3C905B 100bTX (rev 48).
RAID storage controller: Unknown vendor Unknown device (rev 18).
VGA compatible controller: Matrox Matrox G200 AGP (rev 1).
-- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

/quote

Let's move on now.

-- 
I'd rather listen to Newton than to Mundie [MS flunkie who made a speech on
the evil-ness of open source]. He may have been dead for almost three
hundred years, but despite that he stinks up the room less.

Linus





Re: Wow! Is memory ever cheap!

2001-05-05 Thread Larry McVoy

On Sun, May 06, 2001 at 02:20:43PM +1200, Chris Wedgwood wrote:
> 1.5GB without ECC? Seems like a disaster waiting to happen? Is ECC
> memory much more expensive?

Almost twice as expensive for 512MB dimms.

I used to be a die-hard ECC fan, but that changed since what we do here is
BitKeeper and BitKeeper checksums everything.  It tells us right away when
we have problems (to date it has found bad memory DIMMs, NFS corruption,
and a SPARC/Linux cache aliasing bug).  So I've given up on ECC; we don't
need it.

On the other hand, if your apps don't have built in integrity checks then
ECC is pretty much a requirement.

By the way, the integrity checks don't need to be complicated.  BK uses
a horrible 16-bit "ignore the overflow" checksum to be compatible with SCCS,
and it seems to have caught everything that much more sophisticated and
CPU-intensive checksums have caught.  In other words, anything is much,
much better than nothing.
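
A checksum of that kind can be sketched in a couple of lines. This is a minimal illustration that assumes the checksum is simply the sum of the byte values with everything above 16 bits thrown away; the thread does not spell out the exact BK/SCCS details, and `sum16` is a made-up name.

```python
def sum16(data):
    """Add up all byte values and keep only the low 16 bits (overflow ignored)."""
    return sum(data) & 0xFFFF


if __name__ == "__main__":
    original = bytearray(b"linux kernel source")

    # A single flipped bit changes one byte by a power of two below 256,
    # so the 16-bit sum always changes and the corruption is detected.
    flipped = bytearray(original)
    flipped[3] ^= 0x10
    print("bit flip detected:", sum16(flipped) != sum16(original))

    # Weakness of a plain sum: reordering bytes leaves the total unchanged.
    swapped = bytearray(original)
    swapped[0], swapped[1] = swapped[1], swapped[0]
    print("byte swap detected:", sum16(swapped) != sum16(original))
```

The sketch shows both sides of the trade-off: the kind of single-bit damage that bad memory tends to cause is caught, while some other classes of corruption (such as transposed bytes) are not.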
-- 
---
Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm 



Re: Wow! Is memory ever cheap!

2001-05-05 Thread Matthew Jacob


I'll frickin' whine if I want to :-). I still use bitkeeper on a Solaris 2.6
machine with 32MB of memory.


On Sat, 5 May 2001, Larry McVoy wrote:

> This is a 750Mhz K7 system with 1.5GB of memory in 3 512MB DIMMS.  The
> DIMMS are not ECC, but we use BitKeeper here and it tells us when we
> have bad DIMMS.
> 
> Guess what the memory cost?  $396.58 shipped to my door, second day air,
> with a lifetime warranty.  I got it at www.memory4less.com which I found
> using www.pricewatch.com.  I have no association with either of those
> places other than being a customer (i.e., this isn't advertising spam).
> 
> I'm burning it in right now, I wrote a little program which fills it
> with different test patterns and then reads them back to make sure they
> don't lose any bits.  Seems to be working, it's done about 30 passes.
> 
> 1.5GB for $400.  Amazing.  No more whining from you guys that BitKeeper
> uses too much memory :-)
> 
> $ hinv
> Main memory size: 1535.9375 Mbytes
> 1 AuthenticAMD  processor
> 1 1.44M floppy drive
> 1 vga+ graphics device
> 1 keyboard
> IDE devices:
> /dev/hda is a ST310211A, 9541MB w/512kB Cache, CHS=1216/255/63
> SCSI devices:
> /dev/sda is a 3ware disk, model 3w- 74.541 GB
> PCI bus devices:
> Host bridge: VIA Technologies VT 82C691 Apollo Pro (rev 2).
> PCI bridge: VIA Technologies VT 82C598 Apollo MVP3 AGP (rev 0).
> ISA bridge: VIA Technologies VT 82C686 Apollo Super (rev 34).
> IDE interface: VIA Technologies VT 82C586 Apollo IDE (rev 16).
> Host bridge: VIA Technologies VT 82C686 Apollo Super ACPI (rev 48).
> Ethernet controller: 3Com 3C905B 100bTX (rev 48).
> Ethernet controller: 3Com 3C905B 100bTX (rev 48).
> Ethernet controller: 3Com 3C905B 100bTX (rev 48).
> Ethernet controller: 3Com 3C905B 100bTX (rev 48).
> RAID storage controller: Unknown vendor Unknown device (rev 18).
> VGA compatible controller: Matrox Matrox G200 AGP (rev 1).
> -- 
> ---
> Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm
> 




Wow! Is memory ever cheap!

2001-05-05 Thread Larry McVoy

This is a 750MHz K7 system with 1.5GB of memory in three 512MB DIMMs.  The
DIMMs are not ECC, but we use BitKeeper here and it tells us when we
have bad DIMMs.

Guess what the memory cost?  $396.58 shipped to my door, second day air,
with a lifetime warranty.  I got it at www.memory4less.com which I found
using www.pricewatch.com.  I have no association with either of those
places other than being a customer (i.e., this isn't advertising spam).

I'm burning it in right now; I wrote a little program which fills it
with different test patterns and then reads them back to make sure they
don't lose any bits.  Seems to be working, it's done about 30 passes.
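
The burn-in program itself isn't shown in the thread. A minimal sketch of that style of test might look like the following, assuming a plain user-space buffer and four classic byte patterns; the names `PATTERNS`, `find_mismatches` and `pattern_test` are invented here. A real burn-in would also have to make sure the buffer covers most of physical memory and isn't just being served from CPU caches, which this sketch does not attempt.

```python
# Assumed test patterns: all zeros, all ones, and the two alternating-bit bytes.
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)


def find_mismatches(buf, pattern):
    """Return (offset, value_read) for every byte that differs from pattern."""
    return [(i, b) for i, b in enumerate(buf) if b != pattern]


def pattern_test(size_bytes, patterns=PATTERNS, passes=1):
    """Fill a buffer with each pattern, read it back, and collect failures.

    Returns a list of (pass_number, pattern, offset, value_read) tuples;
    an empty list means every byte read back as written.
    """
    buf = bytearray(size_bytes)
    failures = []
    for pass_no in range(passes):
        for pattern in patterns:
            expected = bytes([pattern]) * size_bytes
            buf[:] = expected                    # write phase
            if buf != expected:                  # read-back phase (fast check)
                for offset, value in find_mismatches(buf, pattern):
                    failures.append((pass_no, pattern, offset, value))
    return failures


if __name__ == "__main__":
    bad = pattern_test(16 * 1024 * 1024, passes=2)
    print("failures:", len(bad))
```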

1.5GB for $400.  Amazing.  No more whining from you guys that BitKeeper
uses too much memory :-)

$ hinv
Main memory size: 1535.9375 Mbytes
1 AuthenticAMD  processor
1 1.44M floppy drive
1 vga+ graphics device
1 keyboard
IDE devices:
/dev/hda is a ST310211A, 9541MB w/512kB Cache, CHS=1216/255/63
SCSI devices:
/dev/sda is a 3ware disk, model 3w- 74.541 GB
PCI bus devices:
Host bridge: VIA Technologies VT 82C691 Apollo Pro (rev 2).
PCI bridge: VIA Technologies VT 82C598 Apollo MVP3 AGP (rev 0).
ISA bridge: VIA Technologies VT 82C686 Apollo Super (rev 34).
IDE interface: VIA Technologies VT 82C586 Apollo IDE (rev 16).
Host bridge: VIA Technologies VT 82C686 Apollo Super ACPI (rev 48).
Ethernet controller: 3Com 3C905B 100bTX (rev 48).
Ethernet controller: 3Com 3C905B 100bTX (rev 48).
Ethernet controller: 3Com 3C905B 100bTX (rev 48).
Ethernet controller: 3Com 3C905B 100bTX (rev 48).
RAID storage controller: Unknown vendor Unknown device (rev 18).
VGA compatible controller: Matrox Matrox G200 AGP (rev 1).
-- 
---
Larry McVoy  lm at bitmover.com   http://www.bitmover.com/lm 


