Re: [sqlite] Sometimes it really is a hardware problem....

2005-03-11 Thread Scott Baker
If you're looking for a cool "test-suite" I highly recommend the 
Ultimate Boot CD. It includes approximately 8.7 million tools (not 
really, but it is a lot), one of which is MemTest86.

http://www.ultimatebootcd.com/
It's solved ALL kinds of hardware issues. I highly recommend it.
Jay wrote:
> --- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
> > I find it utterly amazing that a machine with bad memory could
> > run a full-blown Linux desktop and a copy of Win2K running in
> > VMWare for days on end without showing a problem, then suddenly
> > begin having trouble with the SQLite regression suite.  Yet that
> > is what appears to have happened.
>
> I had the same sort of thing happen. The machine just would not
> compile the Linux source. Luckily it had different errors each time,
> which is what tipped me off to look for a hardware problem.
>
> http://www.memtest86.com/
>
> Has a nifty tester with an ISO image. You can make a bootable CD
> to test your machine. It makes a great addition to your test tools
> suite.

		


--
Scott Baker
Canby Telephone - Network Administrator - RHCE
Ph: 503.266.8253


Re: [sqlite] Sometimes it really is a hardware problem....

2005-03-11 Thread Jay

--- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
> I find it utterly amazing that a machine with bad memory could
> run a full-blown Linux desktop and a copy of Win2K running in
> VMWare for days on end without showing a problem, then suddenly
> begin having trouble with the SQLite regression suite.  Yet that
> is what appears to have happened.

I had the same sort of thing happen. The machine just would not
compile the Linux source. Luckily it had different errors each time,
which is what tipped me off to look for a hardware problem.

http://www.memtest86.com/

It has a nifty tester with an ISO image. You can make a bootable CD
to test your machine. It makes a great addition to your test tools
suite.






Re: [sqlite] Sometimes it really is a hardware problem....

2005-03-11 Thread Joel Lucsy
On Fri, 11 Mar 2005 13:48:07 -0500, D. Richard Hipp <[EMAIL PROTECTED]> wrote:
> some errors popped up.  On a 512MB SIMM, fewer than 10 memory cells
> were showing a problem, and then only if a specific bit pattern
> was written into adjacent cells.  The error was always in the
> 0x08 bit.  I removed the offending SIMM, rebooted and all tests
> passed.

Was the magic number of cells 8? I'm wondering if you had a bad "chip"
that somehow passed QA, but wasn't in a critical section of memory to
corrupt the system.

-- 
Joel Lucsy
"The dinosaurs became extinct because they didn't have a space
program." -- Larry Niven


[sqlite] Sometimes it really is a hardware problem....

2005-03-11 Thread D. Richard Hipp
I've been struggling for days to get version 3.1.4 out.  Every
time I would run the regression test I would get failures.  The
failures would not always be at the same place, but I would always
get one or two.

I frequently got failures in the memory-db tests where we create
a large in-memory database, make lots of changes, roll those
changes back, then verify that the database holds exactly the same
information as it did before the transaction.  In a database of
about a megabyte in size, I would sometimes see a single bit
difference after the rollback.  The bit that changed would always
be the 0x08 bit.  But the location of the change within the
database was seemingly random.
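
For readers who want to see the shape of such a check, here is a minimal
sketch in Python using the standard sqlite3 module. It is not the actual
SQLite regression test (that suite is not shown in this thread); the table
name, row counts, and helper names are all assumptions made up for
illustration. It builds an in-memory database of roughly a megabyte, makes
changes inside a transaction, rolls back, and compares the contents. If the
two snapshots differ, XORing the bytes shows which bit flipped.

```python
import sqlite3


def snapshot(conn):
    """Return every row of the (hypothetical) table t, ordered by id."""
    return conn.execute("SELECT id, payload FROM t ORDER BY id").fetchall()


def rollback_roundtrip(n_rows=2000, payload_size=512):
    """Fill an in-memory database, change it, roll back, return both snapshots."""
    # isolation_level=None means autocommit, so BEGIN/ROLLBACK below are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, payload BLOB)")

    conn.execute("BEGIN")
    conn.executemany(
        "INSERT INTO t(id, payload) VALUES (?, ?)",
        ((i, bytes([i % 256]) * payload_size) for i in range(1, n_rows + 1)),
    )
    conn.execute("COMMIT")
    before = snapshot(conn)

    # Make lots of changes, then throw them all away.
    conn.execute("BEGIN")
    conn.execute("UPDATE t SET payload = zeroblob(length(payload)) WHERE id % 3 = 0")
    conn.execute("DELETE FROM t WHERE id % 7 = 0")
    conn.execute("INSERT INTO t(payload) VALUES (x'DEADBEEF')")
    conn.execute("ROLLBACK")

    after = snapshot(conn)
    conn.close()
    return before, after


def differing_bits(before, after):
    """Return (row id, byte offset, XOR mask) for every byte that differs."""
    if len(before) != len(after):
        raise ValueError("row counts differ; not a single-bit problem")
    diffs = []
    for (id_a, blob_a), (id_b, blob_b) in zip(before, after):
        if id_a != id_b or len(blob_a) != len(blob_b):
            diffs.append((id_a, None, None))  # structural mismatch
            continue
        for offset, (x, y) in enumerate(zip(blob_a, blob_b)):
            if x != y:
                diffs.append((id_a, offset, x ^ y))
    return diffs


if __name__ == "__main__":
    before, after = rollback_roundtrip()
    for row_id, offset, mask in differing_bits(before, after):
        print(f"row {row_id}, byte {offset}, flipped bits {mask:#04x}")
```

On healthy hardware this should print nothing. A mask of 0x08 at a different
location on each run would match the symptom described above.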

I was talking with Dan about this yesterday - he was unable to
reproduce the problem.  So I said "Maybe it's hardware?"
"Not likely", Dan replied.  And rightly so.  No programmer ever
wants to admit that a nasty problem might be lurking in their
own code.  It is always easier to blame something else - some
library you are linking against, the operating system, the
hardware you are running on.  But at the end of the day, the
problem usually does end up being in your own code and not
elsewhere.  So after you have been programming for a while
(decades in my case) you begin to be very suspicious when
people go blaming malfunctions on the parts they didn't write.

But last night, I was at my wits' end trying to track down the problem
in SQLite.  I figured it can't hurt to test the memory, so I
rebooted using the SuSE install disk, which happens to have a
nifty memory checker built in.  About 10 minutes into the test,
some errors popped up.  On a 512MB SIMM, fewer than 10 memory cells
were showing a problem, and then only if a specific bit pattern
was written into adjacent cells.  The error was always in the
0x08 bit.  I removed the offending SIMM, rebooted and all tests
passed.
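
To illustrate why a fault like this only shows up under particular bit
patterns, here is a small Python sketch. It is a simulation, not a real
memory tester: tools such as MemTest86 run on the bare machine against
physical RAM, whereas this runs against an ordinary byte buffer. The fault
model (a cell that drops its 0x08 bit only when both neighbours hold a
trigger value) and every class and function name are assumptions invented
for the example.

```python
class Memory:
    """Plain byte buffer standing in for a RAM module."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]


class PatternSensitiveMemory(Memory):
    """Hypothetical fault: bad cells lose bit 0x08 when both neighbours == trigger."""

    def __init__(self, size, bad_addrs, trigger=0xFF):
        super().__init__(size)
        self.bad_addrs = set(bad_addrs)
        self.trigger = trigger

    def read(self, addr):
        value = self.cells[addr]
        if addr in self.bad_addrs and 0 < addr < len(self.cells) - 1:
            left, right = self.cells[addr - 1], self.cells[addr + 1]
            if left == self.trigger and right == self.trigger:
                value &= 0xF7  # clear the 0x08 bit
        return value


def pattern_test(mem, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Try each value against each neighbour background; report misreads.

    Returns a list of (address, background pattern, XOR of expected and read).
    """
    size = len(mem.cells)
    errors = []
    for background in patterns:
        for value in patterns:
            for addr in range(size):
                mem.write(addr, background)
            for addr in range(size):
                mem.write(addr, value)       # neighbours still hold background
                got = mem.read(addr)
                if got != value:
                    errors.append((addr, background, value ^ got))
                mem.write(addr, background)  # restore before moving on
    return errors


if __name__ == "__main__":
    faulty = PatternSensitiveMemory(4096, bad_addrs=[100, 2049])
    for addr, background, mask in pattern_test(faulty):
        print(f"addr {addr}: background {background:#04x}, bad bits {mask:#04x}")
```

In this model the bad cells read back correctly for most pattern
combinations, which is one plausible way a machine could run for days
before a workload happens to hit the triggering pattern.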

I find it utterly amazing that a machine with bad memory could
run a full-blown Linux desktop and a copy of Win2K running in
VMWare for days on end without showing a problem, then suddenly
begin having trouble with the SQLite regression suite.  Yet that
is what appears to have happened.

Now it is still always the best policy to blame your own code
first.  When something isn't working right, the person sitting
behind the keyboard is the most likely cause.  Sometimes you
will run into problems with the library you are using, or with
your compiler, or your OS, but those cases are rare.  Hardware
is seldom an issue.  But as this case shows, sometimes, very
rarely, it really can be the hardware's fault.

-- 
D. Richard Hipp <[EMAIL PROTECTED]>