Re: [sqlite] Sometimes it really is a hardware problem....
If you're looking for a cool "test-suite" I highly recommend the Ultimate Boot CD. It includes approximately 8.7 million tools (not really, but it is a lot), one of which is MemTest86.

http://www.ultimatebootcd.com/

It's solved ALL kinds of hardware issues. I highly recommend it.

Jay wrote:
> --- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
>> I find it utterly amazing that a machine with bad memory could
>> run a full-blown Linux desktop and a copy of Win2K running in
>> VMWare for days on end without showing a problem, then suddenly
>> begin having trouble with the SQLite regression suite. Yet that
>> is what appears to have happened.
>
> I had the same sort of thing happen. The machine just would not
> compile the Linux source. Luckily it had different errors each time,
> which is what tipped me off to look for a hardware problem.
>
> http://www.memtest86.com/
> Has a nifty tester with an ISO image. You can make a bootable CD to
> test your machine. It makes a great addition to your test tools suite.

--
Scott Baker
Canby Telephone - Network Administrator - RHCE
Ph: 503.266.8253
Re: [sqlite] Sometimes it really is a hardware problem....
--- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
> I find it utterly amazing that a machine with bad memory could
> run a full-blown Linux desktop and a copy of Win2K running in
> VMWare for days on end without showing a problem, then suddenly
> begin having trouble with the SQLite regression suite. Yet that
> is what appears to have happened.

I had the same sort of thing happen. The machine just would not compile the Linux source. Luckily it had different errors each time, which is what tipped me off to look for a hardware problem.

http://www.memtest86.com/
Has a nifty tester with an ISO image. You can make a bootable CD to test your machine. It makes a great addition to your test tools suite.
Re: [sqlite] Sometimes it really is a hardware problem....
On Fri, 11 Mar 2005 13:48:07 -0500, D. Richard Hipp <[EMAIL PROTECTED]> wrote:
> some errors popped up. On a 512MB SIMM, less than 10 memory cells
> were showing a problem, and then only if a specific bit pattern
> was written into adjacent cells. The error was always in the
> 0x08 bit. I removed the offending SIMM, rebooted and all tests
> passed.

Was the magic number of cells 8? I'm wondering if you had a bad "chip" that somehow passed QA, but wasn't in a critical section of memory to corrupt the system.

--
Joel Lucsy
"The dinosaurs became extinct because they didn't have a space program." -- Larry Niven
[sqlite] Sometimes it really is a hardware problem....
I've been struggling for days to get version 3.1.4 out. Every time I would run the regression test I would get failures. The failures would not always be at the same place, but I would always get one or two. I frequently got failures in the memory-db tests where we create a large in-memory database, make lots of changes, roll those changes back, then verify that the database holds exactly the same information as it did before the transaction. In a database of about a megabyte in size, I would sometimes see a single bit difference after the rollback. The bit that changed would always be the 0x08 bit. But the location of the change within the database was seemingly random.

I was talking with Dan about this yesterday - he was unable to reproduce the problem. So I said "Maybe it's hardware?" "Not likely", Dan replied. And rightly so. No programmer ever wants to admit that a nasty problem might be lurking in their own code. It is always easier to blame something else - some library you are linking against, the operating system, the hardware you are running on. But at the end of the day, the problem usually does end up being in your own code and not elsewhere. So after you have been programming for a while (decades in my case) you begin to be very suspicious when people go blaming malfunctions on the parts they didn't write.

But last night, I was at my wit's end trying to track down the problem in SQLite. I figured it couldn't hurt to test the memory, so I rebooted using the SuSE install disk, which happens to have a nifty memory checker built in. About 10 minutes into the test, some errors popped up. On a 512MB SIMM, fewer than 10 memory cells were showing a problem, and then only if a specific bit pattern was written into adjacent cells. The error was always in the 0x08 bit. I removed the offending SIMM, rebooted and all tests passed.

I find it utterly amazing that a machine with bad memory could run a full-blown Linux desktop and a copy of Win2K running in VMWare for days on end without showing a problem, then suddenly begin having trouble with the SQLite regression suite. Yet that is what appears to have happened.

Now it is still always the best policy to blame your own code first. When something isn't working right, the person sitting behind the keyboard is the most likely cause. Sometimes you will run into problems with the library you are using, or with your compiler, or your OS, but those cases are rare. Hardware is seldom an issue. But as this case shows, sometimes, very rarely, it really can be the hardware's fault.

--
D. Richard Hipp <[EMAIL PROTECTED]>
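For readers who want to try the kind of check described in the original post (build a large in-memory database, change it inside a transaction, roll back, verify nothing changed), here is a rough sketch using Python's standard sqlite3 module. It is not the actual SQLite regression test. The table name `t`, the helpers `snapshot`, `rollback_roundtrip` and `diff_bits`, and the row counts are all made up for illustration.

```python
import hashlib
import sqlite3


def snapshot(conn):
    """Return a SHA-256 digest over every row of the (hypothetical) table t."""
    h = hashlib.sha256()
    for row in conn.execute("SELECT id, payload FROM t ORDER BY id"):
        h.update(repr(row).encode())
    return h.hexdigest()


def rollback_roundtrip(rows=2000, payload_size=500):
    """Fill an in-memory db (~1 MB by default), change it, roll back.

    Returns digests taken before the changes, after the changes, and
    after the rollback. On healthy hardware and a correct library the
    first and last digests should be identical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload BLOB)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        ((i, bytes([i % 256]) * payload_size) for i in range(rows)),
    )
    conn.commit()
    before = snapshot(conn)

    # These statements run inside one implicit transaction.
    conn.execute("UPDATE t SET payload = zeroblob(?) WHERE id % 3 = 0",
                 (payload_size,))
    conn.execute("DELETE FROM t WHERE id % 5 = 0")
    conn.execute("INSERT INTO t VALUES (?, ?)", (rows, b"x"))
    changed = snapshot(conn)

    conn.rollback()
    after = snapshot(conn)
    conn.close()
    return before, changed, after


def diff_bits(a, b):
    """List (offset, xor_mask) for every byte that differs between a and b."""
    return [(i, x ^ y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    before, changed, after = rollback_roundtrip()
    print("rollback restored database:", before == after)
```

If the digests ever disagree, comparing raw dumps with something like `diff_bits` shows which bit flipped. A mask that is always the same (0x08 in the story above) at offsets that move from run to run looks more like a memory fault than a logic bug, though that is only a hint and a real memory tester is still the way to confirm it.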