Over the past month I have learned [the hard way] that many PCs are not entirely reliable. Of course I knew this, but nothing drives it home like experience. Evidently the patterns and frequency of memory access let operating systems and applications work remarkably well on unreliable hardware.
I would suggest running memtest86 for at least several days, and at least several hundred iterations, before assuming that your problems are due to bugs in OpenSolaris. I have seen errors that pop up after what is of course a random time, but on the order of 3 or 4 days. I gave mine 10 days at 10% overclocking before I declared it good. If this seems intolerably time consuming, consider the time wasted chasing ghosts, or dealing with lost data.

If your hardware is capable, preferably run the test overclocked by 5 or 10%, so that when you set it back to the spec'ed speed you can have confidence that you have timing margin. Or underclock it by 5 or 10% after testing at the spec'ed speed. Losing 10% is not going to kill anybody, but being flaky will. When you adjust the clock rate between your test condition and your run condition, ideally it would be ALL clocks: CPU, FSB, PCI, memory, etc. All need to have timing margin so as not to run on the hairy edge. If you cannot adjust your clock rates, perhaps you can set the memory timings in manual mode, running them at the minimum for the test (e.g. 2 cycles) and at 3 for normal use. The processor cache will mitigate the effect on performance. If you do all this, you will have a system that runs 5 or 10% slower but that you know you can depend on.

Note that you may also find that the unreliability is not related to clock speed. In this case I don't have any advice, but you can at least be aware to look to your hardware, not to OpenSolaris. Also be aware that one can wear out sockets by swapping and rearranging parts while trying to isolate the root cause. You get a few insertions with that nice tight crunch, then perhaps a dozen or two more before you are approaching the danger zone.

I suggest that a note to this effect should be "sticky'ed" to the top of the "help" list with a link to memtest86. I tried the two mainline versions and a couple of offshoot versions, and all of them gave the same result. Maybe it should be part of the distribution CD, as the default boot option, as a not-so-subtle hint!
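The clock-margin arithmetic described above can be sketched as a small calculation. This is only an illustration: the function name `clock_plan`, the strategy labels, and the 200 MHz FSB / 33 MHz PCI figures are hypothetical examples, not values taken from any machine discussed in this post.

```python
def clock_plan(spec_mhz, margin_pct, strategy):
    """Return {clock_name: (test_mhz, run_mhz)} for every clock.

    spec_mhz   -- dict of spec'ed clock rates in MHz (example values only)
    margin_pct -- timing margin in percent, e.g. 5 or 10
    strategy   -- "overclock_for_test": test fast, run at spec
                  "underclock_for_run": test at spec, run slow
    """
    scale = margin_pct / 100.0
    plan = {}
    for name, mhz in spec_mhz.items():
        if strategy == "overclock_for_test":
            plan[name] = (round(mhz * (1 + scale), 2), mhz)
        elif strategy == "underclock_for_run":
            plan[name] = (mhz, round(mhz * (1 - scale), 2))
        else:
            raise ValueError("unknown strategy: " + strategy)
    return plan


# Hypothetical spec'ed clocks; apply the same margin to ALL of them.
spec = {"FSB": 200.0, "PCI": 33.0}
for name, (test, run) in clock_plan(spec, 10, "overclock_for_test").items():
    print(f"{name}: test at {test} MHz, run at {run} MHz")
```

Either strategy leaves the same relative gap between the tested speed and the running speed; the only difference is whether the running system ends up at the spec'ed rate or below it.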
Thank you all for all of your patience and advice through this. I have tried to tag the appropriate threads where I was using flaky hardware. I specifically want to note that my threads in early December regarding ZFS were with what I now have proof IS RELIABLE hardware. Those were diagnosed correctly in that huge thread.

--Ray
-- This message posted from opensolaris.org