"Bob Smith" <[EMAIL PROTECTED]> wrote:

> This may be off the beaten track, but..
> Has anyone used tomsrtbt to make a disk (or set of same) that can do system
> hardware checks? And does anyone know where I can find Linux diagnostic
> programs I could fold into tomsrtbt? ( I've hunted around on the 'net to no
> avail for weeks)

This is a perpetual area of interest to me.

The best thing I have found is "The Linux Stress Test" which I have
outlined on my web site.  Feel free to take a look at it.  The gist
of it is to do repeated kernel compiles as a means of exercising
the hardware.

Unfortunately, this does not fit on a floppy.  The stress test
exercises the CPU, caches, memory, bus, disk, disk controller,
and power supply.  If a long series of kernel compiles is successful,
then chances are good that the above hardware items are good.
Other items, such as keyboard, video card, network cards, etc.
are not stressed at all.
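
The core of such a test, repeated kernel compiles that stop at the
first failure, can be sketched roughly as follows.  This is only an
illustration, not the script from the web site: the function name
stress_loop, the pass count, and the particular make targets in the
usage comment are all assumptions.

```python
import subprocess


def stress_loop(steps, passes, cwd=None):
    """Run every command in `steps`, in order, `passes` times.

    Returns the number of passes completed before the first failing
    command (equal to `passes` if nothing failed).
    """
    for n in range(passes):
        for step in steps:
            result = subprocess.run(step, cwd=cwd,
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
            if result.returncode != 0:
                # The same source built cleanly before, so a failure
                # here is a hint that the hardware is suspect.
                return n
    return passes


# Hypothetical usage (targets assumed, adjust for the kernel version):
# steps = [["make", "dep"], ["make", "clean"], ["make", "zImage"]]
# done = stress_loop(steps, passes=20, cwd="/usr/src/linux")
```

The clean step inside each pass matters: without it, make would find
everything up to date on the second pass and exercise nothing.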

My latest work with the stress test was to make up my own variant
of the Slackware 4.0 "zipslack" distribution.  This resulted in a
50 MB zip file that can be copied to a Windows or DOS partition
(FAT16 or FAT32) and unzip'd.  This uses the UMSDOS file system
and, so, does not require repartitioning the target machine.
I burned a couple of CDs to send to some friends who were having
some hardware troubles.  I made the CD bootable (Slackware, not
Tom's) and put my version of zipslack on it, plus the Slackware
installation directories.  The user does not need to boot to the
CD but can just copy zipfcs.zip via Windows to C:\ or D:\ etc
and then unzip it.  One friend had very good success using it
on multiple machines to verify they were good or bad, and to
fix the bad machines and then verify they had become good.  The
other friend/customer won't open the box, so all he could do
was more or less verify the machine was bad (but, I suspect,
with his doubts as to the validity of the stress test).

I was rushed in getting the CD set up.  Next time, I hope to put
tomsrtbt on it as well, and probably have the CD bootable with
tomsrtbt instead of Slackware.  I also intend to trim down
zipfcs.zip a lot.  I used the Linux 2.2.x kernel but will probably
switch to the 2.0.x kernel to save space.

I am not a fan of the UMSDOS file system, but it is unrealistic
to expect some of my target users to repartition a hard drive.
One big problem with the UMSDOS file system is its tremendous
bloat.  Unzip'ing zipfcs.zip requires a _lot_ more disk space
than the equivalent on an ext2 file system would require.  This
problem won't be quite as bad once I trim zipfcs.zip drastically.

I also don't like the UMSDOS file system because I think it is
slower.  I want more of the kernel compile time spent on accessing
RAM (and bus, CPU, etc.) rather than accessing the disk.  Still,
the stress test run from UMSDOS does seem to work ok, other than
the extreme disk space requirements.

One thought I've had would be to leave /usr/src/linux on the CD
and just link to it from the UMSDOS Linux, or perhaps even from
tomsrtbt running in RAM.  This would eliminate the space requirements
on the hard drive and make the stress test very easy to run -- as
long as the PC had a CD drive.  The problem here is that the compile must
write *.o files and I think it has to write them under /usr/src/linux.
If this is correct, then it would be difficult to run /usr/src/linux
from the CD as it would be read-only.  I haven't looked into this yet,
to see whether there might be a way to specify a different location
for the *.o files while still keeping the source on the CD.  If
anyone knows or has any suggestions, please let me know.
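
One possible workaround, untested against the kernel makefiles, is
a "shadow tree": a writable directory (on a RAM disk, say) that
mirrors the directory layout of the CD and holds a symbolic link to
each source file.  The compiler then reads through the links while
the *.o files land in the writable copy.  The sketch below assumes
the writable file system supports symbolic links; the function name
make_shadow_tree and both paths in the usage comment are made up
for illustration.

```python
import os


def make_shadow_tree(source_root, build_root):
    """Mirror the directories of source_root under build_root and
    symlink every file back to its original.

    Expects build_root to be new or empty; os.symlink raises an
    error if a link name already exists.  Returns the link count.
    """
    source_root = os.path.abspath(source_root)
    links = 0
    for dirpath, dirnames, filenames in os.walk(source_root):
        rel = os.path.relpath(dirpath, source_root)
        target_dir = os.path.normpath(os.path.join(build_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # The link points at the read-only original on the CD.
            os.symlink(os.path.join(dirpath, name),
                       os.path.join(target_dir, name))
            links += 1
    return links


# Hypothetical usage:
# make_shadow_tree("/cdrom/usr/src/linux", "/ramdisk/linux")
```

Any file the build rewrites in place (generated headers or
dependency files, for example) would still hit the read-only
original through its link, so those files would need to be copied
rather than linked.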

If I could boot to tomsrtbt (either from CD or from floppy) and
have the *.o written to RAM but read the source from the CD, that
would be nearly ideal for my stress testing purposes.  It might
mean the stress test couldn't be run on machines with less than
x MB of RAM.  Right now, I can't run it on machines with less than
some large amount of hard disk space.

Even better, of course, would be an equivalent stress test that could
run entirely from a floppy.  It seems that compiling the kernel makes
such a good stress test because it (pseudo) randomly exercises the
memory and does so intensively.  It stands to reason a very small
program could do the same thing.  The ordinary memory test programs
do not seem to do this reliably.
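
To make the idea concrete, here is a minimal sketch of one such
pass: write pseudo-random values to a buffer in pseudo-random order,
then replay the same sequence and count mismatches.  It is only an
illustration under stated assumptions.  The name memory_pass is
made up, a Python interpreter would not itself fit on a floppy (a
real version would be a small compiled program), and nothing here
shows that this catches the faults a kernel compile catches.

```python
import random


def memory_pass(size_bytes, seed):
    """Fill a buffer in scattered order, then verify it.

    Returns the number of bytes that read back differently from
    what was written (0 is the expected result).
    """
    # Visit every byte exactly once, in a seeded shuffled order.
    order = list(range(size_bytes))
    random.Random(seed).shuffle(order)

    buf = bytearray(size_bytes)
    writer = random.Random(seed + 1)
    for i in order:
        buf[i] = writer.randrange(256)

    # Replay the identical value sequence and compare.
    reader = random.Random(seed + 1)
    return sum(1 for i in order if buf[i] != reader.randrange(256))


# Hypothetical usage: repeat with different seeds and buffer sizes.
# errors = memory_pass(16 * 1024 * 1024, seed=1)
```

A program at this level has no control over which physical RAM the
buffer occupies, which is one reason a test like this may miss
faults that the kernel compile provokes.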

Since I've come this far, I might as well say a few more words
about the problem.  One of my friends/customers said "I guess
I'll take the machine back to the computer store to see what they
think."  I say to hell with what the computer store thinks.  If
their opinion was worth anything, the computer would be working
reliably.  We don't want their opinion now; we want an objective
test or measurement so we can say YES or NO as to whether it is a
hardware problem.  If YES, then we change the hardware and rerun
the stress test until the problem is fixed.  So, there is some
use for just the first YES or NO, but the real benefit of the
stress test comes from part substitution, bus/CPU clock speed
changes, etc., and then retesting.  Of course, I am talking about
intermittent problems, which are the hardest to troubleshoot.
Always-bad hardware isn't the real problem when troubleshooting.

"My computer locked up," says my customer.  "So what!  Fix your
hardware!" I want to say.  It isn't our software causing the
problem, but how do I prove this to a customer?  "I took the machine
back to the computer store and they say the computer checks out ok,"
says my customer.

I want to be able to mail them a CD (or, better, deliver the CD at
the same time we deliver or install our software) and have them
run the stress test.  Of course, I would prefer the stress test
fit on a floppy or be downloadable easily.  The current 50 MB size
of zipfcs.zip is too big for this.  Even if I trim it way down,
I guess I can't get much under 10 MB, considering the size of the
kernel source.

If I were selling and installing the hardware (which I am not),
I would just create a small ext2 partition and install enough
Linux to run the stress test and set up Lilo to boot their
Windows by default but allow choosing Linux.  Then, I would
run the stress test before delivering the machine.  Later, if
there was any question about the hardware, I could have the
user run the stress test easily.

Well, there you have it.  I hope the ideas help and/or that you
will have ideas or suggestions for me.  This is not a burning
problem for me, just an ongoing interest.


  -- Frank
  http://www.eskimo.com/~pygmy
