One topic that's been talked about off and on in academic circles is
provably correct computing hardware. Development of such a technology is
made to order for open discussion and peer review. If open hardware design
grows, it would be a worthwhile field to pioneer. It's not an especially
close match for the Traversal team's special expertise, though.
The obvious application area for provably correct computer hardware
would be in safety-critical products, such as boiler safety interlocks,
aircraft flight controls, and voting machines. In these applications,
provably complete real-time fault coverage is just as necessary as provably
correct design. In a burner safety control, for instance, UL 372 requires
that any fault be detected within 4 seconds, and that the device then either
operate safely or fail into a safe condition (shut off the fuel).
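That fail-to-safe deadline can be sketched in a few lines of C. This is
only an illustration, not the UL-approved logic: the valve flag, the
tick-per-second loop, and the self-test input are all hypothetical.

```c
#include <stdbool.h>

#define FAULT_DEADLINE_TICKS 4  /* hypothetical: 1 tick per second -> 4 s limit */

static bool fuel_valve_open = false;

/* Fail into the safe condition: shut off the fuel. */
static void fail_safe(void) { fuel_valve_open = false; }

/* Called once per tick; self_test_passed reports the latest test result.
 * Returns true while operation continues, false once we have failed safe. */
static bool tick(bool self_test_passed, int *ticks_since_pass) {
    if (self_test_passed) {
        *ticks_since_pass = 0;          /* deadline refreshed */
        fuel_valve_open = true;
    } else if (++*ticks_since_pass >= FAULT_DEADLINE_TICKS) {
        fail_safe();                    /* fault persisted past the limit */
        return false;
    }
    return true;
}
```

The point of the structure is that the safe state is the default the
system falls into, rather than something a (possibly faulty) processor
must actively reach.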
NASA, not having such a device available, equipped the Shuttle with
5 flight control computers of 3 different designs, programmed by different
software teams using different tool chains. On a burner safety interlock I
designed, I used two of a particular microcontroller the UL engineers had
very long experience with, gave each a veto over the fuel valve, and got the
programmers to have each processor send the other streams of arithmetic and
logic problems to continually test for internal faults. I would have
preferred to use two different processor architectures, and the UL engineers
would have too, but as I say, the one we used had a very long record in
safety products.
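The cross-check scheme above can be sketched as follows. Everything here
is a simulation for illustration: the ALU functions, the challenge stream,
and the fault stand in for two real microcontrollers exchanging problems
over a link.

```c
#include <stdbool.h>
#include <stdint.h>

/* One processor's ALU answering the other's challenge. A real test
 * stream would exercise many more arithmetic and logic paths. */
typedef uint32_t (*alu_fn)(uint32_t a, uint32_t b);

static uint32_t healthy_alu(uint32_t a, uint32_t b) { return a * b + a; }
static uint32_t faulty_alu(uint32_t a, uint32_t b)  { return a * b; } /* internal fault: drops the add */

/* Each side challenges its peer with a stream of problems and vetoes
 * the fuel valve if any answer comes back wrong. */
static bool peer_passes(alu_fn peer) {
    for (uint32_t a = 1; a <= 16; a++)
        for (uint32_t b = 1; b <= 16; b++)
            if (peer(a, b) != a * b + a)
                return false;           /* wrong answer: veto */
    return true;
}

/* The valve is enabled only if BOTH processors approve of each other. */
static bool valve_enabled(alu_fn cpu_a, alu_fn cpu_b) {
    return peer_passes(cpu_a) && peer_passes(cpu_b);
}
```

Because each processor holds an independent veto, a single faulty unit
cannot hold the valve open on its own.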
I haven't done any real work on provably correct hardware, or even
made a literature search, but the reliability work I have done suggests a
few approaches.
Complexity is the enemy of both reliability and provability. When
provability is the vital requirement, performance can be sacrificed by
reducing the hardware to a bare minimum platform to run firmware and
microcode. That minimizes the number of logic paths that must be analyzed
and proven correct. It also minimizes the run time for a field fault test,
and thus the maximum time the system is capable of running with undetected
damage. Field failures in microcode memory are simple to detect; just build
parity and CRC checks into hardware, and run these tests continuously.
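A minimal sketch of the CRC check, using CRC-16/CCITT. In the scheme
described above this would be a hardware block scrubbing continuously; the
check logic is the same either way. The image and the build-time reference
value here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT, bit-serial, polynomial 0x1021, init 0xFFFF. */
static uint16_t crc16(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Background scrub: recompute the microcode image's CRC each pass and
 * compare against the reference computed at build time. */
static bool microcode_intact(const uint8_t *image, size_t len,
                             uint16_t reference) {
    return crc16(image, len) == reference;
}
```

Any single-bit upset in the image changes the CRC, so a continuous scrub
bounds how long the system can run on damaged microcode.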
When hardware is added, it should be to add redundancy and fault
detection. Dual redundancy with challenge-and-response tests is good for
situations where a safe failure state exists. Where guaranteed safe
operation after a fault is required, triple redundancy or failover to backup
hardware is sometimes an option. The challenge is to design the system so it
doesn't end up with some residual piece of fault-detection hardware that
isn't itself tested by something else.
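For the triple-redundant case, the core is a 2-of-3 majority voter. The
sketch below is illustrative only; note that the disagreement flag is
exactly the kind of residual fault-detection path that must itself be
exercised, e.g. by injecting known disagreements during self-test.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2-of-3 majority voter. The output follows the majority; *fault_flagged
 * reports any disagreement among the three channels. The flag path must
 * itself be tested (inject a known disagreement and confirm it trips),
 * or it becomes untested residual fault-detection hardware. */
static uint32_t vote(uint32_t a, uint32_t b, uint32_t c, bool *fault_flagged) {
    *fault_flagged = !(a == b && b == c);
    if (a == b || a == c)
        return a;       /* a agrees with at least one peer */
    return b;           /* b == c, a is the odd one out */
}
```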
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)