Gabor Szabo wrote:
> Following a blog entry of David A. Wheeler I arrived at the
> web page of Open Proofs that he created: http://www.openproofs.org/
> 
> In short (I did not understand much of it), there are tools to mathematically
> prove that a certain program does, or does not, do something.
> 
> Unfortunately, most of these tools, and most of their uses, are secret.
> 
> To quote the web site
> "Verified software is often highly classified, sensitive, and/or proprietary.
> In addition, research papers are too short and abstract to present
> meaningful examples."
> 
> The idea behind the web site is to provide a set of open source tools and
> examples of open source software with proofs, that is, open proofs.
> 
> There is an example of a short C program with its proof
> http://www.openproofs.org/wiki/Queens_on_a_chessboard_with_Caduceus
> 
> All this seems interesting, but I could hardly understand it.
> 
> Maybe someone here is also interested and could explain.

Explain the idea of proving code, or explain the proof itself?

The proof I can't explain.  But the idea is simple enough: code, in the end,
is math.  If you can formulate a problem as a math equation (and I see Mr
Gödel in the back has something to say about that) you can write a proof that
it comes out to what you say it does.  This solves the grand conundrum of
testing: testing can never prove a program is correct, it can only prove
that it isn't.
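
To make that concrete, here's a toy sketch in Haskell (my own illustration,
not from the Open Proofs site).  The spec is an equation; a test suite can
only check it at a handful of inputs, while an induction proof nails it for
all of them at once:

    -- Spec: for every n >= 0, sumTo n == n * (n + 1) `div` 2.
    sumTo :: Integer -> Integer
    sumTo 0 = 0
    sumTo n = n + sumTo (n - 1)

    -- A test checks finitely many cases:
    --   sumTo 10 == 55, sumTo 100 == 5050, ...
    -- A proof by induction covers every n >= 0:
    --   Base:  sumTo 0 = 0 = 0 * 1 `div` 2
    --   Step:  if sumTo (n-1) = (n-1) * n `div` 2, then
    --          sumTo n = n + (n-1) * n `div` 2 = n * (n+1) `div` 2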

The trouble is that A) not all problems are amenable to mathematical proof
(the Halting Problem is one) and B) it's just not practical given how software
is developed today.

A can be worked around.  Haskell works around it with monads, a fancy way of
saying "this part of the program is unprovable (usually because it has a side
effect) but trust me when I say the result will be of type X".  The basic
example is I/O, the ultimate side effect. [2]
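
In concrete terms (a minimal sketch, plain standard Haskell): the IO in a
type is that "trust me" marker, and the compiler keeps the quarantine
airtight.

    -- Pure: the type promises the result depends only on the input.
    double :: Int -> Int
    double x = 2 * x

    -- Impure: the IO in the type flags the side effect.
    greet :: String -> IO ()
    greet name = putStrLn ("Hello, " ++ name)

    main :: IO ()
    main = do
      greet "world"
      print (double 21)  -- pure code may be used inside IO, never the reverse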

B is worth picking apart.  The first problem is with the languages, Perl
being the worst of the lot.  They're riddled with side effects.  Even a simple
statement like $a = $b in Perl can't be treated as a simple mathematical
assignment.  What if $a or $b is tied or overloaded?  Anything can happen!
You have to trace the history of $a and $b all through the program (and any
libraries) to make sure the statement is, in fact, a simple scalar assignment.
If it's not, if at any point either might have been tied or assigned an
object, you have to do a crap-ton more work to figure out what will happen
when $b is assigned to $a.

This is the attraction of functional programming languages.  Being quite
literally math expressed as a programming language, they're very amenable to
proof.  Mostly this is taken advantage of for things like automatic strong
typing [1] and massive amounts of compile-time optimization.  But it also
means that you can prove the entire program, or just a part of it, and that's
important.
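
For example (a sketch with made-up names, len and cat standing in for the
Prelude's length and ++): with no hidden side effects you can substitute
equals for equals and prove a small piece of a program straight from its
definitions.

    len :: [a] -> Int
    len []    = 0
    len (_:t) = 1 + len t

    cat :: [a] -> [a] -> [a]
    cat []    ys = ys
    cat (x:t) ys = x : cat t ys

    -- Claim: len (cat xs ys) == len xs + len ys, for all finite xs and ys.
    -- Proof by induction on xs, using nothing but the equations above:
    --   Base:  len (cat [] ys)    = len ys = 0 + len ys = len [] + len ys
    --   Step:  len (cat (x:t) ys) = 1 + len (cat t ys)
    --                             = 1 + (len t + len ys)   -- hypothesis
    --                             = len (x:t) + len ys
    -- Try doing that with a Perl $a that might be tied.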

Which brings us to problem number two: cost.  Putting together a proof is
very, very, very time consuming, and then, of course, it has to be checked.
In the real world, by the time you've come up with the proof, the code or the
requirements will probably have shifted out from under you.  So most proven
code is either for very small problems, which are really math problems in
disguise (like the Queens problem), or for very well understood and important
problems.

Maybe you have lots of money and a huge problem to solve.  Let's say, for
example, that you're the NSA and you want to prove your new super secret
encryption software is correct.  Or you're Google and you want to prove that
the software you're about to install on your 2039982 servers isn't going to
start sending people to http://goat.se. [3]

One practical example might be memory allocation.  It's absolutely critical.
Everything uses it.  There are only so many strategies.  It has to be fast,
efficient and correct.  So if you're going to spend effort proving a piece of
code, that would be a good place to do it.  Problem is, most memory allocators
are written in C, for performance.  C is a hairball of side effects, and most
memory allocators are too large and change too frequently to lend themselves
to proofs.

Since everything is written in C (your compiler, your operating system), all
programmers, even functional ones, are left on a shifting foundation of sand
and kernel bugs.

Which brings us back to Haskell.  What if you wrote an operating system and
compiler, from the bottom up, in a functional programming language?  You could
prove it, piece by piece, until the whole thing is mathematically rock solid.
What if you applied automatic theorem provers, like those already in use in
mathematics, to generate the proofs at a practical speed?  No more kernel
bugs.  Potentially no more bugs, period, just wrong requirements. :)  As a
side effect, it opens the door to massive automatic optimization.

That's my amateurish understanding of the situation.  For day-to-day (or even
week-to-week) work, testing isn't going anywhere.  But damn, I hope somebody
writes a provable kernel.


[1] See mjd's excellent slides "Strong Typing Does Not Have To Suck"
http://perl.plover.com/yak/typing/

[2] https://secure.wikimedia.org/wikipedia/en/wiki/Monad_(functional_programming)

[3] Do not click.  Oh god, you already did.


-- 
Being faith-based doesn't trump reality.
        -- Bruce Sterling
