Re: [SC-L] Theoretical question about vulnerabilities

2005-04-15 Thread Crispin Cowan
David Crocker wrote:
Well, that approach is certainly better than not guarding against buffer
overflows at all. However, I maintain it is grossly inferior to the approach we
use, which is to prove that all array accesses are within bounds.
Proving that all array accesses are within bounds would seem to be 
Turing undecidable. Either you are not proving what you say you are 
proving, or your programs are not full Turing machines.

Proof: insert diagonalization argument here :)
Issue: Some people may regard diagonalized programs as a contrivance, 
and are only interested in correctness proofs for real programs (for 
some value of "real").

Crispin's rebuttal: Suppose I want to prove that your program checker 
does not have any illegal array references ...

What exactly
is your program going to do when it detects an array bound violation at
run-time?
Hermes' kludge to address this was two-fold:
  1. There are no arrays. Rather, there are relational tables, and you
     can extract a row based on a field value. You can programmatically
     get a table to act much like an array by having a field with a
     unique index number, 1, 2, 3, etc.
  2. If you try to extract a row from a table that does not have a
     matching value, then you get an exception. Exceptions are thrown
     and caught up the call chain the way most modern (Java etc.)
     languages do it.
Yes, this is a kludge because it ultimately means a run-time exception, 
which is just a pretty way of handling a seg fault.
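
A minimal Python sketch of the table-as-array pattern may make this
concrete (the class and names are illustrative, not Hermes syntax):

    class Table:
        def __init__(self):
            self.rows = {}            # unique index field -> row

        def insert(self, index, row):
            self.rows[index] = row

        def extract(self, index):
            # A missing row raises an exception, to be caught somewhere
            # up the call chain - never a silent out-of-bounds access.
            if index not in self.rows:
                raise LookupError("no row with index %r" % index)
            return self.rows[index]

    t = Table()
    t.insert(1, "first")
    t.insert(2, "second")
    t.extract(2)                      # acts much like array[2]
    # t.extract(3) would raise LookupError - the "pretty seg fault"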

Crispin
--
Crispin Cowan, Ph.D.  http://immunix.com/~crispin/
CTO, Immunix  http://immunix.com



RE: [SC-L] Theoretical question about vulnerabilities

2005-04-14 Thread David Crocker
Crispin wrote:

 Here's an example of a case it cannot prove:

if X then
    Y <- initial value
endif
...
if X then
    Z <- Y + 1
endif

The above code is correct in that Y's value is taken only when it has
been initialized. But to prove the code correct, an analyzer would have
to be flow sensitive, which is hard to do.


The whole science of program proving is based on exactly this sort of flow
analysis. The analyser will postulate that Y is initialised at the assignment to
Z, given the deduced program state at that point. This is called a verification
condition or proof obligation. It can then attempt to prove the hypothesis;
for this example, the proof is trivial. I guess it would have been more
difficult 20 years ago when Hermes was written.

The alternative solution to the problem of uninitialised variables is for the
language or static checker to enforce Java's definite initialisation rule or
something similar. This at least guarantees predictable behaviour.

An issue arises when a tool like Perfect Developer is generating Java code:
even though PD can prove that Y is initialised before use in the above
example, the generated Java code would violate the definite initialisation
rule. We have to generate dummy initialisations in such cases.
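
To illustrate, here is the same pattern rendered in Python (which has
no definite initialisation rule, but the shape of the generated fix is
the same); the second function shows the dummy initialisation:

    def f(x, compute):
        if x:
            y = compute()      # initialised only on this path
        # ... unrelated code ...
        if x:
            return y + 1       # a flow-sensitive prover can show y is
                               # initialised whenever this line runs
        return 0

    def f_generated(x, compute):
        y = 0                  # dummy initialisation, emitted so the
                               # generated code satisfies the rule
        if x:
            y = compute()
        if x:
            return y + 1
        return 0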

David Crocker, Escher Technologies Ltd.
Consultancy, contracting and tools for dependable software development
www.eschertech.com



RE: [SC-L] Theoretical question about vulnerabilities

2005-04-14 Thread David Crocker
Crispin Cowan wrote:


Precisely because statically proven array bounds checking is Turing Hard, that
is not how such languages work.

Rather, languages that guarantee array bounds insert dynamic checks on every
array reference, and then use static checking to remove all of the dynamic
checks that can be proven to be unnecessary. For instance, it is often the case
that a tight inner loop has hard-coded static bounds, and so a static checker
can prove that the dynamic checks can be removed from the inner loop, hoisting
them to the outer loop and saving a large proportion of the execution cost of
dynamic array checks.


Well, that approach is certainly better than not guarding against buffer
overflows at all. However, I maintain it is grossly inferior to the approach we
use, which is to prove that all array accesses are within bounds. What exactly
is your program going to do when it detects an array bound violation at
run-time? You can program it to take some complicated recovery action; but how
are you going to test that? You can abort the program (and restart it if it is a
service); but then all you have done is to turn a potential security
vulnerability into a denial of service vulnerability.

So the better approach is to design the program so that there can be no buffer
overflows; and then verify through proof (backed up by testing) that you have
achieved that goal.

David Crocker, Escher Technologies Ltd.
Consultancy, contracting and tools for dependable software development
www.eschertech.com



RE: [SC-L] Theoretical question about vulnerabilities

2005-04-13 Thread Peter Amey


 -----Original Message-----
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]
 Behalf Of der Mouse
 Sent: 12 April 2005 05:15
 To: SC-L@securecoding.org
 Subject: Re: [SC-L] Theoretical question about vulnerabilities


  [B]uffer overflows can always be avoided, because if there is ANY
  input whatsoever that can produce a buffer overflow, the proofs will
  fail and the problem will be identified.

 Then either (a) there exist programs which never access out-of-bounds
 but which the checker incorrectly flags as doing so, or (b) there
 exist programs for which the checker never terminates (quite possibly
 both).  (This is simply the Halting Theorem rephrased.)


Could you explain why you believe that proof of a specific property in a
constrained environment is equivalent to the Halting Problem?  When I
explain and even demonstrate the benefits of formal reasoning, I find
that, increasingly often, someone pops up with Gödel or Turing to prove
that it is all a waste of time.  Having done so they then, presumably,
go back to using the less rigorous approaches which are a demonstrable
cause of failure in the systems our industry builds.

I really do find this line of argument rather irritating; the theoretical
limits of proof are quite a different thing from the practical
application of proof-based technology in a suitably constrained
environment.  For the record, I am quite confident that we can prove
freedom from buffer overflow in a large and useful class of programs
without hitting any of the limits you suggest.  Indeed we, and our
customers, are routinely doing so.

Peter







Re: [SC-L] Theoretical question about vulnerabilities

2005-04-13 Thread der Mouse
 [B]uffer overflows can always be avoided, because if there is ANY
 input whatsoever that can produce a buffer overflow, the proofs
 will fail and the problem will be identified.
 Then either (a) there exist programs which never access
 out-of-bounds but which the checker incorrectly flags as doing so,
 or (b) there exist programs for which the checker never terminates
 (quite possibly both).  (This is simply the Halting Theorem
 rephrased.)
 Could you explain why you believe that proof of a specific property
 in a constrained environment is equivalent to the Halting Problem?

Take the code for the checker.  Wrap it in a small amount of code that
makes a deliberate out-of-bounds access if the checker outputs `no
problem'.  Then write another small bit of code which ignores its input
and runs the wrapped checker on itself.

Run the original checker on the result.

This is basically the same diagonalization argument Turing used to
demonstrate that the Halting Problem was unsolvable; that's the sense
in which I call it the Halting Theorem rephrased.
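
For concreteness, here is that construction as a stubbed Python sketch;
checker and source_of are stand-ins for the real checker and for "the
code for the checker", so this only illustrates the shape of the
argument:

    import inspect

    def checker(prog_source):
        # stand-in for the hypothetical bounds checker: always optimistic
        return "no problem"

    def source_of(f):
        return inspect.getsource(f)   # "take the code for" f

    def wrapped_checker(prog_source):
        # misbehave exactly when the checker approves
        if checker(prog_source) == "no problem":
            [][0]                     # deliberate out-of-bounds access

    def diagonal(_input=None):
        # ignore the input; run the wrapped checker on itself
        wrapped_checker(source_of(wrapped_checker))

    # Ask the original checker about diagonal: if it answers "no
    # problem", diagonal() makes an out-of-bounds access; if it answers
    # "problem", diagonal() is clean.  Either answer is wrong, so a
    # sound checker cannot also be complete and terminating.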

 I really do find this line of argument rather irritating; the
 theoretical limits of proof are quite a different thing from the
 practical application of proof-based technology in a suitably
 constrained environment.

Entirely true.  But if you use theoretical language like "proof", you
have to expect to be held to a theoretical standard of correctness.

/~\ The ASCII                         der Mouse
\ / Ribbon Campaign
 X  Against HTML               [EMAIL PROTECTED]
/ \ Email!             7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B



Re: [SC-L] Theoretical question about vulnerabilities

2005-04-13 Thread Crispin Cowan
der Mouse wrote:
[B]uffer overflows can always be avoided, because if there is ANY
input whatsoever that can produce a buffer overflow, the proofs will
fail and the problem will be identified.

Then either (a) there exist programs which never access out-of-bounds
but which the checker incorrectly flags as doing so, or (b) there exist
programs for which the checker never terminates (quite possibly both).
(This is simply the Halting Theorem rephrased.)

Precisely because statically proven array bounds checking is Turing
Hard, that is not how such languages work.
Rather, languages that guarantee array bounds insert dynamic checks on
every array reference, and then use static checking to remove all of the
dynamic checks that can be proven to be unnecessary. For instance, it is
often the case that a tight inner loop has hard-coded static bounds, and
so a static checker can prove that the dynamic checks can be removed
from the inner loop, hoisting them to the outer loop and saving a large
proportion of the execution cost of dynamic array checks.
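
The transformation is easy to sketch; this Python fragment shows the
idea only, not the actual GCC machinery:

    def sum_first_naive(a, n):
        total = 0
        for i in range(n):
            if not (0 <= i < len(a)):    # per-iteration dynamic check
                raise IndexError(i)
            total += a[i]
        return total

    def sum_first_hoisted(a, n):
        if not (0 <= n <= len(a)):       # one check, hoisted out of the loop
            raise IndexError(n)
        total = 0
        for i in range(n):
            total += a[i]                # provable: 0 <= i < n <= len(a)
        return total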
How much of this optimization can be done is arguable:
  * The Jones & Kelly GCC enhancement that does full array bounds
    checking makes (nearly?) no attempt at this optimization, and
    suffers slowdowns of 10X to 30X on real applications.
  * The Bounded Pointers GCC enhancement, which does full array bounds
    checking but with a funky incompatible implementation that makes
    pointers bigger than a machine word, does some of these
    optimizations and suffers a slowdown of 3X to 5X. Some have argued
    that it can be improved from there, but how much remains to be seen.
  * Java compilers get the advantage of a language that was actually
    designed for type safety, in contrast with C, which aggressively
    makes static type checking difficult. The last data I remember on
    Java is that turning array bounds checking on and off makes a 30%
    difference in performance.
Crispin
--
Crispin Cowan, Ph.D.  http://immunix.com/~crispin/
CTO, Immunix  http://immunix.com



Re: [SC-L] Theoretical question about vulnerabilities

2005-04-13 Thread Crispin Cowan
David Crocker wrote:
Exactly. I'm not interested in trying to write a program prover that will prove
that an arbitrary program is correct, if indeed it is. I am only interested in
proving that well-structured programs are correct.
The Hermes programming language took this approach
http://www.research.ibm.com/people/d/dfb/hermes.html
Hermes proved a safety property called Type State Checking in the course
of compiling programs. Type State offers very nice safety properties for
correctness, including proving that no variable will be used before it
is initialized. But the Hermes Type State Checker was not formally
complete; there were valid programs that the checker could not *prove*
were correct, and so it would reject them. Here's an example of a case
it cannot prove:
if X then
    Y <- initial value
endif
...
if X then
    Z <- Y + 1
endif
The above code is correct in that Y's value is taken only when it has
been initialized. But to prove the code correct, an analyzer would have
to be flow sensitive, which is hard to do.
Here's where it gets interesting. The authors of Type State went and
analyzed a big pile of existing code that was in production but that the
Type State checker failed to prove correct. In (nearly?) every case,
they found a *latent bug* associated with the code that failed to pass
the Checker. We can infer from that result that code that depends on
flow sensitivity for its correctness is hard for humans to reason about,
and therefore likely to be wrong.
Disclaimer: I worked on Hermes as an intern at the IBM Watson lab waay
back in 1991 and 1992. Hermes is my favorite type safe programming
language, but given the dearth of implementations, applications, and
programmers, that is of little practical interest :)
Crispin
--
Crispin Cowan, Ph.D.  http://immunix.com/~crispin/
CTO, Immunix  http://immunix.com



Re: [SC-L] Theoretical question about vulnerabilities

2005-04-12 Thread Crispin Cowan
David Crocker wrote:
3. Cross-site scripting. This is a particular form of HTML injection and would
be caught by the proof process in a similar way to SQL injection, provided that
the specification included a notion of the generated HTML being well-formed. If
that was missing from the specification, then HTML injection would not be
caught.

XSS occurs where client A can feed input to Server B such that client C
will accept and trust the input. The correct specification is that
Server B should do a perfect job of preventing clients from uploading
content that is damaging to other clients. I submit that this is
infeasible without perfect knowledge of the vulnerabilities of all the
possible clients. This seems to be begging the definition of "prove
correct" pretty hard.
You can do a pretty good job of preventing XSS by stripping user posts
of all interesting features and permitting only basic HTML. But this
still does not completely eliminate XSS, as you cannot know a priori
about all the possible buffer overflows etc. of every client that will
come to visit, and basic HTML still allows for some freaky stuff, e.g.
very long labels.
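
A minimal sketch of the stripping approach, using Python's standard
library (a real filter would go on to re-allow a whitelist of basic
HTML tags, which is omitted here):

    import html

    def sanitize(post):
        # Escape everything; whatever survives is inert text.
        return html.escape(post)

    print(sanitize("<script>alert(1)</script>"))
    # prints: &lt;script&gt;alert(1)&lt;/script&gt;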
Crispin
--
Crispin Cowan, Ph.D.  http://immunix.com/~crispin/
CTO, Immunix  http://immunix.com


Re: [SC-L] Theoretical question about vulnerabilities

2005-04-12 Thread Crispin Cowan
Nash wrote:
** It would be extremely interesting to know how many exploits could
be expected after a reasonable period of execution time. It seems that
as execution time went up we'd be less likely to have an exploit just
show up. My intuition could be completely wrong, though.

I would think that time is pretty much irrelevant, because it depends
on the intelligence used to order the inputs you try. For instance,
time-to-exploit will be very long if you feed inputs to (say) Microsoft
IIS starting with one byte of input and going up in ASCII order.
Time-to-exploit gets much shorter if you use a fuzzer program: an
input generator that can be configured with the known semantic inputs of
the victim program, and that focuses specifically on trying to find
buffer overflows and printf format string errors by generating long
strings and using strings containing %n.
Even among fuzzers, time-to-exploit depends on how intelligent the
fuzzer is in terms of aiming at the victim program's data structures.
There are many specialized fuzzers aimed at various kinds of
applications, aimed at network stacks, aimed at IDS software, etc.
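
A toy sketch of such a fuzzer in Python (the probe values and field
names below are made up for illustration):

    import random

    OVERFLOW_PROBES = ["A" * n for n in (64, 256, 4096, 65536)]
    FORMAT_PROBES = ["%n" * 8, "%s" * 8, "%x" * 16]

    def fuzz_inputs(fields, count=100):
        # Pair the victim program's known semantic inputs with probes
        # aimed at buffer overflows and format string errors.
        for _ in range(count):
            field = random.choice(fields)
            probe = random.choice(OVERFLOW_PROBES + FORMAT_PROBES)
            yield "%s: %s" % (field, probe)

    # Usage: feed each string to the victim and watch for crashes, e.g.
    #   for line in fuzz_inputs(["Host", "User-Agent", "Cookie"]): ...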
Crispin
--
Crispin Cowan, Ph.D.  http://immunix.com/~crispin/
CTO, Immunix  http://immunix.com



Re: [SC-L] Theoretical question about vulnerabilities

2005-04-12 Thread karger

Pascal Meunier [EMAIL PROTECTED] writes

Do you think it is possible to enumerate all the ways all vulnerabilities
can be created?  Is the set of all possible exploitable programming mistakes
bounded?

I believe that one can make a Turing machine halting argument to show
that this is impossible.  If you include denial of service attacks
(and infinite loops are certainly a denial of service), then the
halting problem applies immediately and trivially.  But even if you
exclude denial of service, I think you could construct a proof based
on a Turing machine that halts if and only if there is an exploitable
programming mistake, or something like that.

I'm not really very good at such proofs, so I'll leave that as an
exercise for the reader (convenient excuse for not doing the hard
bits!)

It is the practical impossibility of finding all the existing
vulnerabilities that led to the Anderson study of 1972 replacing the
"penetrate and patch" strategy for security with the "security
kernel" approach: developing code small and simple enough that you
could make a convincing argument that it is secure.  That in turn led
to the development of high assurance techniques for building secure
systems, which remain today the only way that has been shown to produce
code with demonstrably better security than most of what's out there.

(I say most deliberately.  I'm sure that various people reading this
will come up with examples of isolated systems that have good security
but didn't use high assurance. No disputes there, so you don't need to
fill up SC-L with a list of them.  The point is that high assurance is
a systematic engineering approach that works and, when followed, has
excellent security results.  The fact that almost no one uses it says
much more about the lack of maturity of our field than about the
technique itself.  It took a VERY long time for bridge builders to
develop enough discipline that most bridges stayed up.  The same is
true for software security, unfortunately.  It also says a lot about
whether people are really willing to pay for security.)

Although old, Gasser's book probably has the best description of what
I'm talking about.  These two classic documents should be read by
ANYONE trying to do secure coding, and fortunately, they are both
online!  Thanks to NIST and the University of Nebraska at Omaha for
putting them up.  (For completeness, the NIST website was created from
documents scanned by the University of California at Davis.)

citations:

1.  Anderson, J.P., Computer Security Technology Planning Study,
ESD-TR-73-51, Vols. I and II, October 1972, James P. Anderson and
Co., Fort Washington, PA, HQ Electronic Systems Division: Hanscom
AFB, MA. URL: http://csrc.nist.gov/publications/history/ande72.pdf

2.  Gasser, M., Building a Secure Computer System. 1988, New York: Van
Nostrand Reinhold.  URL: http://nucia.ist.unomaha.edu/library/gasser.php


- Paul



Re: [SC-L] Theoretical question about vulnerabilities

2005-04-12 Thread der Mouse
 Or until you find a bug in your automated prover.  Or, worse,
 discover that a vulnerability exists despite your proof, meaning
 that you either missed a loophole in your spec or your prover has a
 bug, and you don't have the slightest idea which.
 On that basis, can I presume that you believe all programming should
 be done in machine code, in case the compiler/assembler/linker you
 might be tempted to use has a bug?

You can presume anything you like.  But in this case you'd be wrong.

I was/am not arguing that such tools should not be used (for this or
any other reason).  My point is merely that calling what they do
proof is misleading to the point where I'd call it outright wrong.
You have roughly the same level of assurance that code passed by such a
checker is correct as you do that machine/assembly code output by a
traditional compiler is correct: good enough for most purposes, but by
no stretch of the imagination is it even as close to proof as most
mathematics proofs are - and, like them, it ultimately rests on
"smart people think it's OK."

 Ultimately, this amounts to a VHLL, except that
 [...nomenclature...].  And, as with any language, whoever writes
 this VHLL can write bugs.
 Like I said, you can still fail to include important security
 properties in the specification.  However, [...].

Basically, the same arguments usually made for any higher-level
language versus a corresponding lower-level language: machine versus
assembly, assembly versus C, C versus Lisp, etc.

And I daresay that it provides at least vaguely the same advantages and
disadvantages as for most of the higher/lower level comparisons, too.
But it is hardly the panacea that "proving the program correct" makes
it sound like.  As someone (who? I forget) is said to have said,
"Beware: I have only proven this program correct, not tested it."

/~\ The ASCII                         der Mouse
\ / Ribbon Campaign
 X  Against HTML               [EMAIL PROTECTED]
/ \ Email!             7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B




RE: [SC-L] Theoretical question about vulnerabilities

2005-04-11 Thread David Crocker
Pascal Meunier wrote:


Do you think it is possible to enumerate all the ways all vulnerabilities can be
created?  Is the set of all possible exploitable programming mistakes bounded?


No. It's not so much a programming problem, more a specification problem.

Tools now exist that make it possible to develop single-threaded programs that
are mathematically proven to meet their specifications. The problem is knowing
what should be included in the specifications. Let me give you some examples:

1. Buffer overflow. Even if nobody realised that buffer overflows could be used
to bypass security, it is an implied specification of any program that no array
should ever be accessed via an out-of-bounds index. All the tools out there for
proving software correctness take this as a given. So buffer overflows can
always be avoided, because if there is ANY input whatsoever that can produce a
buffer overflow, the proofs will fail and the problem will be identified. You
don't even need to write a specification for the software in this case - the
implied specification is enough. (Both this case and the SQL case below
are sketched in code after item 4.)

2. SQL injection. If the required behaviour of the application is correctly
specified, and the behaviour of the SQL server involved is also correctly
specified (or at least the correct constraints are specified for SQL query
commands), then it will be impossible to prove the software is correct if it has
any SQL injection vulnerabilities. So again, this would be picked up by the
proof process, even if nobody knew that SQL injection can be used to breach
security.

3. Cross-site scripting. This is a particular form of HTML injection and would
be caught by the proof process in a similar way to SQL injection, provided that
the specification included a notion of the generated HTML being well-formed. If
that was missing from the specification, then HTML injection would not be
caught.

4. Tricks to make the browser display an address in the address bar that is not
the address of the current HTML page. To catch these, you would need to include
in the specification: "the address bar shall always show the address of the
current page". This is easy to state once you know it is a requirement; but
until last year it would probably not have been an obvious requirement.
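
To make cases 1 and 2 concrete, here are two minimal Python sketches.
For case 1, the implied specification is just an assertion attached to
every index expression; this illustrates the proof obligation itself,
not any particular tool:

    def read_byte(buf, i):
        # The obligation a prover must discharge for every possible
        # input; at run time the assert merely documents it.
        assert 0 <= i < len(buf)
        return buf[i]

For case 2, the provable constraint is that user input reaches the SQL
engine only as bound data, never as query text (sqlite3 is used purely
for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")

    name = "x' OR '1'='1"             # hostile input

    # Unprovable: the query text itself depends on the input.
    #   conn.execute("SELECT * FROM users WHERE name = '%s'" % name)

    # Provable: the input is passed as a bound parameter, so the
    # constraint on SQL commands holds for ALL inputs.
    rows = conn.execute("SELECT * FROM users WHERE name = ?",
                        (name,)).fetchall()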

In summary: If you can state what you mean by secure in terms of what must
happen and what must not happen, then by using precise specifications and
automatic proof, you can achieve complete security for all possible inputs -
until the definition of secure needs to be expanded.


This should have consequences for source code vulnerability analysis software.
It should make it impossible to write software that detects all of the mistakes
themselves.  Is it enough to look for violations of some invariants (rules)
without knowing how they happened?


The problem is that while you can enumerate the set of invariants that you
currently know are important, you don't know how the set may need to be expanded
in the future.

David Crocker, Escher Technologies Ltd.
Consultancy, contracting and tools for dependable software development
www.eschertech.com



Re: [SC-L] Theoretical question about vulnerabilities

2005-04-11 Thread Nash
Pascal Meunier wrote:
 Do you think it is possible to enumerate all the ways
 all vulnerabilities can be created? Is the set of all
 possible exploitable programming mistakes bounded?

By "bounded" I take you to mean finite, in particular with reference
to your taxonomy below. By "enumerate" I take you to mean list out in
a finite way. Please note, these are not the standard mathematical
meanings for these terms. Though they may be standard for CS folks.

If I interpreted you correctly, then the answer is, no, as Crispin
indicated.

However, let's take enumerate to mean "list out, one by one" and allow
ourselves to consider infinite enumerations as acceptable. In this case,
the answer becomes yes.

This proof is abbreviated, but should be recognizable as a pretty
standard argument by those familiar with computable functions and/or
recursive function theory.

    Thm. The set of exploits for a program is enumerable.

    Pf.

    Let P(x) be a program computing the n-ary, partially computable
    function F(x). Let an exploit be a natural number input, y, such
    that at some time, t, during the computation performed by P(y) the
    fixed memory address, Z, contains the number k.**

    Then, there exists a computable function G(x,t) such that:

      - G(x, t) = 1 if and only if P(x) gives value k to address Z at
        some time less than or equal to t.

      - G(x, t) = 0 otherwise.

    The set of values of x for which G(x, t) = 1 for some t is
    effectively enumerable (in the infinite sense), because it is the
    domain of a computable function.

    Q.E.D.

You can look up the relevant theory behind this proof in [Davis].
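
For concreteness, the enumeration can be sketched as a dovetailing loop
in Python, assuming the predicate G from the proof is available as a
function:

    def enumerate_exploits(G):
        # G(y, t) = 1 iff P(y) puts k into address Z within t steps.
        # Dovetail over (input, time-bound) pairs so that every pair
        # (y, t) is eventually examined at stage n = y + t.
        found = set()
        n = 0
        while True:
            n += 1
            for y in range(n):
                t = n - y
                if y not in found and G(y, t):
                    found.add(y)
                    yield y           # y is an exploit for P

Note that the generator yields every exploit eventually but never halts
overall - which is exactly the gap between enumerable and computable.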

So, where does this leave us? Well, what we don't have is a computable
predicate, Exploit(p,y), that always tells us if y is an exploit for
the program p. That's what Crispin was saying about Turing. This
predicate is as hard as Halt(p,y), which is not computable.

However, we can enumerate all the inputs that eventually result in the
computer's state satisfying the (Z == k) condition. I suspect this is
probably all you really need for a given program, as a practical matter,
since, for example, most attackers probably will not wait for hours and
hours while an exploit develops.*

I think the real issue here is complexity, not computability. It takes a
long time to come up with the exploits. Maybe the time it takes is too
long for the amount of real economic value gained by the knowledge of
what's in that set. That seems to be part of Crispin's objection (more or
less).


 I would think that what makes it possible to talk about design patterns and
 attack patterns is that they reflect intentional actions towards desirable
 (for the perpetrator) goals, and the set of desirable goals is bounded at
 any given time (assuming infinite time then perhaps it is not bounded).

I think this is a very reasonable working assumption. It seems
consistent with my experience that given any actual system at any actual
point in time there are only finitely many desirable objectives in
play. There are many more theoretical objectives, though, so how you
choose to pare down the list could determine whether you end up with a
useful scheme, or not.


 All we can hope is to come reasonably close and produce something useful,
 but not theoretically strong and closed.

I think that there's lots of work going on in proof theory and semantics
that makes me hopeful we'll eventually get tools that are both useful
and strong. Model checking is one approach and it seems to have a lot of
promise. It's relatively fast, e.g., and unlike deductive approaches it
doesn't require a mathematician to drive it. See [Clarke] for details.
[Clarke] is very interesting, I think. He explicitly argues that model
checking beats other formal methods at dealing with the state space
explosion problem.
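
The flavour of explicit-state model checking is easy to convey in a toy
sketch (a brute-force reachability check; real tools like SPIN add
enormous optimisation on top of this idea):

    def model_check(initial, successors, invariant):
        # Explore every reachable state; return a counterexample trace
        # if the invariant ever fails, else None.
        frontier = [(s, [s]) for s in initial]
        seen = set(initial)
        while frontier:
            state, trace = frontier.pop()
            if not invariant(state):
                return trace          # counterexample found
            for nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, trace + [nxt]))
        return None                   # invariant holds everywhere reachable

    # e.g. a mod-8 counter that must never hit 7:
    #   model_check([0], lambda s: [(s + 1) % 8], lambda s: s != 7)
    # returns the counterexample trace [0, 1, 2, 3, 4, 5, 6, 7].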

Those with a more practical mind-set are probably laughing that beating
the other formal methods isn't really saying much because they are all
pretty awful. ;-)

 Is it enough to look for violations of some
 invariants (rules) without knowing how they happened?

In the static checking sense, I don't see how this could be done.


 Any thoughts on this?  Any references to relevant theories of failures and
 errors, or to explorations of this or similar ideas, would be welcome.

There are academics active in this field of research. Here's a few
links:

http://cm.bell-labs.com/cm/cs/what/spin2005/


http://www.google.com/search?q=international+SPIN+workshop&start=0&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-US:official


ciao,

 -nash

Notes:

** This definition of exploit is chosen more or less arbitrarily. It
seems reasonable to me. It might not be. I would conjecture that any
reasonable definition of exploit would lead to the same result, though.

* Halt(x,y) is not computable, but it is enumerable. That is, I can
list out, one by one, all the inputs y on which program x halts.