### Re: [SC-L] Theoretical question about vulnerabilities

David Crocker wrote: Well, that approach is certainly better than not guarding against buffer overflows at all. However, I maintain it is grossly inferior to the approach we use, which is to prove that all array accesses are within bounds. Proving that all array accesses are within bounds would seem to be Turing undecidable. Either you are not proving what you way you are proving, or your programs are not full Turing machines. Proof: insert diagonalization argument here : Issue: Some people may regard diagonalized programs are a contrivance, and are only interested in correctness proofs for real programs (for some value of real). Crispin's rebuttal: Suppose I want to prove that your program checker does not have any illegal array references ... What exactly is your program going to do when it detects an array bound violation at run-time? Hermes' kludge to address this was two-fold: 1. There are no arrays. Rather, there are relational tables, and you can extract a row based on a field value. You can programatically get a table to act much like an array by having a field with a unique index number, 1, 2, 3, etc. 2. If you try to extract a row from a table that does not have a matching value, then you get an exception. Exceptions are thrown and caught up the call chain the way most modern (Java etc.) languages do it. Yes, this is a kludge because it ultimately means a run-time exception, which is just a pretty way of handling a seg fault. Crispin -- Crispin Cowan, Ph.D. http://immunix.com/~crispin/ CTO, Immunix http://immunix.com

### RE: [SC-L] Theoretical question about vulnerabilities

Crispin wrote: Here's an example of a case it cannot prove: if X then Y - initial value endif ... if X then Z - Y + 1 endif The above code is correct in that Y's value is taken only when it has been initialized. But to prove the code correct, an analyzer would have to be flow sensitive, which is hard to do. The whole science of program proving is based on exactly this sort of flow analysis. The analyser will postulate that Y is initialised at the assignment to Z, given the deduced program state at that point. This is called a verification condition or proof obligation. It can then attempt to prove the hypothesis; for this example, the proof is trivial. I guess it would have been more difficult 20 years ago when Hermes was written. The alternative solution to the problem of uninitialised variables is for the language or static checker to enforce Java's definite initialisation rule or something similar. This at least guarantees predictable behaviour. An issue arises when a tool like Perfect Developer is generating Java code, because even though PD can prove that Y is initialised before use in the above example, the generated Java code would be violate the definite initialisation rule. We have to generate dummy initialisations in such cases. David Crocker, Escher Technologies Ltd. Consultancy, contracting and tools for dependable software development www.eschertech.com

### RE: [SC-L] Theoretical question about vulnerabilities

Crispin Cowan wrote: Precisely because statically proven array bounds checking is Turing Hard, that is not how such languages work. Rather, languages that guarantee array bounds insert dynamic checks on every array reference, and then use static checking to remove all of the dynamic checks that can be proven to be unnecessary. For instance, it is often the case that a tight inner loop has hard-coded static bounds, and so a static checker can prove that the dynamic checks can be removed from the inner loop, hoisting them to the outer loop and saving a large proportion of the execution cost of dynamic array checks. Well, that approach is certainly better than not guarding against buffer overflows at all. However, I maintain it is grossly inferior to the approach we use, which is to prove that all array accesses are within bounds. What exactly is your program going to do when it detects an array bound violation at run-time? You can program it to take some complicated recovery action; but how are you going to test that? You can abort the program (and restart it if it is a service); but then all you have done is to turn a potential security vulnerability into a denial of service vulnerability. So the better approach is to design the program so that there can be no buffer overflows; and then verify through proof (backed up by testing) that you have achieved that goal. David Crocker, Escher Technologies Ltd. Consultancy, contracting and tools for dependable software development www.eschertech.com

### RE: [SC-L] Theoretical question about vulnerabilities

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of der Mouse Sent: 12 April 2005 05:15 To: SC-L@securecoding.org Subject: Re: [SC-L] Theoretical question about vulnerabilities [B]uffer overflows can always be avoided, because if there is ANY input whatsoever that can produce a buffer overflow, the proofs will fail and the problem will be identified. Then either (a) there exist programs which never access out-of-bounds but which the checker incorrectly flags as doing so, or (b) there exist programs for which the checker never terminates (quite possibly both). (This is simply the Halting Theorem rephrased.) Could you explain why you believe that proof of a specific property in a constrained environment is equivalent to the Halting Problem? When I explain and even demonstrate the benefits of formal reasoning I find that, increasingly often, someone pops up with Godel or Turing to prove that it is all a waste of time. Having done so they then, presumably, go back to using the less rigorous approaches which are a demonstrable cause of failure in the systems our industry builds. I really do find this line of argument rather irritating; the theoretical limits of proof are quite a different thing from the practical application of proof-based technology in a suitably constrained environment. For the record, I am quite confident that we can prove freedom from buffer overflow in a large and useful class of programs without hitting any of the limits you suggest. Indeed we, and our customers, are routinely doing so. Peter ** This email is confidential and intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, be advised that you have received this email in error and that any use, disclosure, copying or distribution or any action taken or omitted to be taken in reliance on it is strictly prohibited. If you have received this email in error please contact the sender. Any views or opinions presented in this email are solely those of the author and do not necessarily represent those of Praxis High Integrity Systems Ltd (Praxis HIS). Although this email and any attachments are believed to be free of any virus or other defect, no responsibility is accepted by Praxis HIS or any of its associated companies for any loss or damage arising in any way from the receipt or use thereof. The IT Department at Praxis HIS can be contacted at [EMAIL PROTECTED] **

### Re: [SC-L] Theoretical question about vulnerabilities

[B]uffer overflows can always be avoided, because if there is ANY input whatsoever that can produce a buffer overflow, the proofs will fail and the problem will be identified. Then either (a) there exist programs which never access out-of-bounds but which the checker incorrectly flags as doing so, or (b) there exist programs for which the checker never terminates (quite possibly both). (This is simply the Halting Theorem rephrased.) Could you explain why you believe that proof of a specific property in a constrained environment is equivalent to the Halting Problem? Take the code for the checker. Wrap it in a small amount of code that makes a deliberate out-of-bounds access if the checker outputs `no problem'. Then write another small bit of code which ignores its input and runs the wrapped checker on itself. Run the original checker on the result. This is basically the same diagonalization argument Turing used to demonstrate that the Halting Problem was unsolvable; that's the sense in which I call it the Halting Theorem rephrased. I really do find this line of argument rather irritating; the theoretical limits of proof are quite a different thing from the practical application of proof-based technology in a suitably constrained environment. Entirely true. But if you use theoretical language like proof, you have to expect to be held to a theroetical standard of correctness. /~\ The ASCIIder Mouse \ / Ribbon Campaign X Against HTML [EMAIL PROTECTED] / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

### Re: [SC-L] Theoretical question about vulnerabilities

der Mouse wrote: [B]uffer overflows can always be avoided, because if there is ANY input whatsoever that can produce a buffer overflow, the proofs will fail and the problem will be identified. Then either (a) there exist programs which never access out-of-bounds but which the checker incorrectly flags as doing so, or (b) there exist programs for which the checker never terminates (quite possibly both). (This is simply the Halting Theorem rephrased.) Precisely because statically proven array bounds checking is Turing Hard, that is not how such languages work. Rather, languages that guarantee array bounds insert dynamic checks on every array reference, and then use static checking to remove all of the dynamic checks that can be proven to be unnecessary. For instance, it is often the case that a tight inner loop has hard-coded static bounds, and so a static checker can prove that the dynamic checks can be removed from the inner loop, hoisting them to the outer loop and saving a large proportion of the execution cost of dynamic array checks. How much of this optimization can be done is arguable: * The JonesKelly GCC enhancement that does full array bounds checking makes (nearly?) no attempt at this optimization, and suffers slowdowns of 10X to 30X on real applications. * The Bounded Pointers GCC enhancement that does full array bounds checking but with a funky incompatible implementation that makes pointers bigger than a machine word, does some of these optimizations and suffers a slowdown of 3X to 5X. Some have argued that it can be improved from there, but how much remains to be seen. * Java compilers get the advantage of a language that was actually designed for type safety, in contrast with C that aggressively makes static type checking difficult. The last data I remember on Java is that turning array bounds checking on and off makes a 30% difference in performance. Crispin -- Crispin Cowan, Ph.D. http://immunix.com/~crispin/ CTO, Immunix http://immunix.com

### Re: [SC-L] Theoretical question about vulnerabilities

David Crocker wrote: Exactly. I'm not interested in trying to write a program prover that will prove that an arbitrary program is correct, if indeed it is. I am only interested in proving that well-structured programs are correct. The Hermes programming language took this approach http://www.research.ibm.com/people/d/dfb/hermes.html Hermes proved a safety property called Type State Checking in the course of compiling programs. Type State offers very nice safety properties for correctness, including proving that no variable will be used before it is initialized. But the Hermes Type State Checker was not formally complete; there were valid programs that the checker could not *prove* were correct, and so it would reject them. Here's an example of a case it cannot prove: if X then Y - initial value endif ... if X then Z - Y + 1 endif The above code is correct in that Y's value is taken only when it has been initialized. But to prove the code correct, an analyzer would have to be flow sensitive, which is hard to do. Here's where it gets interesting. The authors of Type State went and analyzed a big pile of existing code that was in production but that the Type State checker failed to prove correct. In (nearly?) every case, they found a *latent bug* associated with the code that failed to pass the Checker. We can infer from that result that code that depends on flow sensitivity for its correctness is hard for humans to reason about, and therefore likely to be wrong. Disclaimer: I worked on Hermes as an intern at the IBM Watson lab waay back in 1991 and 1992. Hermes is my favorite type safe programming language, but given the dearth of implementations, applications, and programmers, that is of little practical interest :) Crispin -- Crispin Cowan, Ph.D. http://immunix.com/~crispin/ CTO, Immunix http://immunix.com

### Re: [SC-L] Theoretical question about vulnerabilities

David Crocker wrote: 3. Cross-site scripting. This is a particular form of HTML injection and would be caught by the proof process in a similar way to SQL injection, provided that the specification included a notion of the generated HTML being well-formed. If that was missing from the specification, then HTML injection would not be caught. XSS occurs where client A can feed input to Server B such that client C will accept and trust the input. The correct specification is that Server B should do a perfect job of allowing clients to upload content that is damaging to other clients. I submit that this is infeasible without perfect knowledge of the vulnerabilities of all the possible clients. This seems to be begging the definition of prove correct pretty hard. You can do a pretty good job of preventing XSS by stripping user posts of all interesting features and permitting only basic HTML. But this still does not completely eliminate XSS, as you cannot a priori know about all the possible buffer overflows etc. of every client that will come to visit, and basic HTML still allows for some freaky stuff, e.g. very long labels. Crispin -- Crispin Cowan, Ph.D. http://immunix.com/~crispin/ CTO, Immunix http://immunix.com

### Re: [SC-L] Theoretical question about vulnerabilities

Nash wrote: ** It would be extremely interesting to know how many exploits could be expected after a reasonable period of execution time. It seems that as execution time went up we'd be less likely to have an exploit just show up. My intuition could be completely wrong, though. I would think that time is pretty much irrelevant, because it depends on the intelligence used to order the inputs you try. For instance, time-to-exploit will be very long if you feed inputs to (say) Microsoft IIS starting with one byte of input and going up in ASCII order. Time-to-exploit gets much shorter if you use a fuzzer program: an input generator that can be configured with the known semantic inputs of the victim program, and that focuses specifically on trying to find buffer overflows and printf format string errors by generating long strings and using strings containing %n. Even among fuzzers, time-to-exploit depends on how intelligent the fuzzer is in terms of aiming at the victim program's data structures. There are many specialized fuzzers aimed at various kinds of applications, aimed at network stacks, aimed at IDS software, etc. Crispin -- Crispin Cowan, Ph.D. http://immunix.com/~crispin/ CTO, Immunix http://immunix.com

### Re: [SC-L] Theoretical question about vulnerabilities

Pascal Meunier [EMAIL PROTECTED] writes Do you think it is possible to enumerate all the ways all vulnerabilities can be created? Is the set of all possible exploitable programming mistakes bounded? I believe that one can make a Turing machine halting argument to show that this is impossible. If you include denial of service attacks (and infinite loops are certainly a denial of service), then the halting problem applies immediately and trivially. But even if you exclude denial of service, I think you could construct a proof based on a Turing machine that halts if and only if there is an exploitable programming mistake, or something like that. I'm not really very good at such proofs, so I'll leave that as an exercise for the reader (convenient excuse for not doing the hard bits!) It is the practical impossibility of finding all the existing vulnerabilities that led to the Anderson study of 1972 replacing the penetrate and patch stratrategy for security with the security kernel approach of developing small and simple code that you could make a convincing argument that it is secure. That in turn led to the development of high assurance techniques for building secure systems that remain today the only way that has been shown to produce code with demonstrably better security than most of what's out there. (I say most deliberately. I'm sure that various people reading this will come up with examples of isolated systems that have good security but didn't use high assurance. No disputes there, so you don't need to fill up SC-L with a list of them. The point is that high assurance is a systematic engineering approach that works and, when followed, has excellent security results. The fact that almost no one uses it says much more about the lack of maturity of our field than about the technique itself. It took a VERY long time for bridge builders to develop enough discipline that most bridges stayed up. The same is true for software security, unfortunately. It also says a lot about whether people are really willing to pay for security.) Although old, Gasser's book probably has the best description of what I'm talking about. These two classic documents should be read by ANYONE trying to do secure coding, and fortunately, they are both online! Thanks to NIST and the University of Nebraska at Omaha for putting them up. (For completeness, the NIST website was created from documents scanned by the University of California at Davis.) citations: 1. Anderson, J.P., Computer Security Technology Planning Study, ESD-TR-73-51, Vols. I and II, October 1972, James P. Anderson and Co., Fort Washington, PA, HQ Electronic Systems Division: Hanscom AFB, MA. URL: http://csrc.nist.gov/publications/history/ande72.pdf 2. Gasser, M., Building Secure Systems. 1988, New York: Van Nostrand Reinhold. URL: http://nucia.ist.unomaha.edu/library/gasser.php - Paul

### Re: [SC-L] Theoretical question about vulnerabilities

Or until you find a bug in your automated prover. Or, worse, discover that a vulnerability exists despite your proof, meaning that you either missed a loophole in your spec or your prover has a bug, and you don't have the slightest idea which. On that basis, can I presume that you believe all programming should be done in machine code, in case the compiler/assembler/linker you might be tempted to use has a bug? You can presume anything you like. But in this case you'd be wrong. I was/am not arguing that such tools should not be used (for this or any other reason). My point is merely that calling what they do proof is misleading to the point where I'd call it outright wrong. You have roughly the same level of assurance that code passed by such a checker is correct that you do that machine/assembly code output by a traditional compiler is correct: good enough for most purposes, but by no stretch of the imagination is it even as close to proof as most mathematics proofs are - and, like them, it ultimately rests on smart people think it's OK. Ultimately, this amounts to a VHLL, except that [...nomenclature...]. And, as with any language, whoever writes this VHLL can write bugs. Like I said, you can still fail to include important security properties in the specification. However, [...]. Basically, the same arguments usually made for any higher-level langauge versus a corresponding lower-level language: machine versus assembly, assembly versus C, C versus Lisp, etc. And I daresay that it provides at least vaguely the same advantages and disadvantages as for most of the higher/lower level comparisons, too. But it is hardly the panacea that proving the program correct makes it sound like. As someone (who? I forget) is said to have said, Beware, I have only proven this program correct, not tested it. /~\ The ASCIIder Mouse \ / Ribbon Campaign X Against HTML [EMAIL PROTECTED] / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

### RE: [SC-L] Theoretical question about vulnerabilities

Pascal Meunier wrote: Do you think it is possible to enumerate all the ways all vulnerabilities can be created? Is the set of all possible exploitable programming mistakes bounded? No. It's not so much a programming problem, more a specification problem. Tools now exist that make it possible to develop single-threaded programs that are mathematically proven to meet their specifications. The problem is knowing what should be included in the specifications. Let me give you some examples: 1. Buffer overflow. Even if nobody realised that buffer overflows could be used to bypass security, it is an implied specification of any program that no array should ever be accessed via an out-of-bounds index. All the tools out there for proving software correctness take this as a given. So buffer overflows can always be avoided, because if there is ANY input whatsoever that can produce a buffer overflow, the proofs will fail and the problem will be identified. You don't even need to write a specification for the software in this case - the implied specification is enough. 2. SQL injection. If the required behaviour of the application is correctly specified, and the behaviour of the SQL server involved is also correctly specified (or at least the correct constraints are specified for SQL query commands), then it will be impossible to prove the software is correct if it has any SQL injection vulnerabilities. So again, this would be picked up by the proof process, even if nobody knew that SQL injection can be used to breach security. 3. Cross-site scripting. This is a particular form of HTML injection and would be caught by the proof process in a similar way to SQL injection, provided that the specification included a notion of the generated HTML being well-formed. If that was missing from the specification, then HTML injection would not be caught. 4. Tricks to make the browser display an address in the address bar that is not the address of the current HTML page. To catch these, you would need to include in the specification, the address bar shall always show the address of the current page. This is easy to state once you know it is a requirement; but until last year it would probably not have been an obvious requirement. In summary: If you can state what you mean by secure in terms of what must happen and what must not happen, then by using precise specifications and automatic proof, you can achieve complete security for all possible inputs - until the definition of secure needs to be expanded. This should have consequences for source code vulnerability analysis software. It should make it impossible to write software that detects all of the mistakes themselves. Is it enough to look for violations of some invariants (rules) without knowing how they happened? The problem is that while you can enumerate the set of invariants that you currently know are important, you don't know how the set may need to be expanded in the future. David Crocker, Escher Technologies Ltd. Consultancy, contracting and tools for dependable software development www.eschertech.com

### Re: [SC-L] Theoretical question about vulnerabilities

Pascal Meunier wrote: Do you think it is possible to enumerate all the ways all vulnerabilities can be created? Is the set of all possible exploitable programming mistakes bounded? By bounded I take you to mean finite. In particular with reference to your taxonomy below. By enumerate I take you to mean list out in a finite way. Please note, these are not the standard mathematical meanings for these terms. Though, they may be standard for CS folks. If I interpreted you correctly, then the answer is, no, as Crispin indicated. However, let's take enumerate to mean list out, one by one and allow ourselves to consider infinite enumerations as acceptable. In this case, the answer becomes, yes. This proof is abbreviated, but should be recognizable as a pretty standard argument by those familiar with computable functions and/or recursive function theory. Thm. The set of exploits for a program is enumerable. Pf. Let P(x) be a program computing the n-ary, partially computable function F(x). Let an exploit be a natural number input, y, such that at some time, t, during the computation performed by P(y) the fixed memory address, Z, contains the number k.** Then, there exists a computable function G(x,t) such that: - G(x, t) = 1 if and only if P(x) gives value k to address Z at some time less than or equal to t. - G(x, t) = 0 otherwise. The values of x for which G(x,t) = 1 is effectively enumerable (in the infinite sense) because it is the domain of a computable function. Q.E.D. You can look up the relevent theory behind this proof in [Davis]. So, where does this leave us? Well, what we don't have is a computable predicate, Exploit(p,y), that always tells us if y is an exploit for the program p. That's what Crispin was saying about Turing. This predicate is equivalently hard to Halt(p,y), which is not computable. However, we can enumerate all the inputs that eventually result in the computer's state satisfying the (Z == k) condition. I suspect this is probably all you really need for a given program, as a practical matter. Since, for example, most attackers probably will not wait for hours and hours while an exploit develops.* I think the real issue here is complexity, not computability. It takes a long time to come up with the exploits. Maybe the time it takes is too long for the amount of real economic value gained by the knowledge of what's in that set. That seems to be part of Crispin's objection (more or less). I would think that what makes it possible to talk about design patterns and attack patterns is that they reflect intentional actions towards desirable (for the perpetrator) goals, and the set of desirable goals is bounded at any given time (assuming infinite time then perhaps it is not bounded). I think this is a very reasonable working assumption. It seems consistent with my experience that given any actual system at any actual point in time there are only finitely many desirable objectives in play. There are many more theoretical objectives, though, so how you choose to pare down the list could determine whether you end up with a useful scheme, or not. All we can hope is to come reasonably close and produce something useful, but not theoretically strong and closed. I think that there's lots of work going on in proof theory and Semantics that makes me hopeful we'll eventually get tools that are both useful and strong. Model Checking is one approach and it seems to have alot of promise. It's relatively fast, e.g., and unlike deductive approaches it doesn't require a mathematician to drive it. See [Clarke] for details. [Clarke] is very interesting, I think. He explicitly argues that model checking beats other formal methods at dealing with the state space explosion problem. Those with a more practical mind-set are probably laughing that beating the other formal methods isn't really saying much because they are all pretty awful. ;-) Is it enough to look for violations of some invariants (rules) without knowing how they happened? In the static checking sense, I don't see how this could be done. Any thoughts on this? Any references to relevant theories of failures and errors, or to explorations of this or similar ideas, would be welcome. There are academics active in this field of research. Here's a few links: http://cm.bell-labs.com/cm/cs/what/spin2005/ http://www.google.com/search?q=international+SPIN+workshopstart=0start=0ie=utf-8oe=utf-8client=firefox-arls=org.mozilla:en-US:official ciao, -nash Notes: ** This definition of exploit is chosen more or less arbitrarily. It seems reasonable to me. It might not be. I would conjecture that any definition of exploit would be equivalent to this issue, though. Halt(x,y) is not computable, but it is enumerable. That is, I can list out, one by one, all the inputs y on which program x