I think Gunnar hit a lot of the important points. Bakeoffs do provide interesting data. I have a few slide decks I've created to help companies with this problem, and would be happy to provide them to anyone willing to email me side-channel. Of the items Gunnar listed, I find that baking off tools helps organizations understand where they're going to have to apply horsepower and money.
For instance, companies that purchase Coverity's Prevent seem to have little trouble getting penetration into their dev teams, even beyond the initial pilot. Model tuning makes it easy to keep 'mostly effective' rules in play while still reducing false positives. However, with that ease of adoption and developer-driven results interpretation, orgs buy some inflexibility in terms of later extensibility: Java support, still only in beta, is sorely lacking, and the mechanisms by which one writes custom checkers pose a stiff learning curve.

By contrast, when an organization adopts Fortify's Source Analyzer, developer penetration will be _the_ problem unless the piloting team bakes a lot of rule tuning into the product's configuration, and results pruning into the usable model, prior to rollout. However, later customization seems the easiest of any of the tools I'm familiar with, and language and rules coverage seems, at the macro level, consistently the most robust.

It takes real experience to illuminate each tool's differences in the accuracy department. Only a bakeoff that contains _your_ organization's code can cut through the fog of what each vendor's account manager will promise. The reason seems to be that how these tools behave relative to each other (especially Prexis, K7, and Source Analyzer) depends greatly on minute details of how they implemented their rules, even though, at the end of the day, their underlying technologies remain shockingly similar (at least as compared to products from Coverity, Secure Software, or Microsoft's internal Prefix).

For instance, in one bakeoff, we found that (with particular open-source C code) Fortify's tool found more unique instances of overflows on stack-based, locally declared buffers with offending locally declared length specifiers. However, Klocwork's tool was profoundly more accurate in cases in which the overflow had similar properties but represented an 'off by one' error within a buffer declared as a fixed-length array.
Discussing tradeoffs in tool implementation at this level leads bakers down a bevy of rabbit holes. Looking at tools to the extent Cigital does, for deep understanding of our clients' code and how _exactly_ each tool is helping or hurting us, isn't _your_ goal. But by collecting data on seven figures' worth of your own code base, you can start to see which trends in your programmers' coding practices play to which tools. This can, in fact, help you make a better tool choice.

----
John Steven
Technical Director; Principal, Software Security Group
Direct: (703) 404-5726  Cell: (703) 727-4034
Key fingerprint = 4772 F7F3 1019 4668 62AD 94B0 AE7F
http://www.cigital.com
Software Confidence. Achieved.

On Jan 6, 2007, at 11:27 AM, Gunnar Peterson wrote:

>> 1. I haven't gotten a sense that a bakeoff matters. For example, if I wanted
>> to write a simple JSP application, it really doesn't matter if I use Tomcat,
>> Jetty, Resin or BEA from a functionality perspective; while they may each have
>> stuff that others don't, at the end of the day they are all good enough. So is
>> there really that much difference in comparing say Fortify to OunceLabs or
>> whatever other tools in this space exist vs simply choosing whichever one
>> wants to cut me the best deal (e.g. site license for $99 a year :-) ?
>
> I recommend that companies do a bakeoff to determine:
>
> 1. ease of integration with dev process - everyone's dev/build process is
> slightly different
>
> 2. signal to noise ratio - is the tool finding high priority/high impact
> bugs?
>
> 3. remediation guidance - finding is great, fixing is better; how
> actionable and relevant is the remediation guidance?
>
> 4. extensibility - say you have a particular interface, like MQ Series for
> example, which has homegrown authN and authZ foo that you want the
> static analysis to determine is used correctly. How easy is it to
> build/check/enforce these rules?
>
> 5.
> roles - how easy is it to separate out roles/reports/functionality like
> developer, ant jockey, and auditor?
>
> 6. software architecture span - your high risk/high priority apps are
> probably multi-tier w/ lots of integration points; how much visibility into
> how many integration points and tiers does the static analysis tool allow
> you to see? How easy is it to correlate across tiers and interfaces?

----------------------------------------------------------------------------
This electronic message transmission contains information that may be confidential or privileged. The information contained herein is intended solely for the recipient, and use by any other party is not authorized. If you are not the intended recipient (or otherwise authorized to receive this message by the intended recipient), any disclosure, copying, distribution or use of the contents of the information is prohibited. If you have received this electronic message transmission in error, please contact the sender by reply email and delete all copies of this message. Cigital, Inc. accepts no responsibility for any loss or damage resulting directly or indirectly from the use of this email or its contents. Thank You.
----------------------------------------------------------------------------

_______________________________________________
Secure Coding mailing list (SC-L) SC-L@securecoding.org
List information, subscriptions, etc - http://krvw.com/mailman/listinfo/sc-l
List charter available at - http://www.securecoding.org/list/charter.php
SC-L is hosted and moderated by KRvW Associates, LLC (http://www.KRvW.com) as a free, non-commercial service to the software security community.
_______________________________________________