NIST already responded to my email on a different list. I was impressed
with what they had to say...
**************
We have been releasing the real deep data. There have been delays, but there
are no sinister reasons for the delays.
The results of the 2nd SATE (our report and all data) will be released in June.
(We promised to release them between February and May, but we are late with the
report.)
We released the results of the 1st SATE last summer: our report, the raw tool
reports, and our analysis of the reports. The data is available (below the list
of cautions) from
http://samate.nist.gov/SATE2008.html
or a direct link:
http://samate.nist.gov/SATE2008/resources/sate2008.tar.gz
I will answer some specific points in Jim's email below, but first, let me
describe some limitations of SATE and how we are addressing them. SATE 2008 had
a number of big limitations, including:
1) We analyzed a non-random subset of tool warnings
2) Determining correctness of tool warnings turned out to be more complicated
than a binary true/false decision. Also, determining relevance of a warning to
security turned out to be more difficult than we thought.
3) In most cases, we did not match warnings from different tools that refer to
the same weakness. When we started SATE, we thought that we could match
warnings by line number and weakness name or CWE id. In fact, most weaknesses
are more complex - see Section 3.4 of our report.
4) Analysis criteria were applied inconsistently.
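To illustrate limitation 3, here is a minimal sketch of the naive matching we
originally had in mind: grouping warnings by file, line number and CWE id. The
record layout, tool names, paths and CWE numbers below are hypothetical and are
not taken from the SATE data. The third warning shows one plausible way the
approach can fall short: a tool may report a related weakness at a different
line or under a different CWE id, so an exact-key match misses it.

```python
from collections import defaultdict
from typing import NamedTuple


class ToolWarning(NamedTuple):
    """Hypothetical, simplified warning record (not the SATE format)."""
    tool: str
    path: str
    line: int
    cwe: int


def match_by_location(warnings):
    """Group warnings that share the exact same file, line, and CWE id."""
    groups = defaultdict(list)
    for w in warnings:
        groups[(w.path, w.line, w.cwe)].append(w)
    # Keep only the groups that were reported by more than one tool.
    return {
        key: group
        for key, group in groups.items()
        if len({w.tool for w in group}) > 1
    }


# Made-up example data.
warnings = [
    ToolWarning("tool_a", "src/login.c", 120, 89),
    ToolWarning("tool_b", "src/login.c", 120, 89),  # same key: matched
    ToolWarning("tool_c", "src/login.c", 114, 20),  # nearby line, other CWE: not matched
]
matched = match_by_location(warnings)
```

With this data only the first two warnings are grouped together; whether the
third one describes the same weakness would have to be decided by a human.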
In our publicly released analysis, we used the confirmed/unconfirmed markings
instead of true/false markings. We describe the reasons for this in our report
- Section 4.2, page 29 of
http://samate.nist.gov/docs/NIST_Special_Publication_500-279.pdf
In SATE 2009, we made some improvements, including:
1) We randomly selected a subset of tool warnings for analysis.
2) We also looked at tool warnings that were related to findings by human
security experts.
3) We used 4 categories for analysis of correctness: true, true but
insignificant (for security), false, and unknown. This is an improvement, but
there are still problems: for example, distinguishing true from true but
insignificant is often hard.
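As a rough sketch of improvements 1 and 3, the snippet below draws a
reproducible random subset of warning ids and tallies analyst verdicts using
the four categories. The function names, the seed and the integer warning ids
are assumptions made for illustration; this is not the actual SATE 2009
procedure.

```python
import random
from collections import Counter
from enum import Enum


class Verdict(Enum):
    """The four correctness categories named above."""
    TRUE = "true"
    TRUE_INSIGNIFICANT = "true but insignificant"
    FALSE = "false"
    UNKNOWN = "unknown"


def select_for_analysis(warning_ids, k, seed=0):
    """Pick k distinct warnings uniformly at random (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(list(warning_ids), k)


def tally(verdicts):
    """Count verdicts per category, including categories with zero hits."""
    counts = Counter(verdicts)
    return {category: counts.get(category, 0) for category in Verdict}


# Hypothetical usage: 1000 warnings, 50 selected for manual analysis.
selected = select_for_analysis(range(1000), 50)
summary = tally([Verdict.TRUE, Verdict.FALSE, Verdict.TRUE])
```

Fixing the seed lets someone else reproduce the selection from the same list
of warnings.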
> 1) false positive rates from these tools are overwhelming
First, defining a false positive is tough. Also, in SATE 2008 the criteria
that we used for analysis of correctness were inconsistent, we did not analyze
a random sample of warnings, and our analysis had errors. Steve gave a good
example in his reply. We corrected some of these problems in 2009, but there
is still a way to go.
> 2) the work load to triage results from ONE of these tools was
> man-years
We are not the developers of the test cases, and our knowledge of the test
case code is very limited. Also, we used the tools differently from how they
are used in practice. We analyzed tool warnings for correctness and looked for
related warnings from other tools, whereas developers use tools to determine
what changes need to be made to software, and auditors look for evidence of
assurance.
> 3) by every possible measurement, manual review was more cost effective
As Steve said, SATE did not consider cost. In SATE 2009, we had security
contractors analyze two of the test cases and report the most important
security weaknesses. We then looked at tool warnings that report the same (or
related) weakness. This will be released as part of the 2009 release. (The
data set is too small for statistical conclusions.)
A big limitation of SATE has been the lack of ground truth about which security
weaknesses are really in the test cases. Determining this is hard for
reasonably large software. We are trying to address it in two ways: manual
analysis by security contractors and "CVE-selected" test cases.
> the NIST team chose only a small percentage of the automated findings to
> review
A small percentage by itself should not be a problem if the selection of tool
warnings is done correctly (it was not done correctly in SATE 2008).
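To make the sampling point concrete, here is a small sketch that estimates the
share of "true" warnings from a random sample, using a Wilson score interval.
The counts (60 true out of 200 reviewed) are made up for illustration and are
not SATE results. With a properly random sample, the width of the interval
depends mainly on how many warnings were reviewed rather than on what
percentage of all warnings that is.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical numbers: 200 randomly selected warnings, 60 judged true.
low, high = wilson_interval(60, 200)
print(f"estimated true-warning rate: {60 / 200:.2f} (interval {low:.2f} to {high:.2f})")
```

None of this holds if the reviewed warnings are hand-picked, which is the
problem with the SATE 2008 selection.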
Vadim
I feel that NIST made a few errors in the first 2 SATE studies.
After the second round of SATE, the results were never fully released
to the public, even though NIST agreed to do just that at the inception
of the contest. I do not understand why SATE censored the final
results - I feel such censorship hurts the industry.
And even worse, I felt that vendor pressure encouraged NIST to not
release the final results. If the results (the real deep data, not the
executive summary that NIST released) were favorable to the tool
vendors, I bet they would have welcomed the release of the real data.
But instead, vendor pressure caused NIST to block the release of the
final data set.
The problems that the data would have revealed are:
1) false positive rates from these tools are overwhelming
2) the work load to triage results from ONE of these tools was man-years
3) by every possible measurement, manual review was more cost effective
Even worse were the methods around the process of this "study". For
example, all of the Java apps in this "study" contained poor hash
implementations. But because none of the tools could see this,
that "finding" was completely ignored. The coverage was limited ONLY
to injection and data flow problems that tools have a chance of
finding. In fact, the NIST team chose only a small percentage of the
automated findings to review, since it would have taken years to
review everything due to the massive number of false positives. Get
the problem here?
I'm discouraged by SATE. I hope some of these problems are addressed
in the third study.
- Jim
_______________________________________________
Secure Coding mailing list (SC-L) SC-L@securecoding.org
List information, subscriptions, etc -
http://krvw.com/mailman/listinfo/sc-l
List charter available at - http://www.securecoding.org/list/charter.php
SC-L is hosted and moderated by KRvW Associates, LLC
(http://www.KRvW.com)
as a free, non-commercial service to the software security community.
Follow KRvW Associates on Twitter at: http://twitter.com/KRvW_Associates
_______________________________________________
--
Jim Manico
OWASP Podcast Host/Producer
OWASP ESAPI Project Manager
http://www.manico.net