Re: Static Analysis: proposed interchange format (firehose)
On Thu, 2013-01-17 at 13:33 +0800, Daniel Veillard wrote:
> On Wed, Jan 16, 2013 at 03:53:56PM -0500, David Malcolm wrote:
> > This is a followup to my proposal in
> > http://lists.fedoraproject.org/pipermail/devel/2012-December/175232.html
> >
> > I want a common output format for static analysis tools so that we can
> > easily slurp the results from different tools into a database and have a
> > common system for managing the results (marking false positives, having
> > automated de-duplication, etc).
> >
> > (I like the name firehose for the overall system since it describes the
> > issue we'll have of managing the flood of data).
> >
> > I came up with an XML format, which I've uploaded code to here:
> > https://github.com/fedora-static-analysis/firehose
> >
> > Does this look sane? I think that it should be possible to write
>
> okay, taking the question from the XML side, so analysing the
> firehose.rng schemas driving the format. Points and remarks as I go
> through it:

Thanks!

> - the cwe attribute is a number or free form? If a number, add an
>   explicit rule to check its type.

I've constrained it to be an integer as of:
https://github.com/fedora-static-analysis/firehose/commit/43a50c6763f718b4c8163b645bf5ce7a328f6efa
(I hope I got my RELAX-NG correct)

> - the sut content choice is a bit weird: on one side you have text, on
>   the other you have rpm. I would still allow a free form description,
>   but in an element at the same level as rpm, something like
>
>     <choice>
>       <element name="description">
>         <text/>
>       </element>
>       <element name="rpm">
>         ...
>       </element>
>
>   For the sake of wider usage, I would also make some room for Debian,
>   and also expand that to be able to express a given file (to give an
>   example, allowing extra details there), and make some if not all of
>   the attributes optional, for example to be able to express
>   independence from, say, the arch:
>
>     <sut>
>       <file>/usr/bin/xmllint</file>
>       <package type="rpm" name="libxml2" version="2.9.0" release="1.fc17"/>
>     </sut>
>
>   so: optional file element, extra type attribute, use package to not
>   feel tied to rpm, but use a type attribute to distinguish :-)

Yeah, I hadn't thought out that part of the schema very well. I've
already made it optional, since I'm finding it easier to add during
post-processing.

I'm thinking that there are several cases:
  * analysis done of a source rpm
    * name, version, release, build architecture
    * what would Debian want?
  * analysis done of a tarball or other archive
    * name, url, sha1sum, build architecture
  * analysis done of an scm checkout (e.g. from upstream git)
    * kind (git, svn, etc), url
  * etc (what am I missing?)

Some possible examples of these:

  <sut>
    <source-rpm name="python-ethtool" version="0.7" release="4.fc19" build-arch="x86_64"/>
  </sut>

  <sut>
    <tarball name="python-ethtool-0.7.tar.bz2">
      <hash alg="sha1">d8334fe3e1a9b31c8f94a4e10e516ddea617cfd2</hash>
    </tarball>
  </sut>

  <sut>
    <checkout scm="git" url="http://git.fedorahosted.org/cgit/python-ethtool.git/tag/?id=v0.7">
    </checkout>
  </sut>

> - for notes I would separate them:
>
>     <notes>
>       <note>...</note>
>       <note>...</note>
>     </notes>
>
>   since they are likely to be entered manually, and you may want to
>   track who entered them as you go.

I wasn't very clear in my posting; I'd meant these notes for extra
descriptive data emitted by the static analysis tool, with a vague idea
of a mini markup vocabulary for describing functions, variables, etc.
My cpychecker tool has knowledge about much of the CPython C API, and
knows the URLs for the API docs, so I was hoping to have some way of
providing links to those docs whenever it sees an API call within a
problematic function.

> - I would use where instead of point myself, but I understand your
>   logic too

There seem to be multiple kinds of location that checkers emit:
  * file and line
  * file, line and column
  * file with range, expressed as a pair of the above (LLVM can emit
    ranges of start line/column to end line/column)

> Long reply, but overall that looks mostly fine from my very narrow POV

Thanks for the review
Dave
-- 
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
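
The three kinds of location listed in the message above (line only, line
plus column, and a start/end range) can be modelled with one small data
structure. The following is only a sketch: the names Point, Location and
describe() are hypothetical and are not taken from the firehose schema
or from firehose.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Point:
    """A position in a file; the column is optional (hypothetical model)."""
    line: int
    column: Optional[int] = None  # some checkers report only a line


@dataclass(frozen=True)
class Location:
    """A file plus either a single point or a start/end pair of points."""
    file: str
    start: Point
    end: Optional[Point] = None  # set only when the checker emits a range

    def describe(self) -> str:
        def fmt(p: Point) -> str:
            # Omit the column when the checker did not supply one.
            return str(p.line) if p.column is None else f"{p.line}:{p.column}"

        text = f"{self.file}:{fmt(self.start)}"
        if self.end is not None:
            text += f"-{fmt(self.end)}"
        return text
```

Treating a range as a pair of points keeps the simpler cases as the
default: a checker that only knows a line number fills in start.line and
leaves everything else unset.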
Re: Static Analysis: proposed interchange format (firehose)
On Wednesday, January 16, 2013 15:53:56 David Malcolm wrote:
> This is a followup to my proposal in
> http://lists.fedoraproject.org/pipermail/devel/2012-December/175232.html
>
> I want a common output format for static analysis tools so that we can
> easily slurp the results from different tools into a database and have a
> common system for managing the results (marking false positives, having
> automated de-duplication, etc).
>
> (I like the name firehose for the overall system since it describes the
> issue we'll have of managing the flood of data).
>
> I came up with an XML format, which I've uploaded code to here:
> https://github.com/fedora-static-analysis/firehose
>
> Does this look sane? I think that it should be possible to write
> converters that turn the output from other tools into this, and I think
> it's possible to hack up my static analyzers to emit this format.
>
> The firehose.py script is able to turn such an XML report into a text
> format mimicking what GCC emits, which is useful in Emacs (and probably
> other editors) which can parse that text format for clicking through to
> the underlying source code being tested.
>
> Thoughts?

We usually need to maintain more metadata about the scan itself together
with the results: arguments given to the analyzer, date/time the scan
started/finished, total count of lines processed, hostname, mock config,
etc.

Also if the results are obtained by subtracting the results of an old
version of the package (to report only newly introduced defects), it is
good to keep metadata of both the scans. Then you can check that both of
them ran with the same configuration, or prevent reporting newly added
defects if the old build partially failed.

Kamil
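
The subtraction of an old scan from a new one, guarded by the scan
metadata, could look roughly like the sketch below. Everything in it is
an assumption made for illustration: the Scan class, its fields, and the
(checker, file, message) key used to compare defects are hypothetical
and do not come from firehose or from any existing scanning tool.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Tuple

# Hypothetical defect key: (checker id, file, message).
Defect = Tuple[str, str, str]


@dataclass(frozen=True)
class Scan:
    """Results of one scan plus a little metadata about the scan itself."""
    analyzer_args: Tuple[str, ...]
    mock_config: str
    succeeded: bool  # False if the build or scan failed, even partially
    defects: FrozenSet[Defect] = field(default_factory=frozenset)


def new_defects(old: Scan, new: Scan) -> FrozenSet[Defect]:
    """Return the defects present in `new` but not in `old`.

    Refuses to compare when the old scan failed or when the two scans
    ran with different configurations, since the difference would then
    be misleading.
    """
    if not old.succeeded:
        raise ValueError("old scan failed (at least partially); refusing to diff")
    if (old.analyzer_args, old.mock_config) != (new.analyzer_args, new.mock_config):
        raise ValueError("scans ran with different configurations")
    return new.defects - old.defects
```

Keeping the metadata next to the results is what makes both guards
possible; without it the set difference would silently report every
defect as new whenever the old scan produced nothing.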
Static Analysis: proposed interchange format (firehose)
This is a followup to my proposal in
http://lists.fedoraproject.org/pipermail/devel/2012-December/175232.html

I want a common output format for static analysis tools so that we can
easily slurp the results from different tools into a database and have a
common system for managing the results (marking false positives, having
automated de-duplication, etc).

(I like the name firehose for the overall system since it describes the
issue we'll have of managing the flood of data).

I came up with an XML format, which I've uploaded code to here:
https://github.com/fedora-static-analysis/firehose

Does this look sane? I think that it should be possible to write
converters that turn the output from other tools into this, and I think
it's possible to hack up my static analyzers to emit this format.

The firehose.py script is able to turn such an XML report into a text
format mimicking what GCC emits, which is useful in Emacs (and probably
other editors) which can parse that text format for clicking through to
the underlying source code being tested.

Thoughts?

BTW, I hope to run a hackfest on Static Analysis in Fedora at FUDCon
Lawrence this weekend. Anyone around?

[there are plenty of different tasks requiring different skill sets:
Python scripting, web development, etc - you don't need to know about
compiler internals! though that would help also :) ]

Dave
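
To make the idea of turning an XML report into GCC-style text concrete,
here is a rough sketch. It is not the firehose.py implementation, and
the sample document is invented: the element names report, issue,
message, location and file are assumptions, since the thread only
mentions the cwe attribute and the point element by name. The only part
taken from outside is the conventional "file:line:column: warning:
message" layout that editors parse.

```python
import xml.etree.ElementTree as ET

# Invented sample report; this element layout is an assumption,
# not the actual firehose.rng schema.
SAMPLE = """\
<report>
  <issue cwe="401">
    <message>memory leak</message>
    <location>
      <file name="demo.c"/>
      <point line="12" column="5"/>
    </location>
  </issue>
</report>
"""


def to_gcc_lines(xml_text):
    """Render each issue as one 'file:line:column: warning: message' line."""
    root = ET.fromstring(xml_text)
    lines = []
    for issue in root.iter("issue"):
        loc = issue.find("location")
        path = loc.find("file").get("name")
        point = loc.find("point")
        msg = issue.findtext("message", default="").strip()
        cwe = issue.get("cwe")
        # Append the CWE number only when the issue carries one.
        suffix = f" [CWE-{cwe}]" if cwe else ""
        lines.append(
            f"{path}:{point.get('line')}:{point.get('column')}: "
            f"warning: {msg}{suffix}"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(to_gcc_lines(SAMPLE)))
```

Run against the sample, this prints a single line that Emacs compilation
mode (or a similar editor feature) can use to jump to demo.c, line 12.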
Re: Static Analysis: proposed interchange format (firehose)
On Wed, Jan 16, 2013 at 03:53:56PM -0500, David Malcolm wrote:
> This is a followup to my proposal in
> http://lists.fedoraproject.org/pipermail/devel/2012-December/175232.html
>
> I want a common output format for static analysis tools so that we can
> easily slurp the results from different tools into a database and have a
> common system for managing the results (marking false positives, having
> automated de-duplication, etc).
>
> (I like the name firehose for the overall system since it describes the
> issue we'll have of managing the flood of data).
>
> I came up with an XML format, which I've uploaded code to here:
> https://github.com/fedora-static-analysis/firehose
>
> Does this look sane? I think that it should be possible to write

okay, taking the question from the XML side, so analysing the
firehose.rng schemas driving the format. Points and remarks as I go
through it:

- the cwe attribute is a number or free form? If a number, add an
  explicit rule to check its type.

- the sut content choice is a bit weird: on one side you have text, on
  the other you have rpm. I would still allow a free form description,
  but in an element at the same level as rpm, something like

    <choice>
      <element name="description">
        <text/>
      </element>
      <element name="rpm">
        ...
      </element>

  For the sake of wider usage, I would also make some room for Debian,
  and also expand that to be able to express a given file (to give an
  example, allowing extra details there), and make some if not all of
  the attributes optional, for example to be able to express
  independence from, say, the arch:

    <sut>
      <file>/usr/bin/xmllint</file>
      <package type="rpm" name="libxml2" version="2.9.0" release="1.fc17"/>
    </sut>

  so: optional file element, extra type attribute, use package to not
  feel tied to rpm, but use a type attribute to distinguish :-)

- for notes I would separate them:

    <notes>
      <note>...</note>
      <note>...</note>
    </notes>

  since they are likely to be entered manually, and you may want to
  track who entered them as you go.

- I would use where instead of point myself, but I understand your
  logic too

Long reply, but overall that looks mostly fine from my very narrow POV

Daniel

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
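
The sut shape suggested in this message (an optional file element plus a
generic package element with a type attribute and mostly optional
attributes) can be generated with a few lines of Python. This is a
sketch only: build_sut and its parameters are hypothetical, the "arch"
attribute name is an assumption, and nothing here reflects what the
firehose schema finally adopted.

```python
import xml.etree.ElementTree as ET


def build_sut(pkg_type, name, version=None, release=None, arch=None,
              file_path=None):
    """Build a <sut> element in the shape proposed above (hypothetical helper).

    Only `type` and `name` are required here; every other attribute is
    left out of the XML when it is not supplied.
    """
    sut = ET.Element("sut")
    if file_path is not None:
        ET.SubElement(sut, "file").text = file_path
    attrs = {"type": pkg_type, "name": name}
    for key, value in (("version", version), ("release", release),
                       ("arch", arch)):
        if value is not None:
            attrs[key] = value
    ET.SubElement(sut, "package", attrs)
    return sut


if __name__ == "__main__":
    sut = build_sut("rpm", "libxml2", version="2.9.0", release="1.fc17",
                    file_path="/usr/bin/xmllint")
    print(ET.tostring(sut, encoding="unicode"))
```

Because the packaging system is carried in the type attribute, a Debian
package would use the same element with type="deb" rather than needing a
new element name.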