[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-30 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman closed this revision.
aaron.ballman added a comment.

In https://reviews.llvm.org/D53814#1280581, @george.karpenkov wrote:

> I much prefer this version.
>  We've had the same problem with diffing plist output.
>  One thing we have learned is using a FileCheck was definitely a bad idea, as 
> it leads to unreadable, unmaintainable, and very hard to update tests,
>  so either diff or your custom tool is way better.
>
> As for the ultimate solution, I'm still not sure: I agree that maintaining 
> those `-I` flags is annoying.


We can go with this approach until we need something more complicated. I 
suspect that as we add SARIF features, we may want to bring back the Python 
script for handling things like "Does every file in the 'files' list appear 
only once and do the files listed correspond exactly to ones in the diagnostic 
locations?". Diff definitely won't handle that sort of thing.

> One option is having an extra flag to produce "stable" output, which does not 
> include absolute URLs/versions/etc.

Worth thinking about. SARIF has the ability to output relative paths as well as 
absolute paths. It also has the notion of redacted paths so that you can remove 
sensitive information from analysis reports. So there's plenty of room for 
changes here.

Thank you for the reviews! I've commit in r345628.


https://reviews.llvm.org/D53814



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-30 Thread George Karpenkov via Phabricator via cfe-commits
george.karpenkov accepted this revision.
george.karpenkov added a comment.
This revision is now accepted and ready to land.

I much prefer this version.
We've had the same problem with diffing plist output.
One thing we have learned is using a FileCheck was definitely a bad idea, as it 
leads to unreadable, unmaintainable, and very hard to update tests,
so either diff or your custom tool is way better.

As for the ultimate solution, I'm still not sure: I agree that maintaining 
those `-I` flags is annoying.
One option is having an extra flag to produce "stable" output, which does not 
include absolute URLs/versions/etc.


https://reviews.llvm.org/D53814



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-30 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments.



Comment at: Analysis/diagnostics/sarif-diagnostics-taint-test.c:18
+// Test the basics for sanity.
+// CHECK: sarifLog['version'].startswith("2.0.0")
+// CHECK: sarifLog['runs'][0]['tool']['fullName'] == "clang static analyzer"

aaron.ballman wrote:
> george.karpenkov wrote:
> > aaron.ballman wrote:
> > > george.karpenkov wrote:
> > > > Would it make more sense to just use `diff` + json pretty-formatter to 
> > > > write a test?
> > > > With this test I can't even quite figure out how the output should look 
> > > > like.
> > > I'm not super comfortable with that approach, but perhaps I'm thinking of 
> > > something different than what you're proposing. The reason I went with 
> > > this approach is because diff would be fragile (depends heavily on field 
> > > ordering, which the JSON support library doesn't give control over) and 
> > > the physical layout of the file isn't what needs to be tested anyway. 
> > > SARIF has a fair amount of optional data that can be provided as well, so 
> > > using a purely textual diff worried me that exporting additional optional 
> > > data in the future would require extensive unrelated changes to all SARIF 
> > > diffs in the test directory.
> > > 
> > > The goal for this test was to demonstrate that we can validate that the 
> > > interesting bits of information are present in the output without 
> > > worrying about the details.
> > > 
> > > Also, the python approach allows us to express relationships between data 
> > > items more easily than a textual diff tool would. I've not used that 
> > > here, but I could imagine a test where we want to check that each code 
> > > location has a corresponding file entry in another list.
> > Using a sample file + diff would have an advantage of being easier to read 
> > (just glance at the "Inputs/blah.serif" to see a sample output), and 
> > consistent with how we already do checking for plists.
> > 
> > > depends heavily on field ordering
> > 
> > Is it an issue in practice though? I would assume that JSON support library 
> > would not switch field ordering too often (and even if it does, we can have 
> > a python wrapper testing that)
> > 
> > > SARIF has a fair amount of optional data
> > 
> > Would diff `--ignore-matching-lines` be enough for those?
> Diff testing was what I originally tried and I abandoned it because it was 
> not viable. The optional data cannot be handled by ignoring matching lines 
> because the lines won't match from system to system. For instance, there are 
> absolute URIs to files included in the output, clang git hashes and version 
> numbers that will change from revision to revision, etc.
> 
> What you see as an advantage, I see as more of a distraction -- the actual 
> layout of the information in the file isn't that important so long as it 
> validates as valid SARIF (unfortunately, there are no cross-platform command 
> line tools to validate SARIF yet). What is important are just a few pieces of 
> the content information (where are the diagnostics, what source ranges and 
> paths are involved, etc) compared to the overall whole.
> 
> I can give diff testing another shot, but I was unable to find a way to make 
> `-I` work so that I could ignore everything that needs to be ignored due to 
> natural variance. Do you have ideas there?
> 
> To give you an idea of the ultimate form of the output:
> ```
> {
>   "$schema": "http://json.schemastore.org/sarif-2.0.0;,
>   "runs": [
> {
>   "files": {
> "file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c": {
>   "fileLocation": {
> "uri": "file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c"
>   },
>   "length": 3850,
>   "mimeType": "text/plain",
>   "roles": [
> "resultFile"
>   ]
> }
>   },
>   "results": [
> {
>   "codeFlows": [
> {
>   "threadFlows": [
> {
>   "locations": [
> {
>   "importance": "essential",
>   "location": {
> "message": {
>   "text": "Calling 'f'"
> },
> "physicalLocation": {
>   "fileLocation": {
> "uri": 
> "file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c"
>   },
>   "region": {
> "endColumn": 5,
> "endLine": 119,
> "startColumn": 3,
> "startLine": 119
>   }
> }
>   },
>   "step": 1
> },
> {
>   "importance": "essential",
>   "location": {
> 

[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-30 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman updated this revision to Diff 171712.
aaron.ballman added a comment.

Switched to using diff for testing.


https://reviews.llvm.org/D53814

Files:
  
Analysis/diagnostics/Inputs/expected-sarif/sarif-diagnostics-taint-test.c.sarif
  Analysis/diagnostics/sarif-diagnostics-taint-test.c
  StaticAnalyzer/Core/CMakeLists.txt
  StaticAnalyzer/Core/SarifDiagnostics.cpp
  clang/StaticAnalyzer/Core/Analyses.def
  clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h

Index: Analysis/diagnostics/sarif-diagnostics-taint-test.c
===
--- /dev/null
+++ Analysis/diagnostics/sarif-diagnostics-taint-test.c
@@ -0,0 +1,15 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.security.taint,debug.TaintTest %s -verify -analyzer-output=sarif -o - | diff -u1 -w -I ".*file:.*sarif-diagnostics-taint-test.c" -I "clang version" -I "2\.0\.0\-beta\." - %S/Inputs/expected-sarif/sarif-diagnostics-taint-test.c.sarif
+#include "../Inputs/system-header-simulator.h"
+
+int atoi(const char *nptr);
+
+void f(void) {
+  char s[80];
+  scanf("%s", s);
+  int d = atoi(s); // expected-warning {{tainted}}
+}
+
+int main(void) {
+  f();
+  return 0;
+}
Index: Analysis/diagnostics/Inputs/expected-sarif/sarif-diagnostics-taint-test.c.sarif
===
--- /dev/null
+++ Analysis/diagnostics/Inputs/expected-sarif/sarif-diagnostics-taint-test.c.sarif
@@ -0,0 +1,99 @@
+{
+  "$schema": "http://json.schemastore.org/sarif-2.0.0;,
+  "runs": [
+{
+  "files": {
+"file:sarif-diagnostics-taint-test.c": {
+  "fileLocation": {
+"uri": "file:sarif-diagnostics-taint-test.c"
+  },
+  "length": 500,
+  "mimeType": "text/plain",
+  "roles": [
+"resultFile"
+  ]
+}
+  },
+  "results": [
+{
+  "codeFlows": [
+{
+  "threadFlows": [
+{
+  "locations": [
+{
+  "importance": "essential",
+  "location": {
+"message": {
+  "text": "Calling 'f'"
+},
+"physicalLocation": {
+  "fileLocation": {
+"uri": "file:sarif-diagnostics-taint-test.c"
+  },
+  "region": {
+"endColumn": 5,
+"endLine": 13,
+"startColumn": 3,
+"startLine": 13
+  }
+}
+  },
+  "step": 1
+},
+{
+  "importance": "essential",
+  "location": {
+"message": {
+  "text": "tainted"
+},
+"physicalLocation": {
+  "fileLocation": {
+"uri": "file:sarif-diagnostics-taint-test.c"
+  },
+  "region": {
+"endColumn": 17,
+"endLine": 9,
+"startColumn": 11,
+"startLine": 9
+  }
+}
+  },
+  "step": 2
+}
+  ]
+}
+  ]
+}
+  ],
+  "locations": [
+{
+  "physicalLocation": {
+"fileLocation": {
+  "uri": "file:sarif-diagnostics-taint-test.c"
+},
+"region": {
+  "endColumn": 17,
+  "endLine": 9,
+  "startColumn": 11,
+  "startLine": 9
+}
+  }
+}
+  ],
+  "message": {
+"text": "tainted"
+  },
+  "ruleId": "debug.TaintTest"
+}
+  ],
+  "tool": {
+"fullName": "clang static analyzer",
+"language": "en-US",
+"name": "clang",
+"version": "clang version 8.0.0 (https://github.com/llvm-project/clang.git a5ccb257a7a70928ede717a7c282f5fc8cbed310) (https://github.com/llvm-mirror/llvm.git 73cebd79c512f7129eca16b0f3a7abd21d2881e8)"
+  }
+}
+  ],
+  "version": "2.0.0-beta.2018-09-26"
+}
Index: StaticAnalyzer/Core/SarifDiagnostics.cpp
===
--- /dev/null
+++ StaticAnalyzer/Core/SarifDiagnostics.cpp
@@ -0,0 +1,270 @@
+//===--- SarifDiagnostics.cpp - Sarif Diagnostics for Paths -*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under 

[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-30 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman updated this revision to Diff 171685.
aaron.ballman added a comment.

Updated based on review feedback -- removed the `Sarif` path generation scheme 
as it isn't currently needed, and replaced a FIXME with a better comment.

Testing remains an open question, however.


https://reviews.llvm.org/D53814

Files:
  Analysis/diagnostics/sarif-check.py
  Analysis/diagnostics/sarif-diagnostics-taint-test.c
  StaticAnalyzer/Core/CMakeLists.txt
  StaticAnalyzer/Core/SarifDiagnostics.cpp
  clang/StaticAnalyzer/Core/Analyses.def
  clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h

Index: Analysis/diagnostics/sarif-diagnostics-taint-test.c
===
--- /dev/null
+++ Analysis/diagnostics/sarif-diagnostics-taint-test.c
@@ -0,0 +1,32 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.security.taint,debug.TaintTest %s -verify -analyzer-output=sarif -o %t.sarif | %python %S/sarif-check.py %s %t.sarif
+#include "../Inputs/system-header-simulator.h"
+
+int atoi(const char *nptr);
+
+void f(void) {
+  char s[80];
+  scanf("%s", s);
+  int d = atoi(s); // expected-warning {{tainted}}
+}
+
+int main(void) {
+  f();
+  return 0;
+}
+
+// Test the basics for sanity.
+// CHECK: sarifLog['version'].startswith("2.0.0")
+// CHECK: sarifLog['runs'][0]['tool']['fullName'] == "clang static analyzer"
+// CHECK: sarifLog['runs'][0]['tool']['name'] == "clang"
+// CHECK: sarifLog['runs'][0]['tool']['language'] == "en-US"
+
+// Test the specifics of this taint test.
+// CHECK: len(result(0)['codeFlows'][0]['threadFlows'][0]['locations']) == 2
+// CHECK: flow(0)['locations'][0]['step'] == 1
+// CHECK: flow(0)['locations'][0]['importance'] == "essential"
+// CHECK: flow(0)['locations'][0]['location']['message']['text'] == "Calling 'f'"
+// CHECK: flow(0)['locations'][0]['location']['physicalLocation']['region']['startLine'] == 13
+// CHECK: flow(0)['locations'][1]['step'] == 2
+// CHECK: flow(0)['locations'][1]['importance'] == "essential"
+// CHECK: flow(0)['locations'][1]['location']['message']['text'] == "tainted"
+// CHECK: flow(0)['locations'][1]['location']['physicalLocation']['region']['startLine'] == 9
Index: Analysis/diagnostics/sarif-check.py
===
--- /dev/null
+++ Analysis/diagnostics/sarif-check.py
@@ -0,0 +1,61 @@
+#!/usr/bin/env python
+
+import sys
+import subprocess
+import traceback
+import json
+
+testfile = sys.argv[1]
+with open(sys.argv[2]) as datafh:
+data = json.load(datafh)
+
+prefix = "CHECK: "
+
+def sarifLogFirstResult(idx):
+return data['runs'][0]['results'][idx]
+
+def threadFlow(idx):
+return data['runs'][0]['results'][0]['codeFlows'][0]['threadFlows'][idx]
+
+fails = 0
+passes = 0
+with open(testfile) as testfh:
+lineno = 0
+for line in iter(testfh.readline, ""):
+lineno += 1
+line = line.rstrip("\r\n")
+try:
+prefix_pos = line.index(prefix)
+except ValueError:
+continue
+check_expr = line[prefix_pos + len(prefix):]
+
+try:
+exception = None
+result = eval(check_expr, {"sarifLog":data,
+   "result":sarifLogFirstResult,
+   "flow":threadFlow})
+except Exception:
+result = False
+exception = traceback.format_exc().splitlines()[-1]
+
+if exception is not None:
+sys.stderr.write(
+"{file}:{line:d}: check threw exception: {expr}\n"
+"{file}:{line:d}: exception was: {exception}\n".format(
+file=testfile, line=lineno,
+expr=check_expr, exception=exception))
+fails += 1
+elif not result:
+sys.stderr.write(
+"{file}:{line:d}: check returned False: {expr}\n".format(
+file=testfile, line=lineno, expr=check_expr))
+fails += 1
+else:
+passes += 1
+
+if fails != 0:
+sys.exit("{} checks failed".format(fails))
+else:
+sys.stdout.write("{} checks passed\n".format(passes))
+
Index: StaticAnalyzer/Core/SarifDiagnostics.cpp
===
--- /dev/null
+++ StaticAnalyzer/Core/SarifDiagnostics.cpp
@@ -0,0 +1,270 @@
+//===--- SarifDiagnostics.cpp - Sarif Diagnostics for Paths -*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===--===//
+//
+//  This file defines the SarifDiagnostics object.
+//
+//===--===//
+
+#include "clang/Basic/Version.h"
+#include "clang/Lex/Preprocessor.h"
+#include "clang/StaticAnalyzer/Core/AnalyzerOptions.h"

[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-30 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman marked 9 inline comments as done.
aaron.ballman added inline comments.



Comment at: Analysis/diagnostics/sarif-diagnostics-taint-test.c:18
+// Test the basics for sanity.
+// CHECK: sarifLog['version'].startswith("2.0.0")
+// CHECK: sarifLog['runs'][0]['tool']['fullName'] == "clang static analyzer"

george.karpenkov wrote:
> aaron.ballman wrote:
> > george.karpenkov wrote:
> > > Would it make more sense to just use `diff` + json pretty-formatter to 
> > > write a test?
> > > With this test I can't even quite figure out how the output should look 
> > > like.
> > I'm not super comfortable with that approach, but perhaps I'm thinking of 
> > something different than what you're proposing. The reason I went with this 
> > approach is because diff would be fragile (depends heavily on field 
> > ordering, which the JSON support library doesn't give control over) and the 
> > physical layout of the file isn't what needs to be tested anyway. SARIF has 
> > a fair amount of optional data that can be provided as well, so using a 
> > purely textual diff worried me that exporting additional optional data in 
> > the future would require extensive unrelated changes to all SARIF diffs in 
> > the test directory.
> > 
> > The goal for this test was to demonstrate that we can validate that the 
> > interesting bits of information are present in the output without worrying 
> > about the details.
> > 
> > Also, the python approach allows us to express relationships between data 
> > items more easily than a textual diff tool would. I've not used that here, 
> > but I could imagine a test where we want to check that each code location 
> > has a corresponding file entry in another list.
> Using a sample file + diff would have an advantage of being easier to read 
> (just glance at the "Inputs/blah.serif" to see a sample output), and 
> consistent with how we already do checking for plists.
> 
> > depends heavily on field ordering
> 
> Is it an issue in practice though? I would assume that JSON support library 
> would not switch field ordering too often (and even if it does, we can have a 
> python wrapper testing that)
> 
> > SARIF has a fair amount of optional data
> 
> Would diff `--ignore-matching-lines` be enough for those?
Diff testing was what I originally tried and I abandoned it because it was not 
viable. The optional data cannot be handled by ignoring matching lines because 
the lines won't match from system to system. For instance, there are absolute 
URIs to files included in the output, clang git hashes and version numbers that 
will change from revision to revision, etc.

What you see as an advantage, I see as more of a distraction -- the actual 
layout of the information in the file isn't that important so long as it 
validates as valid SARIF (unfortunately, there are no cross-platform command 
line tools to validate SARIF yet). What is important are just a few pieces of 
the content information (where are the diagnostics, what source ranges and 
paths are involved, etc) compared to the overall whole.

I can give diff testing another shot, but I was unable to find a way to make 
`-I` work so that I could ignore everything that needs to be ignored due to 
natural variance. Do you have ideas there?

To give you an idea of the ultimate form of the output:
```
{
  "$schema": "http://json.schemastore.org/sarif-2.0.0;,
  "runs": [
{
  "files": {
"file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c": {
  "fileLocation": {
"uri": "file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c"
  },
  "length": 3850,
  "mimeType": "text/plain",
  "roles": [
"resultFile"
  ]
}
  },
  "results": [
{
  "codeFlows": [
{
  "threadFlows": [
{
  "locations": [
{
  "importance": "essential",
  "location": {
"message": {
  "text": "Calling 'f'"
},
"physicalLocation": {
  "fileLocation": {
"uri": 
"file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c"
  },
  "region": {
"endColumn": 5,
"endLine": 119,
"startColumn": 3,
"startLine": 119
  }
}
  },
  "step": 1
},
{
  "importance": "essential",
  "location": {
"message": {
  "text": "tainted"
},
"physicalLocation": {
  

[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-29 Thread George Karpenkov via Phabricator via cfe-commits
george.karpenkov requested changes to this revision.
george.karpenkov added a comment.
This revision now requires changes to proceed.

I don't think a new PathGenerationScheme is needed, unless you plan changes to 
BugReporter.cpp.

The code is fine otherwise, but could we try to use `diff` for testing?




Comment at: Analysis/diagnostics/sarif-diagnostics-taint-test.c:18
+// Test the basics for sanity.
+// CHECK: sarifLog['version'].startswith("2.0.0")
+// CHECK: sarifLog['runs'][0]['tool']['fullName'] == "clang static analyzer"

aaron.ballman wrote:
> george.karpenkov wrote:
> > Would it make more sense to just use `diff` + json pretty-formatter to 
> > write a test?
> > With this test I can't even quite figure out how the output should look 
> > like.
> I'm not super comfortable with that approach, but perhaps I'm thinking of 
> something different than what you're proposing. The reason I went with this 
> approach is because diff would be fragile (depends heavily on field ordering, 
> which the JSON support library doesn't give control over) and the physical 
> layout of the file isn't what needs to be tested anyway. SARIF has a fair 
> amount of optional data that can be provided as well, so using a purely 
> textual diff worried me that exporting additional optional data in the future 
> would require extensive unrelated changes to all SARIF diffs in the test 
> directory.
> 
> The goal for this test was to demonstrate that we can validate that the 
> interesting bits of information are present in the output without worrying 
> about the details.
> 
> Also, the python approach allows us to express relationships between data 
> items more easily than a textual diff tool would. I've not used that here, 
> but I could imagine a test where we want to check that each code location has 
> a corresponding file entry in another list.
Using a sample file + diff would have an advantage of being easier to read 
(just glance at the "Inputs/blah.serif" to see a sample output), and consistent 
with how we already do checking for plists.

> depends heavily on field ordering

Is it an issue in practice though? I would assume that JSON support library 
would not switch field ordering too often (and even if it does, we can have a 
python wrapper testing that)

> SARIF has a fair amount of optional data

Would diff `--ignore-matching-lines` be enough for those?



Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:69
+return std::string(, 1);
+  return "%" + llvm::toHex(StringRef(, 1));
+}

+1, I would use this in other consumers.



Comment at: clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h:128
+/// Used for SARIF output.
+Sarif,
   };

Do you actually need a new generation scheme here?
I'm pretty sure that using "Minimal" would give you the same effect.


https://reviews.llvm.org/D53814



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-29 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman updated this revision to Diff 171545.
aaron.ballman marked 3 inline comments as done.
aaron.ballman added a comment.

Updated based on initial review feedback, and added more context to the patch.


https://reviews.llvm.org/D53814

Files:
  Analysis/diagnostics/sarif-check.py
  Analysis/diagnostics/sarif-diagnostics-taint-test.c
  StaticAnalyzer/Core/CMakeLists.txt
  StaticAnalyzer/Core/SarifDiagnostics.cpp
  clang/StaticAnalyzer/Core/Analyses.def
  clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h

Index: Analysis/diagnostics/sarif-diagnostics-taint-test.c
===
--- /dev/null
+++ Analysis/diagnostics/sarif-diagnostics-taint-test.c
@@ -0,0 +1,32 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.security.taint,debug.TaintTest %s -verify -analyzer-output=sarif -o %t.sarif | %python %S/sarif-check.py %s %t.sarif
+#include "../Inputs/system-header-simulator.h"
+
+int atoi(const char *nptr);
+
+void f(void) {
+  char s[80];
+  scanf("%s", s);
+  int d = atoi(s); // expected-warning {{tainted}}
+}
+
+int main(void) {
+  f();
+  return 0;
+}
+
+// Test the basics for sanity.
+// CHECK: sarifLog['version'].startswith("2.0.0")
+// CHECK: sarifLog['runs'][0]['tool']['fullName'] == "clang static analyzer"
+// CHECK: sarifLog['runs'][0]['tool']['name'] == "clang"
+// CHECK: sarifLog['runs'][0]['tool']['language'] == "en-US"
+
+// Test the specifics of this taint test.
+// CHECK: len(result(0)['codeFlows'][0]['threadFlows'][0]['locations']) == 2
+// CHECK: flow(0)['locations'][0]['step'] == 1
+// CHECK: flow(0)['locations'][0]['importance'] == "essential"
+// CHECK: flow(0)['locations'][0]['location']['message']['text'] == "Calling 'f'"
+// CHECK: flow(0)['locations'][0]['location']['physicalLocation']['region']['startLine'] == 13
+// CHECK: flow(0)['locations'][1]['step'] == 2
+// CHECK: flow(0)['locations'][1]['importance'] == "essential"
+// CHECK: flow(0)['locations'][1]['location']['message']['text'] == "tainted"
+// CHECK: flow(0)['locations'][1]['location']['physicalLocation']['region']['startLine'] == 9
Index: Analysis/diagnostics/sarif-check.py
===
--- /dev/null
+++ Analysis/diagnostics/sarif-check.py
@@ -0,0 +1,61 @@
+#!/usr/bin/env python
+
+import sys
+import subprocess
+import traceback
+import json
+
+testfile = sys.argv[1]
+with open(sys.argv[2]) as datafh:
+data = json.load(datafh)
+
+prefix = "CHECK: "
+
+def sarifLogFirstResult(idx):
+return data['runs'][0]['results'][idx]
+
+def threadFlow(idx):
+return data['runs'][0]['results'][0]['codeFlows'][0]['threadFlows'][idx]
+
+fails = 0
+passes = 0
+with open(testfile) as testfh:
+lineno = 0
+for line in iter(testfh.readline, ""):
+lineno += 1
+line = line.rstrip("\r\n")
+try:
+prefix_pos = line.index(prefix)
+except ValueError:
+continue
+check_expr = line[prefix_pos + len(prefix):]
+
+try:
+exception = None
+result = eval(check_expr, {"sarifLog":data,
+   "result":sarifLogFirstResult,
+   "flow":threadFlow})
+except Exception:
+result = False
+exception = traceback.format_exc().splitlines()[-1]
+
+if exception is not None:
+sys.stderr.write(
+"{file}:{line:d}: check threw exception: {expr}\n"
+"{file}:{line:d}: exception was: {exception}\n".format(
+file=testfile, line=lineno,
+expr=check_expr, exception=exception))
+fails += 1
+elif not result:
+sys.stderr.write(
+"{file}:{line:d}: check returned False: {expr}\n".format(
+file=testfile, line=lineno, expr=check_expr))
+fails += 1
+else:
+passes += 1
+
+if fails != 0:
+sys.exit("{} checks failed".format(fails))
+else:
+sys.stdout.write("{} checks passed\n".format(passes))
+
Index: StaticAnalyzer/Core/SarifDiagnostics.cpp
===
--- /dev/null
+++ StaticAnalyzer/Core/SarifDiagnostics.cpp
@@ -0,0 +1,267 @@
+//===--- SarifDiagnostics.cpp - Sarif Diagnostics for Paths -*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===--===//
+//
+//  This file defines the SarifDiagnostics object.
+//
+//===--===//
+
+#include "clang/Basic/Version.h"
+#include "clang/Lex/Preprocessor.h"
+#include "clang/StaticAnalyzer/Core/AnalyzerOptions.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h"

[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-29 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments.



Comment at: Analysis/diagnostics/sarif-check.py:22
+passes = 0
+with open(testfile) as testfh:
+lineno = 0

george.karpenkov wrote:
> Wow, this is super neat!
> Since you are quite active in LLVM community, would you think it's better to 
> have this tool in `llvm/utils` next to FileCheck? The concept is very 
> general, and I think a lot of people really feel FileCheck limitations.
The concept was pulled from test\TableGen\JSON-check.py, so it likely could be 
generalized. However, I suspect that each JSON test may want to expose 
different helper capabilities, so perhaps it won't be super general? I don't 
know enough about good Python design to know.



Comment at: Analysis/diagnostics/sarif-diagnostics-taint-test.c:18
+// Test the basics for sanity.
+// CHECK: sarifLog['version'].startswith("2.0.0")
+// CHECK: sarifLog['runs'][0]['tool']['fullName'] == "clang static analyzer"

george.karpenkov wrote:
> Would it make more sense to just use `diff` + json pretty-formatter to write 
> a test?
> With this test I can't even quite figure out how the output should look like.
I'm not super comfortable with that approach, but perhaps I'm thinking of 
something different than what you're proposing. The reason I went with this 
approach is because diff would be fragile (depends heavily on field ordering, 
which the JSON support library doesn't give control over) and the physical 
layout of the file isn't what needs to be tested anyway. SARIF has a fair 
amount of optional data that can be provided as well, so using a purely textual 
diff worried me that exporting additional optional data in the future would 
require extensive unrelated changes to all SARIF diffs in the test directory.

The goal for this test was to demonstrate that we can validate that the 
interesting bits of information are present in the output without worrying 
about the details.

Also, the python approach allows us to express relationships between data items 
more easily than a textual diff tool would. I've not used that here, but I 
could imagine a test where we want to check that each code location has a 
corresponding file entry in another list.



Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:74
+  // Add the rest of the path components, encoding any reserved characters.
+  std::for_each(std::next(sys::path::begin(Filename)), 
sys::path::end(Filename),
+[](StringRef Component) {

george.karpenkov wrote:
> Nitpicking style, but I don't see why for-each loop, preferably with a range 
> wrapping the iterators would not be more readable.
I tend to prefer using algorithms when the logic is simple -- it makes it more 
clear that the loop is an unimportant detail. I don't have strong opinions on 
this particular loop, however.



Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:88
+  // URI encode the part.
+  for (char C : Component) {
+// RFC 3986 claims alpha, numeric, and this handful of

ZaMaZaN4iK wrote:
> I don't know about const corectness policy in LLVM. But I prefer here const 
> char C .
We typically do not put top-level `const` on locals, so I'd prefer to leave it 
off here rather than be inconsistent.



Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:182
+  case PathDiagnosticPiece::Kind::Note:
+// FIXME: What should be reported here?
+break;

george.karpenkov wrote:
> "Note" are notes which do not have to be attached to a particular path 
> element.
Good to know!



Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:243
+
+  llvm::for_each(Diags, [&](const PathDiagnostic *D) {
+Results.push_back(createResult(*D, Files));

george.karpenkov wrote:
> I like closures, but what's wrong with just using a `for` loop here?
Same as above: clarity of exposition. This one I'd feel pretty strongly about 
keeping as an algorithm given how trivial the loop body is.



Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:254
+std::vector , FilesMade *) {
+  // FIXME: if the file already exists, do we overwrite it with a single run,
+  // or do we append a run into the file if it's a valid SARIF log?

george.karpenkov wrote:
> Usually we overwrite the file and note that on stderr in such cases.
We took the decision internally to overwrite as well, but the SARIF format 
allows for multiple runs within the same output file (so you can compare 
analysis results for the same project over time). I think this will eventually 
have to be user-controlled because I can see some users wanting to append to 
the run and others wanting to overwrite. However, these log files can become 
quite large in practice (GBs of data), so "read in the JSON and add to it" may 
be implausible, hence why I 

[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-29 Thread George Karpenkov via Phabricator via cfe-commits
george.karpenkov requested changes to this revision.
george.karpenkov added a comment.
This revision now requires changes to proceed.

Minor style notes + context missing.

I think using `diff` would be better than a custom python tool.


https://reviews.llvm.org/D53814



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-29 Thread George Karpenkov via Phabricator via cfe-commits
george.karpenkov added a comment.

Patch context is missing.




Comment at: Analysis/diagnostics/sarif-check.py:22
+passes = 0
+with open(testfile) as testfh:
+lineno = 0

Wow, this is super neat!
Since you are quite active in LLVM community, would you think it's better to 
have this tool in `llvm/utils` next to FileCheck? The concept is very general, 
and I think a lot of people really feel FileCheck limitations.



Comment at: Analysis/diagnostics/sarif-diagnostics-taint-test.c:18
+// Test the basics for sanity.
+// CHECK: sarifLog['version'].startswith("2.0.0")
+// CHECK: sarifLog['runs'][0]['tool']['fullName'] == "clang static analyzer"

Would it make more sense to just use `diff` + json pretty-formatter to write a 
test?
With this test I can't even quite figure out how the output should look like.



Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:74
+  // Add the rest of the path components, encoding any reserved characters.
+  std::for_each(std::next(sys::path::begin(Filename)), 
sys::path::end(Filename),
+[](StringRef Component) {

Nitpicking style, but I don't see why for-each loop, preferably with a range 
wrapping the iterators would not be more readable.



Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:182
+  case PathDiagnosticPiece::Kind::Note:
+// FIXME: What should be reported here?
+break;

"Note" are notes which do not have to be attached to a particular path element.



Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:243
+
+  llvm::for_each(Diags, [&](const PathDiagnostic *D) {
+Results.push_back(createResult(*D, Files));

I like closures, but what's wrong with just using a `for` loop here?



Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:254
+std::vector , FilesMade *) {
+  // FIXME: if the file already exists, do we overwrite it with a single run,
+  // or do we append a run into the file if it's a valid SARIF log?

Usually we overwrite the file and note that on stderr in such cases.


https://reviews.llvm.org/D53814



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-29 Thread Alexander Zaitsev via Phabricator via cfe-commits
ZaMaZaN4iK added inline comments.



Comment at: StaticAnalyzer/Core/CMakeLists.txt:43
   PlistDiagnostics.cpp
+  SarifDiagnostics.cpp
   ProgramState.cpp

Sort alphabetically



Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:88
+  // URI encode the part.
+  for (char C : Component) {
+// RFC 3986 claims alpha, numeric, and this handful of

I don't know about const corectness policy in LLVM. But I prefer here const 
char C .



Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:94
+// reserved character.
+if (llvm::isAlnum(C) ||
+StringRef::npos !=

Probably this piece of code will be better to write separately as function


https://reviews.llvm.org/D53814



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53814: Allow the analyzer to output to a SARIF file

2018-10-29 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman created this revision.
aaron.ballman added reviewers: dcoughlin, zaks.anna.
Herald added subscribers: dkrupp, donat.nagy, Szelethus, a.sidorin, mgorny.
Herald added a reviewer: george.karpenkov.

SARIF (https://github.com/oasis-tcs/sarif-spec) is a new draft standard 
interchange format for static analysis results that allows result viewers to be 
decoupled from the tool producing the analysis results. This patch allows users 
to specify SARIF as the output from the clang static analyzer so that the 
results can be read in by other tools. There are several such tools for 
consuming SARIF, such as extensions to Visual Studio and VSCode, as well as 
static analyzers like CodeSonar.

SARIF is JSON-based and the latest provisional specification can be found at: 
https://github.com/oasis-tcs/sarif-spec/blob/master/Documents/ProvisionalDrafts/sarif-v2.0-csd02-provisional.docx.
 GrammaTech sponsored the work to produce this patch and we will make any 
necessary changes if the draft standard changes before publication.


https://reviews.llvm.org/D53814

Files:
  Analysis/diagnostics/sarif-check.py
  Analysis/diagnostics/sarif-diagnostics-taint-test.c
  StaticAnalyzer/Core/CMakeLists.txt
  StaticAnalyzer/Core/SarifDiagnostics.cpp
  clang/StaticAnalyzer/Core/Analyses.def
  clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h

Index: Analysis/diagnostics/sarif-diagnostics-taint-test.c
===
--- /dev/null
+++ Analysis/diagnostics/sarif-diagnostics-taint-test.c
@@ -0,0 +1,32 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.security.taint,debug.TaintTest %s -verify -analyzer-output=sarif -o %t.sarif | %python %S/sarif-check.py %s %t.sarif
+#include "../Inputs/system-header-simulator.h"
+
+int atoi(const char *nptr);
+
+void f(void) {
+  char s[80];
+  scanf("%s", s);
+  int d = atoi(s); // expected-warning {{tainted}}
+}
+
+int main(void) {
+  f();
+  return 0;
+}
+
+// Test the basics for sanity.
+// CHECK: sarifLog['version'].startswith("2.0.0")
+// CHECK: sarifLog['runs'][0]['tool']['fullName'] == "clang static analyzer"
+// CHECK: sarifLog['runs'][0]['tool']['name'] == "clang"
+// CHECK: sarifLog['runs'][0]['tool']['language'] == "en-US"
+
+// Test the specifics of this taint test.
+// CHECK: len(result(0)['codeFlows'][0]['threadFlows'][0]['locations']) == 2
+// CHECK: flow(0)['locations'][0]['step'] == 1
+// CHECK: flow(0)['locations'][0]['importance'] == "essential"
+// CHECK: flow(0)['locations'][0]['location']['message']['text'] == "Calling 'f'"
+// CHECK: flow(0)['locations'][0]['location']['physicalLocation']['region']['startLine'] == 13
+// CHECK: flow(0)['locations'][1]['step'] == 2
+// CHECK: flow(0)['locations'][1]['importance'] == "essential"
+// CHECK: flow(0)['locations'][1]['location']['message']['text'] == "tainted"
+// CHECK: flow(0)['locations'][1]['location']['physicalLocation']['region']['startLine'] == 9
Index: Analysis/diagnostics/sarif-check.py
===
--- /dev/null
+++ Analysis/diagnostics/sarif-check.py
@@ -0,0 +1,61 @@
+#!/usr/bin/env python
+
+import sys
+import subprocess
+import traceback
+import json
+
+testfile = sys.argv[1]
+with open(sys.argv[2]) as datafh:
+data = json.load(datafh)
+
+prefix = "CHECK: "
+
+def sarifLogFirstResult(idx):
+return data['runs'][0]['results'][idx]
+
+def threadFlow(idx):
+return data['runs'][0]['results'][0]['codeFlows'][0]['threadFlows'][idx]
+
+fails = 0
+passes = 0
+with open(testfile) as testfh:
+lineno = 0
+for line in iter(testfh.readline, ""):
+lineno += 1
+line = line.rstrip("\r\n")
+try:
+prefix_pos = line.index(prefix)
+except ValueError:
+continue
+check_expr = line[prefix_pos + len(prefix):]
+
+try:
+exception = None
+result = eval(check_expr, {"sarifLog":data,
+   "result":sarifLogFirstResult,
+   "flow":threadFlow})
+except Exception:
+result = False
+exception = traceback.format_exc().splitlines()[-1]
+
+if exception is not None:
+sys.stderr.write(
+"{file}:{line:d}: check threw exception: {expr}\n"
+"{file}:{line:d}: exception was: {exception}\n".format(
+file=testfile, line=lineno,
+expr=check_expr, exception=exception))
+fails += 1
+elif not result:
+sys.stderr.write(
+"{file}:{line:d}: check returned False: {expr}\n".format(
+file=testfile, line=lineno, expr=check_expr))
+fails += 1
+else:
+passes += 1
+
+if fails != 0:
+sys.exit("{} checks failed".format(fails))
+else:
+sys.stdout.write("{} checks passed\n".format(passes))
+
Index: StaticAnalyzer/Core/SarifDiagnostics.cpp