[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

2019-12-09 Thread Artem Dergachev via Phabricator via cfe-commits
NoQ added a comment.

Ok, so, what should the checker warn on? What should be the intended state 
machine for this checker?

We basically have two possible categories of answers to this question:

- "The checker warns every time it finds an execution path on which unintended 
behavior occurs" - describe unintended behavior (say, "out-of-bounds buffer 
access") and define the checker in such a way that all warnings describe actual 
bugs of this kind (in our example, out-of-bound buffer accesses) as long as the 
rest of the static analyzer works correctly.
- Or, you could say "The checker enforces a certain coding guideline on the 
user" - and then you should describe the guideline in a very short and 
easy-to-understand manner (say, "don't pass the string by pointer anywhere 
before you null-terminate it"), and then make checker check exactly this 
guideline. You may introduce some suppressions for intentional violations of 
the guideline, but the guideline itself should be simple, it should be always 
possible to follow it, and it should sound nice enough for developers to feel 
good about themselves when they follow it.

> - The problem is not the size, but the missing `'\0'` (which you can have 
> multiple of at any point).

The caption of the CERT rule suggests that both of these are problematic.

In D70411#1773703 , @Charusso wrote:

> In D70411#1769786 , @NoQ wrote:
>
> > So i suggest that we make one more step back and agree upon what exactly 
> > are we checking here. I.e., basically, agree on the tests. Because right 
> > now it looks like you're trying to blindly generalize over the examples: 
> > //"The CERT example has a `strcpy()` into a fixed-size buffer, let's warn 
> > on all `strcpy()`s into fixed-size buffers!"//.
>
>
> Just for clarification, I made the entire `MemoryBlockRegion` stuff because I 
> do not care how do you allocate a memory block (even a `string` is counted as 
> that).


Yup, i mean, you can go as far as you want with generalizing over "fixed size 
buffer" in this approach, but what i'm saying is that you're still not 
addressing the whole train of thought about the length of the source string.

In D70411#1773703 , @Charusso wrote:

> > F10967277: ftp.c_8cb07866de61ef56be82135a4d3a5b7e.plist.html 
> > 
> >  The source string is an IP address and port, which has a known limit on 
> > the number of digits it can have.
>
> The known size does not mean that the string is going to be null-terminated.


It does, because `strcpy` is guaranteed to null-terminate as long as it has 
enough storage capacity.

>> For example, in the example titled "Noncompliant Code Example (argv)", it is 
>> explicitly stated that the problem is not only that the buffer is fixed-size 
>> and `strcpy()` is made, but also that the original string //is controlled by 
>> the attacker//. The right analysis to catch such issues is //taint 
>> analysis//. It's a typical taint problem. I honestly believe you should try 
>> to solve it by combining the taint checker with the string checker: "if 
>> string length metadata is tainted (in particular, if the string itself is 
>> tainted) and is potentially larger than the destination buffer size (eg., 
>> unconstrained), warn". With the recent advancements in the development of 
>> the taint checker, i think you can get pretty far this way without 
>> constantly struggling with false positives.
> 
> This checker indeed would require an additional taint analysis, but as we do 
> not have a non-alpha variant, I am fine to call every non-null-terminated 
> string reading an issue (as it is), whatever is the source. That is a 
> generalization, yes, in the best possible manner what we have got today. Also 
> I am not a big fan of relying on non-alpha stuff and I am sure I have defined 
> my checker enough well to catch real issues.

Checks that are part of the generic taint checker are currently in a pretty bad 
shape, but the taint analysis itself is pretty good, and checks that rely on 
taint but aren't part of the generic taint checker are also pretty good. I 
actually believe taint analysis could be enabled by default as soon as somebody 
goes through the broken checks and disables/removes them. If you rely on the 
existing taint analysis infrastructure and make a good check, that'll be 
wonderful and would further encourage us to move taint analysis out of alpha.

> - We heavily need to swap the value-tracking with `NoteTags` to make this 
> useful.

Before i forget, i'm opposed to explaining problems as notes. We should emit a 
warning as soon as we've reached far enough on the execution path to conclude 
that there's a problem.




Comment at: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td:798
+
+def Str31cChecker : Checker<"31.c">,
+  HelpText<"SEI CERT checker of rules defined 

[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

2019-12-06 Thread Csaba Dabis via Phabricator via cfe-commits
Charusso added a comment.

Please let us measure your examples at first to have a consensus what we would 
like to see from this checker.

In D70411#1769803 , @NoQ wrote:

> In D70411#1769628 , @Charusso wrote:
>
> > F10966837: str31-c.tar.gz 
>
>
> These reports seem to confirm my point. I took a look at first 5 and in all 
> of them the code does in fact ensure that the buffer space is sufficient, but 
> in 5 different ways. You most likely won't be able to enumerate all possible 
> ways in which the user can ensure that the destination buffer size is 
> sufficient.


I believe I can enhance the checker with the help of `NoteTags` and give more 
information for the user why the buffer space is Not sufficient. The size 
property's upper-bound is not that difficult to measure, I think we can 
implement this. For now, let me analyze each report.

---

> F10967274: content_encoding.c_5ae4f30ce29f14441139e7bbe20eeaaa.plist.html 
> 
>  The source string is taken from a global table and therefore has known 
> maximum size.
>  F10967280: imap.c_fd80e0804acd9e7ecb9c2483151625a9.plist.html 
> 
>  The source string is a literal `'*'`.

When you encounter a member that is when the hand-waving begins. I have made a 
dump to see what we have. I have not seen any immediate way to connect the 
dynamic size and the members. For now we have that:
The source region is:
`SymRegion{reg_$36}`
and the allocation's dynamic size is:
`extent_$41{SymRegion{conj_$34{void *, LC7, S161233, #1}}}`

My assumption is that, the size of a member is very arbitrary and we cannot 
track the member's allocation, so we even could drop every of these reports as 
being useless. Because I make this checker alpha I am fine with that assumption 
of members, and we definitely need better ideas.

---

> F10967277: ftp.c_8cb07866de61ef56be82135a4d3a5b7e.plist.html 
> 
>  The source string is an IP address and port, which has a known limit on the 
> number of digits it can have.

The known size does not mean that the string is going to be null-terminated. 
For example the VIM project has one single global macro to serve every 
allocation's size, and that is why the CERT rule made. If the attacker passes a 
10k characters long string and you allocate 10k+1 characters for the `'\0'` you 
can work with the data safely. Of course you would need to check some upper 
bound, but that is another story. The key is, any arbitrary buffer size could 
be a problem. Here:

  954 | size_t addrlen = INET6_ADDRSTRLEN > strlen(string_ftpport) // Assuming 
the condition is true
 ? INET6_ADDRSTRLEN
 : strlen(string_ftpport)

`addrlen` is an arbitrary macro size `INET6_ADDRSTRLEN`, which not necessarily 
could hold `string_ftpport` in `strcpy(addr, string_ftpport);`. As a 
human-being you know the most appropriate size, yes. But the program could be 
ill-formed very easily so we warn on the arbitrary-length path. Here you do not 
see something like `addr[INET6_ADDRSTRLEN - 1] = '\0'` so when the string is 
read you cannot be sure whether it is null-terminated. If line 954 would be 
evaluated to false, allocating `calloc(addrlen + 1, 1)` so when `addrlen` is 
`strlen(string_ftpport)` is fine of course, that is the appropriate size to 
store `string_ftpport`.

---

> F10967295: tftp.c_eee38be8d783d25f2e733fa3740d13fc.plist.html 
> 
>  There's an if-statement that checks that the storage size is sufficient.

Well, this is the most suspicious report of all the time. Let me reconstruct 
what is going on:

  513 | sbytes += tftp_option_add(state, sbytes,
  (char *)state->spacket.data + sbytes, // 
warning
  TFTP_OPTION_TSIZE);

My guess was that to change the `check::PostCall` to `check::PreCall`, but if I 
remember right the same report occurred. Here we return from the function call 
`tftp_option_add`, then we warn on its argument which already has been passed 
to the call. We should not warn when the array is passed to a function which we 
already have finished reading from. I have not seen any immediate solution, so 
I count this as an internal ~~bug~~ feature.

---

> F10967300: tool_dirhie.c_87dd00a845a927c9b8ed587c6b314af1.plist.html 
> 
>  The source string is a token (obtained via `strtok`) of a string that has 
> the same size as the destination buffer.

Let me simplify the report as there is no such feature:

  111 | outlen = strlen(outfile);
  112 | outdup = strdup(outfile);
  116 | dirbuildup = malloc(outlen + 1);
  
  125 | tempdir = strtok(outdup, PATH_DELIMITERS);
  136 | if(outdup == tempdir)
  138 |   strcpy(dirbuildup, tempdir);

Here we see both 

[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

2019-12-06 Thread Csaba Dabis via Phabricator via cfe-commits
Charusso updated this revision to Diff 232671.
Charusso marked 10 inline comments as done.
Charusso added a comment.

- Make the checker alpha.
- Add docs.
- Copy-paste @NoQ's test.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411

Files:
  clang/docs/analyzer/checkers.rst
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
  clang/lib/StaticAnalyzer/Checkers/AllocationState.h
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  clang/test/Analysis/Inputs/system-header-simulator.h
  clang/test/Analysis/cert/str31-c-fp-suppression.cpp
  clang/test/Analysis/cert/str31-c-notes.cpp
  clang/test/Analysis/cert/str31-c.cpp

Index: clang/test/Analysis/cert/str31-c.cpp
===
--- /dev/null
+++ clang/test/Analysis/cert/str31-c.cpp
@@ -0,0 +1,205 @@
+// RUN: %clang_analyze_cc1 \
+// RUN:  -analyzer-checker=core,unix,alpha.security.cert.str.31.c \
+// RUN:  -verify %s
+
+// See the examples on the page of STR31-C:
+// https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator
+
+#include "../Inputs/system-header-simulator.h"
+
+#define EOF -1
+typedef __SIZE_TYPE__ size_t;
+
+void free(void *memblock);
+void *malloc(size_t size);
+
+void do_something(char *buffer);
+
+namespace test_gets_bad {
+#define BUFFER_SIZE 1024
+
+void func(void) {
+  char buf[BUFFER_SIZE];
+  if (gets(buf)) {
+do_something(buf);
+// expected-warning@-1 {{'buf' is not null-terminated}}
+  }
+}
+} // namespace test_gets_bad
+
+namespace test_gets_good {
+enum { BUFFERSIZE = 32 };
+
+void func(void) {
+  char buff[BUFFERSIZE];
+
+  if (fgets(buff, sizeof(buff), stdin)) {
+do_something(buff);
+  }
+}
+} // namespace test_gets_good
+
+namespace test_sprintf_bad {
+void func(const char *name) {
+  char buf[128];
+  sprintf(buf, "%s.txt", name);
+
+  do_something(buf);
+  // expected-warning@-1 {{'buf' is not null-terminated}}
+}
+} // namespace test_sprintf_bad
+
+namespace test_sprintf_good {
+void func(const char *name) {
+  char buff[128];
+  snprintf(buff, sizeof(buff), "%s.txt", name);
+
+  do_something(buff);
+}
+} // namespace test_sprintf_good
+
+namespace test_fscanf_bad {
+enum { BUF_LENGTH = 1024 };
+
+void get_data(void) {
+  char buf[BUF_LENGTH];
+  if (fscanf(stdin, "%s", buf)) {
+do_something(buf);
+// expected-warning@-1 {{'buf' is not null-terminated}}
+  }
+}
+} // namespace test_fscanf_bad
+
+namespace test_fscanf_good {
+enum { BUF_LENGTH = 1024 };
+
+void get_data(void) {
+  char buff[BUF_LENGTH];
+  if (fscanf(stdin, "%1023s", buff)) {
+do_something(buff);
+  }
+}
+} // namespace test_fscanf_good
+
+namespace test_strcpy_bad {
+int main(int argc, char *argv[]) {
+  const char *const name = (argc && argv[0]) ? argv[0] : "";
+  char prog_name[128];
+  strcpy(prog_name, name);
+
+  do_something(prog_name);
+  // expected-warning@-1 {{'prog_name' is not null-terminated}}
+
+  return 0;
+}
+
+void func(void) {
+  char buff[256];
+  char *editor = getenv("EDITOR");
+  if (editor != NULL) {
+strcpy(buff, editor);
+
+do_something(buff);
+// expected-warning@-1 {{'buff' is not null-terminated}}
+  }
+}
+} // namespace test_strcpy_bad
+
+namespace test_strcpy_good {
+int main(int argc, char *argv[]) {
+  const char *const name = (argc && argv[0]) ? argv[0] : "";
+  char *prog_name2 = (char *)malloc(strlen(name) + 1);
+  if (prog_name2 != NULL) {
+strcpy(prog_name2, name);
+  }
+
+  do_something(prog_name2);
+
+  free(prog_name2);
+  return 0;
+}
+
+void func(void) {
+  char *buff2;
+  char *editor = getenv("EDITOR");
+  if (editor != NULL) {
+size_t len = strlen(editor) + 1;
+buff2 = (char *)malloc(len);
+if (buff2 != NULL) {
+  strcpy(buff2, editor);
+}
+
+do_something(buff2);
+free(buff2);
+  }
+}
+} // namespace test_strcpy_good
+
+//===--===//
+// The following are from the rule's page which we do not handle yet.
+//===--===//
+
+namespace test_loop_index_bad {
+void copy(size_t n, char src[n], char dest[n]) {
+  size_t i;
+
+  for (i = 0; src[i] && (i < n); ++i) {
+dest[i] = src[i];
+  }
+  dest[i] = '\0';
+}
+} // namespace test_loop_index_bad
+
+namespace test_loop_index_good {
+void copy(size_t n, char src[n], char dest[n]) {
+  size_t i;
+
+  for (i = 0; src[i] && (i < n - 1); ++i) {
+dest[i] = src[i];
+  }
+  dest[i] = '\0';
+}
+} // 

[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

2019-12-04 Thread Artem Dergachev via Phabricator via cfe-commits
NoQ added a comment.

In D70411#1769628 , @Charusso wrote:

> F10966837: str31-c.tar.gz 


These reports seem to confirm my point. I took a look at first 5 and in all of 
them the code does in fact ensure that the buffer space is sufficient, but in 5 
different ways. You most likely won't be able to enumerate all possible ways in 
which the user can ensure that the destination buffer size is sufficient.

F10967274: content_encoding.c_5ae4f30ce29f14441139e7bbe20eeaaa.plist.html 

The source string is taken from a global table and therefore has known maximum 
size.

F10967277: ftp.c_8cb07866de61ef56be82135a4d3a5b7e.plist.html 

The source string is an IP address and port, which has a known limit on the 
number of digits it can have.

F10967280: imap.c_fd80e0804acd9e7ecb9c2483151625a9.plist.html 

The source string is a literal `'*'`.

F10967295: tftp.c_eee38be8d783d25f2e733fa3740d13fc.plist.html 

There's an if-statement that checks that the storage size is sufficient.

F10967300: tool_dirhie.c_87dd00a845a927c9b8ed587c6b314af1.plist.html 

The source string is a token (obtained via `strtok`) of a string that has the 
same size as the destination buffer.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

2019-12-04 Thread Artem Dergachev via Phabricator via cfe-commits
NoQ added a comment.

A while back I had a look into implementing some of the CERT checkers and my 
vague recollection from these attempts was that it's a bad idea to take CERT 
examples literally. Most of the CERT rules are explained in an informal natural 
language and it makes CERT great for demonstrating common mistakes that 
programmers make when they write in C/C++. Humans can easily understand the 
problem by reading CERT pages. But it doesn't necessarily mean that static 
analysis tools can be built by generalizing over the examples from the CERT 
wiki.

So i suggest that we make one more step back and agree upon what exactly are we 
checking here. I.e., basically, agree on the tests. Because right now it looks 
like you're trying to blindly generalize over the examples: //"The CERT example 
has a `strcpy()` into a fixed-size buffer, let's warn on all `strcpy()`s into 
fixed-size buffers!"//. This way your check does indeed pass the tests, but it 
doesn't make sense both from the rule's perspective (the rule never said that 
it's wrong to `strcpy()` into a fixed-size buffer, it only said that you should 
anyhow guarantee that the storage is sufficient) and from the user's 
perspective (you won't be able to reduce the gap between false positives and 
false negatives enough for the check to be useful, even with the subsequent 
whack-a-mole of adding more heuristics, because what the check is checking is 
relatively orthogonal to the actual problem you're trying to solve).

For example, in the example titled "Noncompliant Code Example (argv)", it is 
explicitly stated that the problem is not only that the buffer is fixed-size 
and `strcpy()` is made, but also that the original string //is controlled by 
the attacker//. The right analysis to catch such issues is //taint analysis//. 
It's a typical taint problem. I honestly believe you should try to solve it by 
combining the taint checker with the string checker: "if string length metadata 
is tainted (in particular, if the string itself is tainted) and is potentially 
larger than the destination buffer size (eg., unconstrained), warn". With the 
recent advancements in the development of the taint checker, i think you can 
get pretty far this way without constantly struggling with false positives.




Comment at: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td:801
+  Dependencies<[StrCheckerBase]>,
+  Documentation;
+

Charusso wrote:
> In my mind the documentation is the CERT rule's page. Is it okay?
I'd prefer our own documentation that may link to the CERT page but should also 
explain, say, which parts of the rule are covered by the checker (even if it's 
"all of them", they may add more rules in the future), and probably some 
specifics of the checker's behavior if they aren't obvious from the rule.



Comment at: clang/test/Analysis/cert/str31-c-fp-suppression.cpp:33-34
+  // string, so that we could see whether the string is null-terminated on
+  // every path or not. Because the 'src' could be not null-terminated this
+  // report is fine, but if 'src' is null-terminated this report is a false
+  // positive because the 'src' fits into 'dest' with the null terminator

Mmm, no, it's not fine. The warning is saying something that's not correct. 
Even if `src` is not null-terminated, under the assumption that no 
out-of-bounds read UB occurs in the `strcpy()` call, `dest` is going to always 
be null-terminated and have a `strlen` of at most 127. And by continuing our 
analysis after `strcpy()`, we inform the user that we assume that no UB has 
happened on the current execution path yet.

Traditionally checkers either warn immediately when they can detect an error, 
or assume that the error has not happened. For example, when we dereference a 
pointer, if the pointer is not known to be null, we actively assume that it's 
non-null. Similarly, if the `src` string is not null-terminated, we should warn 
at the invocation of `strcpy`; if it's not known whether it's null-terminated 
or not, we should actively assume that it's null-terminated at `strcpy`.

Also, I usually don't agree with the statements like "This report helped us 
find a real bug, therefore it's a true positive". You can find a bug in 
literally any code if you stare at it long enough! I'm glad you found the bug 
anyway, but if the report is not pointing at the problem directly, we're not 
doing a great job.



Comment at: clang/test/Analysis/cert/str31-c-fp-suppression.cpp:51-53
+  strcpy(dest, src);
+  do_something(dest);
+  // expected-warning@-1 {{'dest' is not null-terminated}}

Looks like a false positive. Consider the following perfectly correct (even if 
a bit inefficient) code that only differs from the current code only in names 
and context:
```lang=c++
void copy_and_test_if_short(const char *src) {
  char dest[128];
  strcpy(dest, src);
  warn_if_starts_with_foo(dest);
}


[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

2019-12-04 Thread Csaba Dabis via Phabricator via cfe-commits
Charusso added a comment.

I have picked `curl` as an example, because it has a measurable amount of 
reports. I still recommend to make this alpha, because the expression-tracking 
cannot track members, and peoples are using members everywhere.

F10966837: str31-c.tar.gz 

- It is made with CodeChecker:

  --analyzers clangsa \
  --disable default \
  --enable core \
  --enable unix \
  --enable security.cert.str.StrCheckerBase \
  --enable security.cert.str.31.c \
  --saargs sa_args.txt



  $ cat sa_args.txt
  
  -Xclang -analyzer-config -Xclang silence-checkers="core;unix"


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

2019-12-04 Thread Csaba Dabis via Phabricator via cfe-commits
Charusso updated this revision to Diff 232198.
Charusso added a comment.

- Tweaking.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411

Files:
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
  clang/lib/StaticAnalyzer/Checkers/AllocationState.h
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  clang/test/Analysis/Inputs/system-header-simulator.h
  clang/test/Analysis/cert/str31-c-fp-suppression.cpp
  clang/test/Analysis/cert/str31-c-notes.cpp
  clang/test/Analysis/cert/str31-c.cpp

Index: clang/test/Analysis/cert/str31-c.cpp
===
--- /dev/null
+++ clang/test/Analysis/cert/str31-c.cpp
@@ -0,0 +1,205 @@
+// RUN: %clang_analyze_cc1 \
+// RUN:  -analyzer-checker=core,unix,security.cert.str.31.c \
+// RUN:  -verify %s
+
+// See the examples on the page of STR31-C:
+// https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator
+
+#include "../Inputs/system-header-simulator.h"
+
+#define EOF -1
+typedef __SIZE_TYPE__ size_t;
+
+void free(void *memblock);
+void *malloc(size_t size);
+
+void do_something(char *buffer);
+
+namespace test_gets_bad {
+#define BUFFER_SIZE 1024
+
+void func(void) {
+  char buf[BUFFER_SIZE];
+  if (gets(buf)) {
+do_something(buf);
+// expected-warning@-1 {{'buf' is not null-terminated}}
+  }
+}
+} // namespace test_gets_bad
+
+namespace test_gets_good {
+enum { BUFFERSIZE = 32 };
+
+void func(void) {
+  char buff[BUFFERSIZE];
+
+  if (fgets(buff, sizeof(buff), stdin)) {
+do_something(buff);
+  }
+}
+} // namespace test_gets_good
+
+namespace test_sprintf_bad {
+void func(const char *name) {
+  char buf[128];
+  sprintf(buf, "%s.txt", name);
+
+  do_something(buf);
+  // expected-warning@-1 {{'buf' is not null-terminated}}
+}
+} // namespace test_sprintf_bad
+
+namespace test_sprintf_good {
+void func(const char *name) {
+  char buff[128];
+  snprintf(buff, sizeof(buff), "%s.txt", name);
+
+  do_something(buff);
+}
+} // namespace test_sprintf_good
+
+namespace test_fscanf_bad {
+enum { BUF_LENGTH = 1024 };
+
+void get_data(void) {
+  char buf[BUF_LENGTH];
+  if (fscanf(stdin, "%s", buf)) {
+do_something(buf);
+// expected-warning@-1 {{'buf' is not null-terminated}}
+  }
+}
+} // namespace test_fscanf_bad
+
+namespace test_fscanf_good {
+enum { BUF_LENGTH = 1024 };
+
+void get_data(void) {
+  char buff[BUF_LENGTH];
+  if (fscanf(stdin, "%1023s", buff)) {
+do_something(buff);
+  }
+}
+} // namespace test_fscanf_good
+
+namespace test_strcpy_bad {
+int main(int argc, char *argv[]) {
+  const char *const name = (argc && argv[0]) ? argv[0] : "";
+  char prog_name[128];
+  strcpy(prog_name, name);
+
+  do_something(prog_name);
+  // expected-warning@-1 {{'prog_name' is not null-terminated}}
+
+  return 0;
+}
+
+void func(void) {
+  char buff[256];
+  char *editor = getenv("EDITOR");
+  if (editor != NULL) {
+strcpy(buff, editor);
+
+do_something(buff);
+// expected-warning@-1 {{'buff' is not null-terminated}}
+  }
+}
+} // namespace test_strcpy_bad
+
+namespace test_strcpy_good {
+int main(int argc, char *argv[]) {
+  const char *const name = (argc && argv[0]) ? argv[0] : "";
+  char *prog_name2 = (char *)malloc(strlen(name) + 1);
+  if (prog_name2 != NULL) {
+strcpy(prog_name2, name);
+  }
+
+  do_something(prog_name2);
+
+  free(prog_name2);
+  return 0;
+}
+
+void func(void) {
+  char *buff2;
+  char *editor = getenv("EDITOR");
+  if (editor != NULL) {
+size_t len = strlen(editor) + 1;
+buff2 = (char *)malloc(len);
+if (buff2 != NULL) {
+  strcpy(buff2, editor);
+}
+
+do_something(buff2);
+free(buff2);
+  }
+}
+} // namespace test_strcpy_good
+
+//===--===//
+// The following are from the rule's page which we do not handle yet.
+//===--===//
+
+namespace test_loop_index_bad {
+void copy(size_t n, char src[n], char dest[n]) {
+  size_t i;
+
+  for (i = 0; src[i] && (i < n); ++i) {
+dest[i] = src[i];
+  }
+  dest[i] = '\0';
+}
+} // namespace test_loop_index_bad
+
+namespace test_loop_index_good {
+void copy(size_t n, char src[n], char dest[n]) {
+  size_t i;
+
+  for (i = 0; src[i] && (i < n - 1); ++i) {
+dest[i] = src[i];
+  }
+  dest[i] = '\0';
+}
+} // namespace test_loop_index_good
+
+namespace test_getchar_bad {
+enum { BUFFERSIZE = 32 };
+
+void func(void) {
+  char buf[BUFFERSIZE];
+ 

[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

2019-11-27 Thread Csaba Dabis via Phabricator via cfe-commits
Charusso marked 2 inline comments as done.
Charusso added a comment.

The false positive suppression by going backwards on the bug-path of stored 
reports was a very simple concept, which turned out to be a useless one. The 
rough idea was that to invalidate every report in a path, and every other 
report would be left because they are reachable from a different path and they 
are valid errors. So that:

  void test_wonky_null_termination(const char *src) {
char dest[128];
strcpy(dest, src);
dest[sizeof(dest) - 1] = '\0';
do_something(dest); // no-warning
  }

The above example null-terminates the string on every path, so we remove the 
report, and:

  void test_may_not_every_path_null_terminated(const char *src) {
char dest[128];
strcpy(dest, src);
if (strlen(src) > sizeof(dest)) {
  dest[sizeof(dest) - 1] = '\0';
}
do_something(dest);
// expected-warning@-1 {{'dest' is not null-terminated}}
  }

We say that the above example has a path where we cannot invalidate the report. 
Invalidating one single control-flow path I think should be safe and keeps 
every other report like the above examples. In general there is no such simple 
case exist because we check for the destination array being null so we 
encounter a state-split whatever we do. Also may it was a bad idea to rewrite 
the `BugReporter` for a simple checker.

This null-termination-by-hand made false positives because even the buffer 
could overflow, we stop up the flow, so the Vasa will not sink. I wanted to 
emit reports on problematic function calls, but changing the main logic to emit 
an error on the string-reading made the null-termination store-able. So that we 
could drop this backward path-traversing logic, that is why I have not answered 
yet. However, if we still want to emit an error on function-calls, and every of 
them, we need to be able to chain the calls to the same region, which made by 
the reports in the previous revisions. Also may we would use this 
report-storing idea later.




Comment at: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td:801
+  Dependencies<[StrCheckerBase]>,
+  Documentation;
+

In my mind the documentation is the CERT rule's page. Is it okay?



Comment at: clang/test/Analysis/cert/str31-c-fp-suppression.cpp:15
+
+void do_something(char *destfer);
+

Facepalm.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

2019-11-27 Thread Csaba Dabis via Phabricator via cfe-commits
Charusso updated this revision to Diff 231341.
Charusso added a comment.

- Do not emit a report on re-binding a not null-terminated string, it may will 
be properly terminated (test added).
- Added a test for the necessity of `SValVisitor` in this checker.
- Some refactor.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411

Files:
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
  clang/lib/StaticAnalyzer/Checkers/AllocationState.h
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  clang/test/Analysis/Inputs/system-header-simulator.h
  clang/test/Analysis/cert/str31-c-fp-suppression.cpp
  clang/test/Analysis/cert/str31-c-notes.cpp
  clang/test/Analysis/cert/str31-c.cpp

Index: clang/test/Analysis/cert/str31-c.cpp
===
--- /dev/null
+++ clang/test/Analysis/cert/str31-c.cpp
@@ -0,0 +1,205 @@
+// RUN: %clang_analyze_cc1 \
+// RUN:  -analyzer-checker=core,unix,security.cert.str.31.c \
+// RUN:  -verify %s
+
+// See the examples on the page of STR31:
+// https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator
+
+#include "../Inputs/system-header-simulator.h"
+
+#define EOF -1
+typedef __SIZE_TYPE__ size_t;
+
+void free(void *memblock);
+void *malloc(size_t size);
+
+void do_something(char *buffer);
+
+namespace test_gets_bad {
+#define BUFFER_SIZE 1024
+
+void func(void) {
+  char buf[BUFFER_SIZE];
+  if (gets(buf)) {
+do_something(buf);
+// expected-warning@-1 {{'buf' is not null-terminated}}
+  }
+}
+} // namespace test_gets_bad
+
+namespace test_gets_good {
+enum { BUFFERSIZE = 32 };
+
+void func(void) {
+  char buff[BUFFERSIZE];
+
+  if (fgets(buff, sizeof(buff), stdin)) {
+do_something(buff);
+  }
+}
+} // namespace test_gets_good
+
+namespace test_sprintf_bad {
+void func(const char *name) {
+  char buf[128];
+  sprintf(buf, "%s.txt", name);
+
+  do_something(buf);
+  // expected-warning@-1 {{'buf' is not null-terminated}}
+}
+} // namespace test_sprintf_bad
+
+namespace test_sprintf_good {
+void func(const char *name) {
+  char buff[128];
+  snprintf(buff, sizeof(buff), "%s.txt", name);
+
+  do_something(buff);
+}
+} // namespace test_sprintf_good
+
+namespace test_fscanf_bad {
+enum { BUF_LENGTH = 1024 };
+
+void get_data(void) {
+  char buf[BUF_LENGTH];
+  if (fscanf(stdin, "%s", buf)) {
+do_something(buf);
+// expected-warning@-1 {{'buf' is not null-terminated}}
+  }
+}
+} // namespace test_fscanf_bad
+
+namespace test_fscanf_good {
+enum { BUF_LENGTH = 1024 };
+
+void get_data(void) {
+  char buff[BUF_LENGTH];
+  if (fscanf(stdin, "%1023s", buff)) {
+do_something(buff);
+  }
+}
+} // namespace test_fscanf_good
+
+namespace test_strcpy_bad {
+int main(int argc, char *argv[]) {
+  const char *const name = (argc && argv[0]) ? argv[0] : "";
+  char prog_name[128];
+  strcpy(prog_name, name);
+
+  do_something(prog_name);
+  // expected-warning@-1 {{'prog_name' is not null-terminated}}
+
+  return 0;
+}
+
+void func(void) {
+  char buff[256];
+  char *editor = getenv("EDITOR");
+  if (editor != NULL) {
+strcpy(buff, editor);
+
+do_something(buff);
+// expected-warning@-1 {{'buff' is not null-terminated}}
+  }
+}
+} // namespace test_strcpy_bad
+
+namespace test_strcpy_good {
+int main(int argc, char *argv[]) {
+  const char *const name = (argc && argv[0]) ? argv[0] : "";
+  char *prog_name2 = (char *)malloc(strlen(name) + 1);
+  if (prog_name2 != NULL) {
+strcpy(prog_name2, name);
+  }
+
+  do_something(prog_name2);
+
+  free(prog_name2);
+  return 0;
+}
+
+void func(void) {
+  char *buff2;
+  char *editor = getenv("EDITOR");
+  if (editor != NULL) {
+size_t len = strlen(editor) + 1;
+buff2 = (char *)malloc(len);
+if (buff2 != NULL) {
+  strcpy(buff2, editor);
+}
+
+do_something(buff2);
+free(buff2);
+  }
+}
+} // namespace test_strcpy_good
+
+//===--===//
+// The following is from the rule's page which we do not handle yet.
+//===--===//
+
+namespace test_loop_index_bad {
+void copy(size_t n, char src[n], char dest[n]) {
+  size_t i;
+
+  for (i = 0; src[i] && (i < n); ++i) {
+dest[i] = src[i];
+  }
+  dest[i] = '\0';
+}
+} // namespace test_loop_index_bad
+
+namespace test_loop_index_good {
+void copy(size_t n, char src[n], char dest[n]) {
+  size_t i;
+
+  for (i = 0; src[i] && (i < n - 1); ++i) {
+

[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

2019-11-22 Thread Csaba Dabis via Phabricator via cfe-commits
Charusso updated this revision to Diff 230702.
Charusso marked an inline comment as done.
Charusso retitled this revision from "[analyzer][WIP] CERT: StrChecker: 31.c" 
to "[analyzer] CERT: StrChecker: 31.c".
Charusso added a comment.

- Remove the report storing map so we do not traverse backwards on the bug-path.
- Use `NoteTags` instead of reports on problematic function calls.
- Emit a report only if a not null-terminated string is read.
- Store whether the string is null-terminated.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411

Files:
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
  clang/lib/StaticAnalyzer/Checkers/AllocationState.h
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  clang/test/Analysis/Inputs/system-header-simulator.h
  clang/test/Analysis/cert/str31-c-fp-suppression.cpp
  clang/test/Analysis/cert/str31-c-notes.cpp
  clang/test/Analysis/cert/str31-c.cpp

Index: clang/test/Analysis/cert/str31-c.cpp
===
--- /dev/null
+++ clang/test/Analysis/cert/str31-c.cpp
@@ -0,0 +1,205 @@
+// RUN: %clang_analyze_cc1 \
+// RUN:  -analyzer-checker=core,unix,security.cert.str.31.c \
+// RUN:  -verify %s
+
+// See the examples on the page of STR31:
+// https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator
+
+#include "../Inputs/system-header-simulator.h"
+
+#define EOF -1
+typedef __SIZE_TYPE__ size_t;
+
+void free(void *memblock);
+void *malloc(size_t size);
+
+void do_something(char *buffer);
+
+namespace test_gets_bad {
+#define BUFFER_SIZE 1024
+
+void func(void) {
+  char buf[BUFFER_SIZE];
+  if (gets(buf)) {
+do_something(buf);
+// expected-warning@-1 {{'buf' is not null-terminated}}
+  }
+}
+} // namespace test_gets_bad
+
+namespace test_gets_good {
+enum { BUFFERSIZE = 32 };
+
+void func(void) {
+  char buff[BUFFERSIZE];
+
+  if (fgets(buff, sizeof(buff), stdin)) {
+do_something(buff);
+  }
+}
+} // namespace test_gets_good
+
+namespace test_sprintf_bad {
+void func(const char *name) {
+  char buf[128];
+  sprintf(buf, "%s.txt", name);
+
+  do_something(buf);
+  // expected-warning@-1 {{'buf' is not null-terminated}}
+}
+} // namespace test_sprintf_bad
+
+namespace test_sprintf_good {
+void func(const char *name) {
+  char buff[128];
+  snprintf(buff, sizeof(buff), "%s.txt", name);
+
+  do_something(buff);
+}
+} // namespace test_sprintf_good
+
+namespace test_fscanf_bad {
+enum { BUF_LENGTH = 1024 };
+
+void get_data(void) {
+  char buf[BUF_LENGTH];
+  if (fscanf(stdin, "%s", buf)) {
+do_something(buf);
+// expected-warning@-1 {{'buf' is not null-terminated}}
+  }
+}
+} // namespace test_fscanf_bad
+
+namespace test_fscanf_good {
+enum { BUF_LENGTH = 1024 };
+
+void get_data(void) {
+  char buff[BUF_LENGTH];
+  if (fscanf(stdin, "%1023s", buff)) {
+do_something(buff);
+  }
+}
+} // namespace test_fscanf_good
+
+namespace test_strcpy_bad {
+int main(int argc, char *argv[]) {
+  const char *const name = (argc && argv[0]) ? argv[0] : "";
+  char prog_name[128];
+  strcpy(prog_name, name);
+
+  do_something(prog_name);
+  // expected-warning@-1 {{'prog_name' is not null-terminated}}
+
+  return 0;
+}
+
+void func(void) {
+  char buff[256];
+  char *editor = getenv("EDITOR");
+  if (editor != NULL) {
+strcpy(buff, editor);
+
+do_something(buff);
+// expected-warning@-1 {{'buff' is not null-terminated}}
+  }
+}
+} // namespace test_strcpy_bad
+
+namespace test_strcpy_good {
+int main(int argc, char *argv[]) {
+  const char *const name = (argc && argv[0]) ? argv[0] : "";
+  char *prog_name2 = (char *)malloc(strlen(name) + 1);
+  if (prog_name2 != NULL) {
+strcpy(prog_name2, name);
+  }
+
+  do_something(prog_name2);
+
+  free(prog_name2);
+  return 0;
+}
+
+void func(void) {
+  char *buff2;
+  char *editor = getenv("EDITOR");
+  if (editor != NULL) {
+size_t len = strlen(editor) + 1;
+buff2 = (char *)malloc(len);
+if (buff2 != NULL) {
+  strcpy(buff2, editor);
+}
+
+do_something(buff2);
+free(buff2);
+  }
+}
+} // namespace test_strcpy_good
+
+//===--===//
+// The following is from the rule's page which we do not handle yet.
+//===--===//
+
+namespace test_loop_index_bad {
+void copy(size_t n, char src[n], char dest[n]) {
+  size_t i;
+
+  for (i = 0; src[i] && (i < n); ++i) {
+dest[i] = 

[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

2019-11-22 Thread Csaba Dabis via Phabricator via cfe-commits
Charusso added a comment.

Now I have simplified the checker so we do not need any ASCII-art, I believe. 
Do we have any better logic than the current implementation to catch when the 
string is read?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits