[PATCH] D70411: [analyzer] CERT: StrChecker: Implementing most of the STR31-C

2019-12-10 Thread Csaba Dabis via Phabricator via cfe-commits
Charusso added a comment.

In D70411#1776460 , @NoQ wrote:

> Ok, so, what should the checker warn on? What should be the intended state 
> machine for this checker?
>
> We basically have two possible categories of answers to this question:
>
> - "The checker warns every time it finds an execution path on which 
> unintended behavior occurs" - describe unintended behavior (say, 
> "out-of-bounds buffer access") and define the checker in such a way that all 
> warnings describe actual bugs of this kind (in our example, out-of-bound 
> buffer accesses) as long as the rest of the static analyzer works correctly.
> - Or, you could say "The checker enforces a certain coding guideline on the 
> user" - and then you should describe the guideline in a very short and 
> easy-to-understand manner (say, "don't pass the string by pointer anywhere 
> before you null-terminate it"), and then make checker check exactly this 
> guideline. You may introduce some suppressions for intentional violations of 
> the guideline, but the guideline itself should be simple, it should be always 
> possible to follow it, and it should sound nice enough for developers to feel 
> good about themselves when they follow it.


If I am right you are thinking about warn on the unsafe function calls which is 
the first case. I did that on by default. The idea is that, in a security 
manner we cannot allow hand-waving fixes. To measure stuff we could introduce 
options which serves the second case. I have added an option to suppress that 
common null-termination-by-hand idiom, so that I do not recommend anything. The 
CERT rules are not recommendations so I think now the warning is fine.

> Yup, i mean, you can go as far as you want with generalizing over "fixed size 
> buffer" in this approach, but what i'm saying is that you're still not 
> addressing the whole train of thought about the length of the source string.

I see, and you are right, but we are getting closer and closer to catch the 
most of.

> In D70411#1773703 , @Charusso wrote:
> 
>> > F10967277: ftp.c_8cb07866de61ef56be82135a4d3a5b7e.plist.html 
>> > 
>> >  The source string is an IP address and port, which has a known limit on 
>> > the number of digits it can have.
>>
>> The known size does not mean that the string is going to be null-terminated.
> 
> 
> It does, because `strcpy` is guaranteed to null-terminate as long as it has 
> enough storage capacity.

It does not, because the storage capacity is arbitrary. If we would be sure the 
**copied stuff's length** cannot be larger than the destination's arbitrary 
capacity, we would not warn. There are tests for that case.

> Checks that are part of the generic taint checker are currently in a pretty 
> bad shape, but the taint analysis itself is pretty good, and checks that rely 
> on taint but aren't part of the generic taint checker are also pretty good. I 
> actually believe taint analysis could be enabled by default as soon as 
> somebody goes through the broken checks and disables/removes them. If you 
> rely on the existing taint analysis infrastructure and make a good check, 
> that'll be wonderful and would further encourage us to move taint analysis 
> out of alpha.

I think it is too far away, sadly. I like the idea because it would be the root 
of security checkers, but I am mostly interested in data-flow analysis.




Comment at: clang/test/Analysis/cert/str31-c-fp-suppression.cpp:51-53
+  strcpy(dest, src);
+  do_something(dest);
+  // expected-warning@-1 {{'dest' is not null-terminated}}

NoQ wrote:
> Charusso wrote:
> > NoQ wrote:
> > > Looks like a false positive. Consider the following perfectly correct 
> > > (even if a bit inefficient) code that only differs from the current code 
> > > only in names and context:
> > > ```lang=c++
> > > void copy_and_test_if_short(const char *src) {
> > >   char dest[128];
> > >   strcpy(dest, src);
> > >   warn_if_starts_with_foo(dest);
> > > }
> > > 
> > > void copy_and_test(const char *src) {
> > >   if (strlen(src) < 64)
> > > copy_and_test_if_short(src);
> > >   else
> > > copy_and_test_if_long(src);
> > > }
> > > ```
> > That is why I have asked whether we have a way to say "when the string is 
> > read". I like that heuristic for now, because any kind of not 
> > null-terminated string is the root of the evil for [[ 
> > https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator
> >  | STR31-C ]].
> > 
> > In this case the attacker is the programmer who sends a possibly not 
> > null-terminated string around and the function which receives an arbitrary 
> > input would carry the issue.
> > 
> > Thanks for the example, I wanted to cover ranges, but I have forgotten it, 
> > because in the wild peoples do not check for 

[PATCH] D70411: [analyzer] CERT: StrChecker: Implementing most of the STR31-C

2019-12-10 Thread Csaba Dabis via Phabricator via cfe-commits
Charusso updated this revision to Diff 233252.
Charusso marked 7 inline comments as done.
Charusso retitled this revision from "[analyzer] CERT: StrChecker: 31.c" to 
"[analyzer] CERT: StrChecker: Implementing most of the STR31-C".
Charusso edited the summary of this revision.
Charusso added a comment.

- Fix.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411

Files:
  clang/docs/analyzer/checkers.rst
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h
  clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
  clang/lib/StaticAnalyzer/Checkers/AllocationState.h
  clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
  clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
  clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
  clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
  clang/test/Analysis/Inputs/system-header-simulator.h
  clang/test/Analysis/analyzer-config.c
  clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
  clang/test/Analysis/cert/str31-c-notes-warn-on-call-on.cpp
  clang/test/Analysis/cert/str31-c-warn-on-call-off.cpp
  clang/test/Analysis/cert/str31-c-warn-on-call-on.cpp
  clang/test/Analysis/cert/str31-c.cpp

Index: clang/test/Analysis/cert/str31-c.cpp
===
--- /dev/null
+++ clang/test/Analysis/cert/str31-c.cpp
@@ -0,0 +1,196 @@
+// RUN: %clang_analyze_cc1 \
+// RUN:  -analyzer-checker=core,unix,alpha.security.cert.str.31c \
+// RUN:  -verify %s
+
+// See the examples on the page of STR31-C:
+// https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator
+
+#include "../Inputs/system-header-simulator.h"
+
+#define EOF -1
+typedef __SIZE_TYPE__ size_t;
+
+void free(void *memblock);
+void *malloc(size_t size);
+
+void do_something(char *buffer);
+
+namespace test_gets_bad {
+#define BUFFER_SIZE 1024
+
+void func(void) {
+  char buf[BUFFER_SIZE];
+  if (gets(buf)) {
+// expected-warning@-1 {{'gets' could write outside of 'buf'}}
+  }
+}
+} // namespace test_gets_bad
+
+namespace test_gets_good {
+enum { BUFFERSIZE = 32 };
+
+void func(void) {
+  char buff[BUFFERSIZE];
+
+  if (fgets(buff, sizeof(buff), stdin)) {
+  }
+}
+} // namespace test_gets_good
+
+namespace test_sprintf_bad {
+void func(const char *name) {
+  char buf[128];
+  sprintf(buf, "%s.txt", name);
+  // expected-warning@-1 {{'sprintf' could write outside of 'buf'}}
+}
+} // namespace test_sprintf_bad
+
+namespace test_sprintf_good {
+void func(const char *name) {
+  char buff[128];
+  snprintf(buff, sizeof(buff), "%s.txt", name);
+
+  do_something(buff);
+}
+} // namespace test_sprintf_good
+
+namespace test_fscanf_bad {
+enum { BUF_LENGTH = 1024 };
+
+void get_data(void) {
+  char buf[BUF_LENGTH];
+  if (fscanf(stdin, "%s", buf)) {
+// expected-warning@-1 {{'fscanf' could write outside of 'buf'}}
+  }
+}
+} // namespace test_fscanf_bad
+
+namespace test_fscanf_good {
+enum { BUF_LENGTH = 1024 };
+
+void get_data(void) {
+  char buff[BUF_LENGTH];
+  if (fscanf(stdin, "%1023s", buff)) {
+do_something(buff);
+  }
+}
+} // namespace test_fscanf_good
+
+namespace test_strcpy_bad {
+int main(int argc, char *argv[]) {
+  const char *const name = (argc && argv[0]) ? argv[0] : "";
+  char prog_name[128];
+  strcpy(prog_name, name);
+  // expected-warning@-1 {{'strcpy' could write outside of 'prog_name'}}
+
+  return 0;
+}
+
+void func(void) {
+  char buff[256];
+  char *editor = getenv("EDITOR");
+  if (editor != NULL) {
+strcpy(buff, editor);
+// expected-warning@-1 {{'strcpy' could write outside of 'buff'}}
+  }
+}
+} // namespace test_strcpy_bad
+
+namespace test_strcpy_good {
+int main(int argc, char *argv[]) {
+  const char *const name = (argc && argv[0]) ? argv[0] : "";
+  char *prog_name2 = (char *)malloc(strlen(name) + 1);
+  if (prog_name2 != NULL) {
+strcpy(prog_name2, name);
+  }
+
+  do_something(prog_name2);
+
+  free(prog_name2);
+  return 0;
+}
+
+void func(void) {
+  char *buff2;
+  char *editor = getenv("EDITOR");
+  if (editor != NULL) {
+size_t len = strlen(editor) + 1;
+buff2 = (char *)malloc(len);
+if (buff2 != NULL) {
+  strcpy(buff2, editor);
+}
+
+do_something(buff2);
+free(buff2);
+  }
+}
+} // namespace test_strcpy_good
+
+//===--===//
+// The following are from the rule's page which we do not handle yet.
+//===--===//
+
+namespace test_loop_index_bad {
+void copy(size_t n, char src[n], char dest[n]) {
+  size_t i;
+
+  for (i = 0; src[i] && (i < n); ++i) {
+dest[i] = src[i];
+  }
+  dest[i] = '\0';
+}
+} // namespace test_loop_index_bad
+