
Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-16 Thread David Malcolm

On Tue, 2019-12-03 at 18:17 +0100, Jakub Jelinek wrote:
> On Tue, Dec 03, 2019 at 11:52:13AM -0500, David Malcolm wrote:
> > > > Our plugin "interface" as such is very broad.
> > > 
> > > Just to sneak in here I don't like exposing our current plugin
> > > "non-
> > > API"
> > > more.  In fact I'd just build the analyzer into GCC with maybe an
> > > option to disable its build (in case it is very fat?).
> > 
> > My aim here is to provide a way for distributors to be able to
> > disable
> > its build - indeed, for now, for it to be disabled by default,
> > requiring opting-in.
> > 
> > My reasoning here is that the analyzer is middle-end code, but
> > isn't as
> > mature as the rest of the middle-end (but I'm working on getting it
> > more mature).
> > 
> > I want some way to label the code as a "technology preview", that
> > people may want to experiment with, but to set expectations that
> > this
> > is a lot of new code and there will be bugs - but to make it
> > available
> > to make it easier for adventurous users to try it out.
> > 
> > I hope that makes sense.
> > 
> > I went down the "in-tree plugin" path by seeing the analogy with
> > frontends, but yes, it would probably be simpler to just build it
> > into
> > GCC, guarded with a configure-time variable.  It's many thousand
> > lines
> > of non-trivial C++ code, and associated selftests and DejaGnu
> > tests.
> 
> I think it is enough to document it as tech preview in the
> documentation,
> no need to have it as an in-tree plugin.  We have lots of options
> that had
> such a state (perhaps undeclared) over the years, I'd consider
> -fvtable-verify= to be such an option, or in the past e.g.
> -fipa-matrix-reorg or -fipa-struct-reorg.  And 2.5% code growth isn't
> that
> bad.  So, as long as the option isn't enabled by default, I think
> we'd be
> fine.

FWIW I did some testing of v4 of the patch kit [1], which drops the in-
tree plugin idea in favor of simply building the analyzer into the
compiler as a regular IPA pass.  The pass is disabled by default
(enabled by -fanalyzer).  There is also a configure-time option to
disable building it (it's built by default).

I did 3 bootstraps of a release build of x86_64-pc-linux-gnu:
- unpatched,
- with the kit but with --disable-analyzer, and
- with the kit, with the analyzer enabled.

Here are the sizes of cc1 and cc1plus in bytes in each build, after
stripping debuginfo (and showing the change relative to the unpatched
build):

          Unpatched:  With kit:
                      Disabled  change:          Enabled   change:
cc1       25778720    25815672  +36952 (+0.1%)   26270328  +491608 (+1.9%)
cc1plus   27355296    27388152  +32856 (+0.1%)   27842808  +487512 (+1.8%)


So it's a little less than 2% code growth.
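
For anyone wanting to recheck the percentages, they follow directly from
the byte counts in the table.  A minimal sketch of the arithmetic (the
helper name growth_percent is made up for illustration and is not part of
the patch kit):

```c
#include <stdint.h>

/* Percentage growth of a binary relative to the unpatched build.
   Hypothetical helper for illustration only; not part of GCC.  */
double
growth_percent (uint64_t unpatched, uint64_t patched)
{
  return 100.0 * ((double) patched - (double) unpatched)
	 / (double) unpatched;
}
```

With the stripped sizes above, growth_percent (25778720, 26270328) comes
to roughly 1.9 for cc1, and growth_percent (27355296, 27842808) to roughly
1.8 for cc1plus.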

Dave

[1] see https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer for the
various links

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-11 Thread David Malcolm

On Mon, 2019-12-09 at 09:10 +0100, Richard Biener wrote:
> On Fri, Dec 6, 2019 at 11:31 PM Jeff Law  wrote:
> > On Wed, 2019-12-04 at 12:55 -0700, Martin Sebor wrote:
> > > On 11/15/19 6:22 PM, David Malcolm wrote:
> > > > This patch kit introduces a static analysis pass for GCC that
> > > > can
> > > > diagnose
> > > > various kinds of problems in C code at compile-time (e.g.
> > > > double-
> > > > free,
> > > > use-after-free, etc).
> > > 
> > > I haven't looked at the analyzer bits in any detail yet so I have
> > > just some very high-level questions.  But first let me say I'm
> > > excited to see this project! :)
> > > 
> > > It looks like the analyzer detects some of the same problems as
> > > some existing middle-end warnings (e.g., -Wnonnull,
> > > -Wuninitialized),
> > > and some that I have been working toward implementing (invalid
> > > uses
> > > of freed pointers such as returning them from functions or
> > > passing
> > > them to others), and others still that I have been thinking about
> > > as possible future projects (e.g., detecting uses of
> > > uninitialized
> > > arrays in string functions).
> > > 
> > > What are your thoughts about this sort of overlap?  Do you expect
> > > us to enhance both sets of warnings in parallel, or do you see us
> > > moving away from issuing warnings in the middle-end and toward
> > > making the analyzer the main source of these kinds of
> > > diagnostics?
> > > Maybe even replace some of the problematic middle-end warnings
> > > with the analyzer?  What (if anything) should we do about
> > > warnings
> > > issued for the same problems by both the middle-end and the
> > > analyzer?
> > > Or about false negatives?  E.g., a bug detected by the middle-end
> > > but not the analyzer or vice versa.
> > > 
> > > What do you see as the biggest pros and cons of either approach?
> > > (Middle-end vs analyzer.)  What limitations is the analyzer
> > > approach inherently subject to that the middle-end warnings
> > > aren't,
> > > and vice versa?
> > > 
> > > How do we prioritize between the two approaches (e.g., choose
> > > where to add a new warning)?
> > Given the cost of David's analyzer, I would tend to prioritize the
> > more
> > localized analysis.  Also note that because of the compile-time
> > complexities we end up pruning paths from the search space and lose
> > precision when we have to merge nodes.   These issues are inherent
> > in
> > the depth of analysis we're looking to do.
> > 
> > So the way to think about things is David's work is a slower,
> > deeper
> > analysis than what we usually do.  So things that are reasonable
> > > candidates for -Wall would need to use the traditional mechanisms.
> > Things that require deeper analysis would be done in David's
> > framework.
> > 
> > Also note that part of David's work is to bring a fairly generic
> > engine
> > that we can expand with different domain specific analyzers.  It
> > just
> > happens to be the case that the first place he's focused is on
> > double-
> > free and use-after-free.  But (IMHO) the gem is really the generic
> > engine.
> 
> So if the "generic engine" lives inside GCC can the actual analyzers
> be plugins on a (stable) "analyzer plugin API"?

I like the idea of having plugins be able to support the analyzer
itself, so that new checkers can be registered by a plugin, analogous
to plugins that register new passes.  AIUI the clang static analyzer
works in such a fashion.

However, speaking to the "(stable)" part of your question: to do
anything useful, the checkers have to query GCC's IR (as well as
interact with the state of the analyzer), and so this reopens the
question of what the plugin API to GCC's IR is.

I'm focusing on building a concrete example of a checker (double-free)
and a few other examples; trying to generalize it into something
pluggable feels very much like something not to attempt in the initial
version.

> Does the analyzer work with LTO at whole-program scope btw?

My understanding of LTO is a little hazy, but yes, I think so.

The first thing the analyzer does (in engine.cc) is:

  /* If using LTO, ensure that the cgraph nodes have function bodies.  */
  cgraph_node *node;
  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
    node->get_untransformed_body ();

before then building a "supergraph" that combines CFGs and the callgraph.

BTW, for more on implementation details, prebuilt HTML of the internal
docs are at:
https://dmalcolm.fedorapeople.org/gcc/static-analyzer/gccint/Static-Analyzer.html

Dave

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-09 Thread Richard Biener

On Fri, Dec 6, 2019 at 11:31 PM Jeff Law  wrote:
>
> On Wed, 2019-12-04 at 12:55 -0700, Martin Sebor wrote:
> > On 11/15/19 6:22 PM, David Malcolm wrote:
> > > This patch kit introduces a static analysis pass for GCC that can
> > > diagnose
> > > various kinds of problems in C code at compile-time (e.g. double-
> > > free,
> > > use-after-free, etc).
> >
> > I haven't looked at the analyzer bits in any detail yet so I have
> > just some very high-level questions.  But first let me say I'm
> > excited to see this project! :)
> >
> > It looks like the analyzer detects some of the same problems as
> > some existing middle-end warnings (e.g., -Wnonnull, -Wuninitialized),
> > and some that I have been working toward implementing (invalid uses
> > of freed pointers such as returning them from functions or passing
> > them to others), and others still that I have been thinking about
> > as possible future projects (e.g., detecting uses of uninitialized
> > arrays in string functions).
> >
> > What are your thoughts about this sort of overlap?  Do you expect
> > us to enhance both sets of warnings in parallel, or do you see us
> > moving away from issuing warnings in the middle-end and toward
> > making the analyzer the main source of these kinds of diagnostics?
> > Maybe even replace some of the problematic middle-end warnings
> > with the analyzer?  What (if anything) should we do about warnings
> > issued for the same problems by both the middle-end and the analyzer?
> > Or about false negatives?  E.g., a bug detected by the middle-end
> > but not the analyzer or vice versa.
> >
> > What do you see as the biggest pros and cons of either approach?
> > (Middle-end vs analyzer.)  What limitations is the analyzer
> > approach inherently subject to that the middle-end warnings aren't,
> > and vice versa?
> >
> > How do we prioritize between the two approaches (e.g., choose
> > where to add a new warning)?
> Given the cost of David's analyzer, I would tend to prioritize the more
> localized analysis.  Also note that because of the compile-time
> complexities we end up pruning paths from the search space and lose
> precision when we have to merge nodes.   These issues are inherent in
> the depth of analysis we're looking to do.
>
> So the way to think about things is David's work is a slower, deeper
> analysis than what we usually do.  So things that are reasonable
> candidates for -Wall would need to use the traditional mechanisms.
> Things that require deeper analysis would be done in David's framework.
>
> Also note that part of David's work is to bring a fairly generic engine
> that we can expand with different domain specific analyzers.  It just
> happens to be the case that the first place he's focused is on double-
> free and use-after-free.  But (IMHO) the gem is really the generic
> engine.

So if the "generic engine" lives inside GCC can the actual analyzers
be plugins on a (stable) "analyzer plugin API"?

Does the analyzer work with LTO at whole-program scope btw?

Richard.

> jeff
>

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-06 Thread Jeff Law

On Wed, 2019-12-04 at 12:55 -0700, Martin Sebor wrote:
> On 11/15/19 6:22 PM, David Malcolm wrote:
> > This patch kit introduces a static analysis pass for GCC that can
> > diagnose
> > various kinds of problems in C code at compile-time (e.g. double-
> > free,
> > use-after-free, etc).
> 
> I haven't looked at the analyzer bits in any detail yet so I have
> just some very high-level questions.  But first let me say I'm
> excited to see this project! :)
> 
> It looks like the analyzer detects some of the same problems as
> some existing middle-end warnings (e.g., -Wnonnull, -Wuninitialized),
> and some that I have been working toward implementing (invalid uses
> of freed pointers such as returning them from functions or passing
> them to others), and others still that I have been thinking about
> as possible future projects (e.g., detecting uses of uninitialized
> arrays in string functions).
> 
> What are your thoughts about this sort of overlap?  Do you expect
> us to enhance both sets of warnings in parallel, or do you see us
> moving away from issuing warnings in the middle-end and toward
> making the analyzer the main source of these kinds of diagnostics?
> Maybe even replace some of the problematic middle-end warnings
> with the analyzer?  What (if anything) should we do about warnings
> issued for the same problems by both the middle-end and the analyzer?
> Or about false negatives?  E.g., a bug detected by the middle-end
> but not the analyzer or vice versa.
> 
> What do you see as the biggest pros and cons of either approach?
> (Middle-end vs analyzer.)  What limitations is the analyzer
> approach inherently subject to that the middle-end warnings aren't,
> and vice versa?
> 
> How do we prioritize between the two approaches (e.g., choose
> where to add a new warning)?
Given the cost of David's analyzer, I would tend to prioritize the more
localized analysis.  Also note that because of the compile-time
complexities we end up pruning paths from the search space and lose
precision when we have to merge nodes.   These issues are inherent in
the depth of analysis we're looking to do.

So the way to think about things is David's work is a slower, deeper
analysis than what we usually do.  So things that are reasonable
candidates for -Wall would need to use the traditional mechanisms.
Things that require deeper analysis would be done in David's framework.

Also note that part of David's work is to bring a fairly generic engine
that we can expand with different domain specific analyzers.  It just
happens to be the case that the first place he's focused is on double-
free and use-after-free.  But (IMHO) the gem is really the generic
engine.

jeff

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-06 Thread Jeff Law

On Tue, 2019-12-03 at 11:52 -0500, David Malcolm wrote:
> On Wed, 2019-11-20 at 11:18 +0100, Richard Biener wrote:
> > On Tue, Nov 19, 2019 at 11:02 PM David Malcolm wrote:
> > > > > The checker is implemented as a GCC plugin.
> > > > > 
> > > > > The patch kit adds support for "in-tree" plugins i.e. GCC
> > > > > plugins
> > > > > that
> > > > > would live in the GCC source tree and be shipped as part of
> > > > > the
> > > > > GCC
> > > > > tarball,
> > > > > with a new:
> > > > >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > > > > configure option, analogous to --enable-languages (the
> > > > > Makefile/configure
> > > > > machinery for handling in-tree GCC plugins is adapted from
> > > > > how
> > > > > we
> > > > > support
> > > > > frontends).
> > > > 
> > > > I like that.  Implementing this as a plugin surely must help to
> > > > either
> > > > document the GCC plugin interface as powerful/mature for such a
> > > > task.  Or
> > > > make it so, if it isn't yet.  ;-)
> > > 
> > > Our plugin "interface" as such is very broad.
> > 
> > Just to sneak in here I don't like exposing our current plugin
> > "non-
> > API"
> > more.  In fact I'd just build the analyzer into GCC with maybe an
> > option to disable its build (in case it is very fat?).
> 
> My aim here is to provide a way for distributors to be able to
> disable
> its build - indeed, for now, for it to be disabled by default,
> requiring opting-in.
It seems like there's some move to have this as part of the core
compiler rather than as a plug-in.  That's a bit of a surprise, but a
good one.


> I want some way to label the code as a "technology preview", that
> people may want to experiment with, but to set expectations that this
> is a lot of new code and there will be bugs - but to make it
> available
> to make it easier for adventurous users to try it out.
> 
> I hope that makes sense.
> 
> I went down the "in-tree plugin" path by seeing the analogy with
> frontends, but yes, it would probably be simpler to just build it
> into
> GCC, guarded with a configure-time variable.  It's many thousand
> lines
> of non-trivial C++ code, and associated selftests and DejaGnu tests.
Given the overall feedback, core component with an opt-out seems like
it'd be best.

jeff
>

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-04 Thread Martin Sebor


On 11/15/19 6:22 PM, David Malcolm wrote:

This patch kit introduces a static analysis pass for GCC that can diagnose
various kinds of problems in C code at compile-time (e.g. double-free,
use-after-free, etc).


I haven't looked at the analyzer bits in any detail yet so I have
just some very high-level questions.  But first let me say I'm
excited to see this project! :)

It looks like the analyzer detects some of the same problems as
some existing middle-end warnings (e.g., -Wnonnull, -Wuninitialized),
and some that I have been working toward implementing (invalid uses
of freed pointers such as returning them from functions or passing
them to others), and others still that I have been thinking about
as possible future projects (e.g., detecting uses of uninitialized
arrays in string functions).

What are your thoughts about this sort of overlap?  Do you expect
us to enhance both sets of warnings in parallel, or do you see us
moving away from issuing warnings in the middle-end and toward
making the analyzer the main source of these kinds of diagnostics?
Maybe even replace some of the problematic middle-end warnings
with the analyzer?  What (if anything) should we do about warnings
issued for the same problems by both the middle-end and the analyzer?
Or about false negatives?  E.g., a bug detected by the middle-end
but not the analyzer or vice versa.

What do you see as the biggest pros and cons of either approach?
(Middle-end vs analyzer.)  What limitations is the analyzer
approach inherently subject to that the middle-end warnings aren't,
and vice versa?

How do we prioritize between the two approaches (e.g., choose
where to add a new warning)?

Martin


The analyzer runs as an IPA pass on the gimple SSA representation.
It associates state machines with data, with transitions at certain
statements and edges.  It finds "interesting" interprocedural paths
through the user's code, in which bogus state transitions happen.

For example, given:

free (ptr);
free (ptr);

at the first call, "ptr" transitions to the "freed" state, and
at the second call the analyzer complains, since "ptr" is already in
the "freed" state (unless "ptr" is NULL, in which case it stays in
the NULL state for both calls).
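
As a rough illustration of the transitions just described, here is a toy
model in C.  It is only a sketch: the real checker lives in sm-malloc.cc
and is more involved, and the enum and function names below (including
the "start" state) are assumptions made up for this example.

```c
#include <stdbool.h>

/* Toy model of the per-pointer states described above.  The names are
   illustrative assumptions, not the identifiers used in sm-malloc.cc.  */
enum ptr_state { PTR_START, PTR_NULL, PTR_FREED };

/* Apply the effect of "free (ptr)" to *STATE.  Return true if the call
   should be reported as a double-free.  */
bool
on_free (enum ptr_state *state)
{
  switch (*state)
    {
    case PTR_NULL:
      /* free (NULL) is a no-op, so the pointer stays in the NULL state.  */
      return false;
    case PTR_FREED:
      /* Already freed: this is the bogus state transition.  */
      return true;
    case PTR_START:
    default:
      /* First free: transition to the "freed" state.  */
      *state = PTR_FREED;
      return false;
    }
}
```

Calling on_free twice on a pointer that starts in PTR_START reports a
problem on the second call only; a pointer in PTR_NULL never triggers a
report.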

Specific state machines include:
- a checker for malloc/free, for detecting double-free, resource leaks,
   use-after-free, etc (sm-malloc.cc), and
- a checker for stdio's FILE stream API (sm-file.cc)

There are also two state-machine-based checkers that are just
proof-of-concept at this stage:
- a checker for tracking exposure of sensitive data (e.g.
   writing passwords to log files aka CWE-532), and
- a checker for tracking "taint", where data potentially under an
   attacker's control is used without sanitization for things like
   array indices (CWE-129).
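
To make the CWE-129 case concrete, here is a small sketch of the kind of
code pattern meant: an index that may come from untrusted input, used
with and without a range check.  The table and function names are
hypothetical, and what the proof-of-concept checker actually reports for
code like this is not stated here.

```c
#include <stddef.h>

#define TABLE_SIZE 4

static const int table[TABLE_SIZE] = { 10, 20, 30, 40 };

/* Unsanitized: IDX may be under an attacker's control and is used
   directly as an array index (the CWE-129 pattern).  */
int
lookup_unchecked (size_t idx)
{
  return table[idx];
}

/* Sanitized: IDX is range-checked before use.  Returns -1 for an
   out-of-range index (an arbitrary choice for this sketch).  */
int
lookup_checked (size_t idx)
{
  if (idx >= TABLE_SIZE)
    return -1;
  return table[idx];
}
```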

There's a separation between the state machines and the analysis
engine, so it ought to be relatively easy to add new warnings.

For any given diagnostic emitted by a state machine, the analysis engine
generates the simplest feasible interprocedural path of control flow for
triggering the diagnostic.


Diagnostic paths


The patch kit adds support to GCC's diagnostic subsystem for optionally
associating a "diagnostic_path" with a diagnostic.  A diagnostic path
describes a sequence of events predicted by the compiler that leads to the
problem occurring, with their locations in the user's source, and text
descriptions.

For example, the following warning has a 6-event interprocedural path:

malloc-ipa-8-unchecked.c: In function 'make_boxed_int':
malloc-ipa-8-unchecked.c:21:13: warning: dereference of possibly-NULL 'result' [CWE-690] [-Wanalyzer-possible-null-dereference]
   21 |   result->i = i;
      |   ~~~~~~~~~~^~~
  'make_boxed_int': events 1-2
    |
    |   18 | make_boxed_int (int i)
    |      | ^~~~~~~~~~~~~
    |      | |
    |      | (1) entry to 'make_boxed_int'
    |   19 | {
    |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
    |      |                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                                    |
    |      |                                    (2) calling 'wrapped_malloc' from 'make_boxed_int'
    |
    +--> 'wrapped_malloc': events 3-4
           |
           |    7 | void *wrapped_malloc (size_t size)
           |      |       ^~~~~~~~~~~~~
           |      |       |
           |      |       (3) entry to 'wrapped_malloc'
           |    8 | {
           |    9 |   return malloc (size);
           |      |          ~~~~~~~~~~~~~
           |      |          |
           |      |          (4) this call could return NULL
           |
    <------+
    |
  'make_boxed_int': events 5-6
    |
    |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
    |      |                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |
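
The source file behind this diagnostic is not shown in the message.  A
sketch consistent with the lines quoted in the events above might look
like the following; the definition of boxed_int and the final return
statement are assumptions, and the line numbers will not match the
original test case.

```c
#include <stdlib.h>

/* Assumed definition: the diagnostic only shows that 'boxed_int' has a
   member 'i' that is assigned from an int.  */
typedef struct boxed_int
{
  int i;
} boxed_int;

void *wrapped_malloc (size_t size)
{
  return malloc (size);  /* Event (4): this call could return NULL.  */
}

boxed_int *
make_boxed_int (int i)
{
  boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
  result->i = i;  /* Dereference without a NULL check: the warning site.  */
  return result;  /* Assumed; not visible in the quoted diagnostic.  */
}
```

Adding a check such as "if (!result) return NULL;" before the assignment
is the usual way to handle the case the path describes.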

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-04 Thread Richard Biener

On Tue, Dec 3, 2019 at 5:52 PM David Malcolm  wrote:
>
> On Wed, 2019-11-20 at 11:18 +0100, Richard Biener wrote:
> > On Tue, Nov 19, 2019 at 11:02 PM David Malcolm 
> > wrote:
> > > > > The checker is implemented as a GCC plugin.
> > > > >
> > > > > The patch kit adds support for "in-tree" plugins i.e. GCC
> > > > > plugins
> > > > > that
> > > > > would live in the GCC source tree and be shipped as part of the
> > > > > GCC
> > > > > tarball,
> > > > > with a new:
> > > > >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > > > > configure option, analogous to --enable-languages (the
> > > > > Makefile/configure
> > > > > machinery for handling in-tree GCC plugins is adapted from how
> > > > > we
> > > > > support
> > > > > frontends).
> > > >
> > > > I like that.  Implementing this as a plugin surely must help to
> > > > either
> > > > document the GCC plugin interface as powerful/mature for such a
> > > > task.  Or
> > > > make it so, if it isn't yet.  ;-)
> > >
> > > Our plugin "interface" as such is very broad.
> >
> > Just to sneak in here I don't like exposing our current plugin "non-
> > API"
> > more.  In fact I'd just build the analyzer into GCC with maybe an
> > option to disable its build (in case it is very fat?).
>
> My aim here is to provide a way for distributors to be able to disable
> its build - indeed, for now, for it to be disabled by default,
> requiring opting-in.
>
> My reasoning here is that the analyzer is middle-end code, but isn't as
> mature as the rest of the middle-end (but I'm working on getting it
> more mature).
>
> I want some way to label the code as a "technology preview", that
> people may want to experiment with, but to set expectations that this
> is a lot of new code and there will be bugs - but to make it available
> to make it easier for adventurous users to try it out.
>
> I hope that makes sense.
>
> I went down the "in-tree plugin" path by seeing the analogy with
> frontends, but yes, it would probably be simpler to just build it into
> GCC, guarded with a configure-time variable.  It's many thousand lines
> of non-trivial C++ code, and associated selftests and DejaGnu tests.
>
> Building with --enable-checking=release, and stripping the binaries and
> the plugin, I see:
>
> $ ls -al cc1 cc1plus plugin/analyzer_plugin.so
> -rwxrwxr-x. 1 david david 25921600 Dec  3 11:22 cc1
> -rwxrwxr-x. 1 david david 27473568 Dec  3 11:22 cc1plus
> -rwxrwxr-x. 1 david david   645256 Dec  3 11:22
> plugin/analyzer_plugin.so
>
> $ ls -alh cc1 cc1plus plugin/analyzer_plugin.so
> -rwxrwxr-x. 1 david david  25M Dec  3 11:22 cc1
> -rwxrwxr-x. 1 david david  27M Dec  3 11:22 cc1plus
> -rwxrwxr-x. 1 david david 631K Dec  3 11:22 plugin/analyzer_plugin.so
>
> so the plugin is about 2.5% of the size of the existing compiler.
>
> The analysis pass is very time-consuming when enabled via -fanalyzer.
> I'm aiming for "x2 compile-time in exchange for finding lots of bugs"
> as a tradeoff that users will be happy to make (by supplying
> -fanalyzer) - that's faster than comparable static analyzers I've been
> playing with.
>
> > From what I read it seems the analyzer could do with a proper
> > plugin API that just exposes introspection - and I really hope
> > somebody finds the time to complete (or rewrite...) the
> > proposed introspection API that ideally is even cross-compiler
> > (proven by implementing said API on top of both GCC and clang/llvm).
> > That way the Analyzer would work with both GCC and clang [and golang
> > and rustc...].
>
> We've gone back and forth about what a GCC plugin API should look like;
> I'm not sure what the objectives are.
>
> For example, are we hoping to offer some kind of ABI guarantee to
> plugins so that we can patch GCC without plugins needing to be rebuilt?

Yes, I think that's desirable.

> If so, how strong is the ABI guarantee?  For example, do we directly
> expose the tree code enums and the gimple code enums?

No, we'd remap them semantically.

> Or is it more ambitious, and hoping to be cross-compiler, in which case
> are these enums themselves hidden?

Well, my original idea was to see what people really would use
(when just considering introspection and maybe very simple instrumentation).
And then sketch something independent of the underlying compiler.
And then have that API [or even ABI] implemented by more than one
compiler to see if that's viable.

> This feels like opening a massive can of worms, and orthogonal to my
> goal of giving GCC a static analysis framework.

Sure it is orthogonal.  The only reason it comes up here is that you
propose a "plugin" ;)

I'd rather have the current plugin non-API go away so having it
"fixed" by introducing in-tree plugins looks backwards to me in
that regard.

> > So it would be interesting if you could try to sketch the kind of API
> > the Analyzer needs?  That is, merely the detail on which it inspects
> > statements, the CFG and the callgraph.
>
> FWIW the symbols consumed by the plugin can be seen at:
>

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-03 Thread Jakub Jelinek

On Tue, Dec 03, 2019 at 11:52:13AM -0500, David Malcolm wrote:
> > > Our plugin "interface" as such is very broad.
> > 
> > Just to sneak in here I don't like exposing our current plugin "non-
> > API"
> > more.  In fact I'd just build the analyzer into GCC with maybe an
> > option to disable its build (in case it is very fat?).
> 
> My aim here is to provide a way for distributors to be able to disable
> its build - indeed, for now, for it to be disabled by default,
> requiring opting-in.
> 
> My reasoning here is that the analyzer is middle-end code, but isn't as
> mature as the rest of the middle-end (but I'm working on getting it
> more mature).
> 
> I want some way to label the code as a "technology preview", that
> people may want to experiment with, but to set expectations that this
> is a lot of new code and there will be bugs - but to make it available
> to make it easier for adventurous users to try it out.
> 
> I hope that makes sense.
> 
> I went down the "in-tree plugin" path by seeing the analogy with
> frontends, but yes, it would probably be simpler to just build it into
> GCC, guarded with a configure-time variable.  It's many thousand lines
> of non-trivial C++ code, and associated selftests and DejaGnu tests.

I think it is enough to document it as tech preview in the documentation,
no need to have it as an in-tree plugin.  We have lots of options that had
such a state (perhaps undeclared) over the years, I'd consider
-fvtable-verify= to be such an option, or in the past e.g.
-fipa-matrix-reorg or -fipa-struct-reorg.  And 2.5% code growth isn't that
bad.  So, as long as the option isn't enabled by default, I think we'd be
fine.

Jakub

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-03 Thread David Malcolm

On Wed, 2019-11-20 at 11:18 +0100, Richard Biener wrote:
> On Tue, Nov 19, 2019 at 11:02 PM David Malcolm 
> wrote:
> > > > The checker is implemented as a GCC plugin.
> > > > 
> > > > The patch kit adds support for "in-tree" plugins i.e. GCC
> > > > plugins
> > > > that
> > > > would live in the GCC source tree and be shipped as part of the
> > > > GCC
> > > > tarball,
> > > > with a new:
> > > >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > > > configure option, analogous to --enable-languages (the
> > > > Makefile/configure
> > > > machinery for handling in-tree GCC plugins is adapted from how
> > > > we
> > > > support
> > > > frontends).
> > > 
> > > I like that.  Implementing this as a plugin surely must help to
> > > either
> > > document the GCC plugin interface as powerful/mature for such a
> > > task.  Or
> > > make it so, if it isn't yet.  ;-)
> > 
> > Our plugin "interface" as such is very broad.
> 
> Just to sneak in here I don't like exposing our current plugin "non-
> API"
> more.  In fact I'd just build the analyzer into GCC with maybe an
> option to disable its build (in case it is very fat?).

My aim here is to provide a way for distributors to be able to disable
its build - indeed, for now, for it to be disabled by default,
requiring opting-in.

My reasoning here is that the analyzer is middle-end code, but isn't as
mature as the rest of the middle-end (but I'm working on getting it
more mature).

I want some way to label the code as a "technology preview", that
people may want to experiment with, but to set expectations that this
is a lot of new code and there will be bugs - but to make it available
to make it easier for adventurous users to try it out.

I hope that makes sense.

I went down the "in-tree plugin" path by seeing the analogy with
frontends, but yes, it would probably be simpler to just build it into
GCC, guarded with a configure-time variable.  It's many thousand lines
of non-trivial C++ code, and associated selftests and DejaGnu tests.

Building with --enable-checking=release, and stripping the binaries and
the plugin, I see:

$ ls -al cc1 cc1plus plugin/analyzer_plugin.so 
-rwxrwxr-x. 1 david david 25921600 Dec  3 11:22 cc1
-rwxrwxr-x. 1 david david 27473568 Dec  3 11:22 cc1plus
-rwxrwxr-x. 1 david david   645256 Dec  3 11:22
plugin/analyzer_plugin.so

$ ls -alh cc1 cc1plus plugin/analyzer_plugin.so 
-rwxrwxr-x. 1 david david  25M Dec  3 11:22 cc1
-rwxrwxr-x. 1 david david  27M Dec  3 11:22 cc1plus
-rwxrwxr-x. 1 david david 631K Dec  3 11:22 plugin/analyzer_plugin.so

so the plugin is about 2.5% of the size of the existing compiler.

The analysis pass is very time-consuming when enabled via -fanalyzer. 
I'm aiming for "x2 compile-time in exchange for finding lots of bugs"
as a tradeoff that users will be happy to make (by supplying
-fanalyzer) - that's faster than comparable static analyzers I've been
playing with.

> From what I read it seems the analyzer could do with a proper
> plugin API that just exposes introspection - and I really hope
> somebody finds the time to complete (or rewrite...) the
> proposed introspection API that ideally is even cross-compiler
> (proven by implementing said API on top of both GCC and clang/llvm).
> That way the Analyzer would work with both GCC and clang [and golang
> and rustc...].

We've gone back and forth about what a GCC plugin API should look like;
I'm not sure what the objectives are.

For example, are we hoping to offer some kind of ABI guarantee to
plugins so that we can patch GCC without plugins needing to be rebuilt?
If so, how strong is the ABI guarantee?  For example, do we directly
expose the tree code enums and the gimple code enums?

Or is it more ambitious, and hoping to be cross-compiler, in which case
are these enums themselves hidden?

This feels like opening a massive can of worms, and orthogonal to my
goal of giving GCC a static analysis framework.

> So it would be interesting if you could try to sketch the kind of API
> the Analyzer needs?  That is, merely the detail on which it inspects
> statements, the CFG and the callgraph.

FWIW the symbols consumed by the plugin can be seen at:
 https://dmalcolm.fedorapeople.org/gcc/2019-11-27/symbols-used.txt

This is the result of:
  eu-readelf -s plugin/analyzer_plugin.so |c++filt|grep UNDEF

Surveying that, the plugin:
- creates a pass
- views the callgraph and the functions (e.g. ipa_reverse_postorder)
- views CFGs and SSA representation (including statements)
- uses the diagnostic subsystem (which parts of the patch kit extend,
adding e.g. control flow paths), e.g. creating and subclassing
rich_locations, subclassing diagnostic_path and diagnostic_event
- calls into middle-end support functions like
useless_type_conversion_p
- uses GCC types such as bitmap, inchash, wideint
- creates temporary trees
- has selftests
...etc.

But there are inline uses of various functions that don't show up in
that list (e.g. the various gimple_* accessor functions - grepping the
sourc

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-01 Thread Eric Gallager

On 11/16/19, Thomas Schwinge  wrote:
> Hi David!
>
> On 2019-11-15T20:22:47-0500, David Malcolm  wrote:
>> This patch kit
>
> (I have not looked at the patches.)  ;-)
>
>> introduces a static analysis pass for GCC that can diagnose
>> various kinds of problems in C code at compile-time (e.g. double-free,
>> use-after-free, etc).
>
> Sounds great from the description!
>
>
> Would it make sense to add to the wiki page
>  a (high-level)
> comparison to other static analyzers (Coverity, cppcheck,
> clang-static-analyzer, others?), in terms of how they work, what their
> respective benefits are, what their design goals are, etc.  (Of course
> understanding that yours is much less mature at this point; talking about
> high-level design rather than current implementation status.)
>
> For example, why do we want that in GCC instead of an external tool -- in
> part covered in your Rationale.

There are a lot of bug reports open for requests for warnings that
this analyzer could solve. Users clearly want this in GCC, or else
they wouldn't keep making these requests.

> Can a compiler-side implementation
> benefit from having more information available than an external tool?
> GCC-side implementation is readily available (modulo GCC plugin
> installation?) vs. external ones need to be installed/set up first.
> GCC-side one only works with GCC-supported languages.  GCC-side one
> analyzes actual code being compiled -- thinking about preprocessor-level
> '#if' etc., which surely are problematic for external tools that are not
> actually replicating a real build.  And so on.  (If you don't want to
> spell out Coverity, cppcheck, clang-static-analyzer, etc., maybe just
> compare yours to external tools.)
>
> Just an idea, because I wondered about these things.
>
>
>> The analyzer runs as an IPA pass on the gimple SSA representation.
>> It associates state machines with data, with transitions at certain
>> statements and edges.  It finds "interesting" interprocedural paths
>> through the user's code, in which bogus state transitions happen.
>>
>> For example, given:
>>
>>    free (ptr);
>>    free (ptr);
>>
>> at the first call, "ptr" transitions to the "freed" state, and
>> at the second call the analyzer complains, since "ptr" is already in
>> the "freed" state (unless "ptr" is NULL, in which case it stays in
>> the NULL state for both calls).
>>
>> Specific state machines include:
>> - a checker for malloc/free, for detecting double-free, resource leaks,
>>   use-after-free, etc (sm-malloc.cc), and
>
> I can immediately see how this can be useful for a bunch of
> 'malloc'/'free'-like etc. OpenACC Runtime Library calls as well as source
> code directives.  ..., and this would've flagged existing code in the
> libgomp OpenACC tests, which recently has given me some grief. Short
> summary/examples:
>
> In addition to host-side 'malloc'/'free', there is device-side (separate
> memory space) 'acc_malloc'/'acc_free'.  Static checking example: don't
> mix up host-side and device-side pointers.  (Both are normal C/C++
> pointers.  Hmm, maybe such checking could easily be implemented even
> outside of your checker by annotating the respective function
> declarations with an attribute describing which in/out arguments are
> host-side vs. device-side pointers.)
>
> Then, there are functions to "map" host-side and device-side memory:
> 'acc_map_data'/'acc_unmap_data'.  Static checking example: you must not
> 'acc_free' memory spaces that are still mapped.
>
> Then, there are functions like 'acc_create' (or equivalent directives
> like '#pragma acc create') doing both 'acc_malloc', 'acc_map_data'
> (plus/depending on internal reference counting).  Static checking
> example: for a pointer returned by 'acc_create' etc., you must use
> 'acc_delete' etc. instead of 'acc_free', which first does
> 'acc_unmap_data' before internal 'acc_free' (..., and again all that
> depending on reference counting).  (Might be "interesting" to teach your
> checker about the reference counting -- if that is actually necessary;
> needs further thought.)
>
>
>> The checker is implemented as a GCC plugin.
>>
>> The patch kit adds support for "in-tree" plugins i.e. GCC plugins that
>> would live in the GCC source tree and be shipped as part of the GCC
>> tarball,
>> with a new:
>>   --enable-plugins=[LIST OF PLUGIN NAMES]
>> configure option, analogous to --enable-languages (the Makefile/configure
>> machinery for handling in-tree GCC plugins is adapted from how we support
>> frontends).
>
> I like that.  Implementing this as a plugin surely must help to either
> document the GCC plugin interface as powerful/mature for such a task.  Or
> make it so, if it isn't yet.  ;-)

Nick Clifton was bringing this up as a point in his talk on his
annobin plugin at Cauldron; this should make him happy.

>
>> The default is for no such plugins to be enabled, so the default would
>> be that the checker isn't built - you'

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-01 Thread Eric Gallager

On 11/20/19, Richard Biener  wrote:
> On Tue, Nov 19, 2019 at 11:02 PM David Malcolm  wrote:
>>
>> > > The checker is implemented as a GCC plugin.
>> > >
>> > > The patch kit adds support for "in-tree" plugins i.e. GCC plugins
>> > > that
>> > > would live in the GCC source tree and be shipped as part of the GCC
>> > > tarball,
>> > > with a new:
>> > >   --enable-plugins=[LIST OF PLUGIN NAMES]
>> > > configure option, analogous to --enable-languages (the
>> > > Makefile/configure
>> > > machinery for handling in-tree GCC plugins is adapted from how we
>> > > support
>> > > frontends).
>> >
>> > I like that.  Implementing this as a plugin surely must help to
>> > either
>> > document the GCC plugin interface as powerful/mature for such a
>> > task.  Or
>> > make it so, if it isn't yet.  ;-)
>>
>> Our plugin "interface" as such is very broad.
>
> Just to sneak in here I don't like exposing our current plugin "non-API"
> more.  In fact I'd just build the analyzer into GCC with maybe an
> option to disable its build (in case it is very fat?).
>
> From what I read it seems the analyzer could do with a proper
> plugin API that just exposes introspection - and I really hope
> somebody finds the time to complete (or rewrite...) the
> proposed introspection API that ideally is even cross-compiler
> (proven by implementing said API on top of both GCC and clang/llvm).
> That way the Analyzer would work with both GCC and clang [and golang
> and rustc...].

That might be a good idea for a long-term goal, but I just hope it
doesn't get in the way too much of the analyzer getting into GCC in
the short-term. The analyzer seems like it could do some really cool
analysis, and I'd like to use it sooner rather than later. Rewriting
the plugin API sounds like it could take a really long time...

>
> So it would be interesting if you could try to sketch the kind of API
> the Analyzer needs?  That is, merely the detail on which it inspects
> statements, the CFG and the callgraph.
>
> Richard.
>

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-11-20 Thread Richard Biener

On Tue, Nov 19, 2019 at 11:02 PM David Malcolm  wrote:
>
> > > The checker is implemented as a GCC plugin.
> > >
> > > The patch kit adds support for "in-tree" plugins i.e. GCC plugins
> > > that
> > > would live in the GCC source tree and be shipped as part of the GCC
> > > tarball,
> > > with a new:
> > >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > > configure option, analogous to --enable-languages (the
> > > Makefile/configure
> > > machinery for handling in-tree GCC plugins is adapted from how we
> > > support
> > > frontends).
> >
> > I like that.  Implementing this as a plugin surely must help to
> > either
> > document the GCC plugin interface as powerful/mature for such a
> > task.  Or
> > make it so, if it isn't yet.  ;-)
>
> Our plugin "interface" as such is very broad.

Just to sneak in here I don't like exposing our current plugin "non-API"
more.  In fact I'd just build the analyzer into GCC with maybe an
option to disable its build (in case it is very fat?).

From what I read it seems the analyzer could do with a proper
plugin API that just exposes introspection - and I really hope
somebody finds the time to complete (or rewrite...) the
proposed introspection API that ideally is even cross-compiler
(proven by implementing said API on top of both GCC and clang/llvm).
That way the Analyzer would work with both GCC and clang [and golang
and rustc...].

So it would be interesting if you could try to sketch the kind of API
the Analyzer needs?  That is, merely the detail on which it inspects
statements, the CFG and the callgraph.

Richard.

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-11-19 Thread David Malcolm

On Sat, 2019-11-16 at 21:42 +0100, Thomas Schwinge wrote:
> Hi David!
> 
> On 2019-11-15T20:22:47-0500, David Malcolm 
> wrote:
> > This patch kit
> 
> (I have not looked at the patches.)  ;-)
> 
> > introduces a static analysis pass for GCC that can diagnose
> > various kinds of problems in C code at compile-time (e.g. double-
> > free,
> > use-after-free, etc).
> 
> Sounds great from the description!

Thanks.

> Would it make sense to add to the wiki page
>  a (high-level)
> comparison to other static analyzers (Coverity, cppcheck,
> clang-static-analyzer, others?), in terms of how they work, what
> their
> respective benefits are, what their design goals are, etc.  (Of
> course
> understanding that yours is much less mature at this point; talking
> about
> high-level design rather than current implementation status.)
> 
> For example, why do we want that in GCC instead of an external tool
> -- in
> part covered in your Rationale.  Can a compiler-side implementation
> benefit from having more information available than an external tool?
> GCC-side implementation is readily available (modulo GCC plugin
> installation?) vs. external ones need to be installed/set up first.
> GCC-side one only works with GCC-supported languages.  GCC-side one
> analyzes actual code being compiled -- thinking about preprocessor-
> level
> '#if' etc., which surely are problematic for external tools that are
> not
> actually replicating a real build.  And so on.  (If you don't want to
> spell out Coverity, cppcheck, clang-static-analyzer, etc., maybe just
> compare yours to external tools.)
> 
> Just an idea, because I wondered about these things.

Thanks; I've added some notes to the "Rationale" section of the wiki
page.

A lot of the information you're after is hidden in patch 2 of the kit,
in an analysis.texi (though admittedly that's hard to read in "patch
that adds a .texi file" form).

For now, I've uploaded a prebuilt version of the HTML to:

https://dmalcolm.fedorapeople.org/gcc/2019-11-19/gccint/Static-Analyzer.html


> > The analyzer runs as an IPA pass on the gimple SSA representation.
> > It associates state machines with data, with transitions at certain
> > statements and edges.  It finds "interesting" interprocedural paths
> > through the user's code, in which bogus state transitions happen.
> > 
> > For example, given:
> > 
> >    free (ptr);
> >    free (ptr);
> > 
> > at the first call, "ptr" transitions to the "freed" state, and
> > at the second call the analyzer complains, since "ptr" is already
> > in
> > the "freed" state (unless "ptr" is NULL, in which case it stays in
> > the NULL state for both calls).
> > 
> > Specific state machines include:
> > - a checker for malloc/free, for detecting double-free, resource
> > leaks,
> >   use-after-free, etc (sm-malloc.cc), and
> 
> I can immediately see how this can be useful for a bunch of
> 'malloc'/'free'-like etc. OpenACC Runtime Library calls as well as
> source
> code directives.  ..., and this would've flagged existing code in the
> libgomp OpenACC tests, which recently has given me some grief. Short
> summary/examples:
> 
> In addition to host-side 'malloc'/'free', there is device-side
> (separate
> memory space) 'acc_malloc'/'acc_free'. 

I've been thinking about generalizing the malloc/free checker to cover
resource acquisition/release pairs, adding a "domain" for the
allocation, where we'd complain if the resource release function isn't
of the same domain as the resource acquisition function.

Allocation domains might be:
  malloc/free
  C++ scalar new/delete
  C++ array new/delete
  FILE * (fopen/fclose)
  "foo_alloc"/"foo_release" for libfoo (i.e. user-extensible, via
attributes)

and thus catch things like deleting with scalar delete when the buffer
was allocated using new[], and various kinds of layering violations.
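The domain idea above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `alloc_domain` enum, the `known_fns` table and the `is_domain_mismatch` helper are hypothetical names invented here, not anything from the patch kit, and a real checker would work on GCC's internal representation rather than on function-name strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allocation domains, following the list above.  */
enum alloc_domain
{
  DOMAIN_UNKNOWN,
  DOMAIN_MALLOC,
  DOMAIN_CXX_SCALAR_NEW,
  DOMAIN_CXX_ARRAY_NEW,
  DOMAIN_FILE
};

struct domain_fn
{
  const char *name;
  enum alloc_domain domain;
};

/* Hypothetical table mapping acquire/release functions to a domain.  */
static const struct domain_fn known_fns[] = {
  { "malloc", DOMAIN_MALLOC },
  { "free", DOMAIN_MALLOC },
  { "operator new", DOMAIN_CXX_SCALAR_NEW },
  { "operator delete", DOMAIN_CXX_SCALAR_NEW },
  { "operator new[]", DOMAIN_CXX_ARRAY_NEW },
  { "operator delete[]", DOMAIN_CXX_ARRAY_NEW },
  { "fopen", DOMAIN_FILE },
  { "fclose", DOMAIN_FILE },
};

/* Look up the domain of a function name; unknown names get
   DOMAIN_UNKNOWN so that nothing is reported for them.  */
static enum alloc_domain
domain_of (const char *fn_name)
{
  for (size_t i = 0; i < sizeof known_fns / sizeof known_fns[0]; i++)
    if (strcmp (known_fns[i].name, fn_name) == 0)
      return known_fns[i].domain;
  return DOMAIN_UNKNOWN;
}

/* Return true if a resource acquired by ACQUIRE_FN is released by a
   function from a different known domain.  */
static bool
is_domain_mismatch (const char *acquire_fn, const char *release_fn)
{
  enum alloc_domain a = domain_of (acquire_fn);
  enum alloc_domain r = domain_of (release_fn);
  if (a == DOMAIN_UNKNOWN || r == DOMAIN_UNKNOWN)
    return false;
  return a != r;
}
```

With this table, pairing "operator new[]" with "operator delete" is a mismatch, while "malloc" with "free" is not. Staying silent on unknown functions is a design choice of the sketch, made to avoid false positives.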

I'm afraid that I'm not very familiar with OpenACC.  Would
acc_malloc/acc_free fit into that pattern, or would more be needed? 
For example, can you e.g. dereference a device-side pointer in host
code, or would we ideally issue a diagnostic about that?

>  Static checking example: don't
> mix up host-side and device-side pointers.  (Both are normal C/C++
> pointers.  Hmm, maybe such checking could easily be implemented even
> outside of your checker by annotating the respective function
> declarations with an attribute describing which in/out arguments are
> host-side vs. device-side pointers.)
> 
> Then, there are functions to "map" host-side and device-side memory:
> 'acc_map_data'/'acc_unmap_data'.  Static checking example: you must
> not
> 'acc_free' memory spaces that are still mapped.

It sounds like this state machine is somewhat more complicated.

Is there a state transition diagram for this somewhere?  I don't have
that for my state machines, but there are at least lists of states; see
e.g. the various state_t within malloc_state_machine
near the top of:
https://gcc.gnu.or

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-11-16 Thread Thomas Schwinge

Hi David!

On 2019-11-15T20:22:47-0500, David Malcolm  wrote:
> This patch kit

(I have not looked at the patches.)  ;-)

> introduces a static analysis pass for GCC that can diagnose
> various kinds of problems in C code at compile-time (e.g. double-free,
> use-after-free, etc).

Sounds great from the description!

Would it make sense to add to the wiki page
 a (high-level)
comparison to other static analyzers (Coverity, cppcheck,
clang-static-analyzer, others?), in terms of how they work, what their
respective benefits are, what their design goals are, etc.  (Of course
understanding that yours is much less mature at this point; talking about
high-level design rather than current implementation status.)

For example, why do we want that in GCC instead of an external tool -- in
part covered in your Rationale.  Can a compiler-side implementation
benefit from having more information available than an external tool?
GCC-side implementation is readily available (modulo GCC plugin
installation?) vs. external ones need to be installed/set up first.
GCC-side one only works with GCC-supported languages.  GCC-side one
analyzes actual code being compiled -- thinking about preprocessor-level
'#if' etc., which surely are problematic for external tools that are not
actually replicating a real build.  And so on.  (If you don't want to
spell out Coverity, cppcheck, clang-static-analyzer, etc., maybe just
compare yours to external tools.)

Just an idea, because I wondered about these things.

> The analyzer runs as an IPA pass on the gimple SSA representation.
> It associates state machines with data, with transitions at certain
> statements and edges.  It finds "interesting" interprocedural paths
> through the user's code, in which bogus state transitions happen.
>
> For example, given:
>
>    free (ptr);
>    free (ptr);
>
> at the first call, "ptr" transitions to the "freed" state, and
> at the second call the analyzer complains, since "ptr" is already in
> the "freed" state (unless "ptr" is NULL, in which case it stays in
> the NULL state for both calls).
>
> Specific state machines include:
> - a checker for malloc/free, for detecting double-free, resource leaks,
>   use-after-free, etc (sm-malloc.cc), and

I can immediately see how this can be useful for a bunch of
'malloc'/'free'-like etc. OpenACC Runtime Library calls as well as source
code directives.  ..., and this would've flagged existing code in the
libgomp OpenACC tests, which recently has given me some grief. Short
summary/examples:

In addition to host-side 'malloc'/'free', there is device-side (separate
memory space) 'acc_malloc'/'acc_free'.  Static checking example: don't
mix up host-side and device-side pointers.  (Both are normal C/C++
pointers.  Hmm, maybe such checking could easily be implemented even
outside of your checker by annotating the respective function
declarations with an attribute describing which in/out arguments are
host-side vs. device-side pointers.)

Then, there are functions to "map" host-side and device-side memory:
'acc_map_data'/'acc_unmap_data'.  Static checking example: you must not
'acc_free' memory spaces that are still mapped.
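As a rough illustration of the "must not free while mapped" rule, here is a minimal C model of the state of one device-side allocation. It does not call the OpenACC runtime. The state and event names are hypothetical, and treating a second map of already-mapped memory as an error is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical states of one device-side allocation.  */
enum dev_state { DEV_ALLOCATED, DEV_MAPPED, DEV_FREED };

/* Hypothetical events, standing in for calls to acc_map_data,
   acc_unmap_data and acc_free on that allocation.  */
enum dev_event { EV_MAP_DATA, EV_UNMAP_DATA, EV_FREE };

/* Apply EV to *STATE.  Returns true if the event is allowed in the
   current state; returns false (leaving *STATE unchanged) where a
   checker would want to complain.  */
static bool
dev_step (enum dev_state *state, enum dev_event ev)
{
  switch (ev)
    {
    case EV_MAP_DATA:
      if (*state != DEV_ALLOCATED)
        return false;
      *state = DEV_MAPPED;
      return true;
    case EV_UNMAP_DATA:
      if (*state != DEV_MAPPED)
        return false;
      *state = DEV_ALLOCATED;
      return true;
    case EV_FREE:
      /* Freeing memory that is still mapped, or already freed,
         is the bogus transition.  */
      if (*state != DEV_ALLOCATED)
        return false;
      *state = DEV_FREED;
      return true;
    }
  return false;
}
```

Reference counting, as used by 'acc_create' and friends, is deliberately left out of this model.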

Then, there are functions like 'acc_create' (or equivalent directives
like '#pragma acc create') doing both 'acc_malloc', 'acc_map_data'
(plus/depending on internal reference counting).  Static checking
example: for a pointer returned by 'acc_create' etc., you must use
'acc_delete' etc. instead of 'acc_free', which first does
'acc_unmap_data' before internal 'acc_free' (..., and again all that
depending on reference counting).  (Might be "interesting" to teach your
checker about the reference counting -- if that is actually necessary;
needs further thought.)

> The checker is implemented as a GCC plugin.
>
> The patch kit adds support for "in-tree" plugins i.e. GCC plugins that
> would live in the GCC source tree and be shipped as part of the GCC tarball,
> with a new:
>   --enable-plugins=[LIST OF PLUGIN NAMES]
> configure option, analogous to --enable-languages (the Makefile/configure
> machinery for handling in-tree GCC plugins is adapted from how we support
> frontends).

I like that.  Implementing this as a plugin surely must help to either
document the GCC plugin interface as powerful/mature for such a task.  Or
make it so, if it isn't yet.  ;-)

> The default is for no such plugins to be enabled, so the default would
> be that the checker isn't built - you'd have to opt-in to building it,
> with --enable-plugins=analyzer

I'd favor a default of '--enable-plugins=default' which enables the
"usable" plugins.

> It's not clear to me whether I should focus on:
>
> (a) pruning the scope of the checker so that it works well on
> *intra*procedural C examples (and bail on anything more complex), perhaps
> targeting GCC 10 as an optional extra hidden behind
> --enable-plugins=analyzer, or
>
> (b) work on deeper interprocedural analysis (and fixin

[PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-11-15 Thread David Malcolm

This patch kit introduces a static analysis pass for GCC that can diagnose
various kinds of problems in C code at compile-time (e.g. double-free,
use-after-free, etc).

The analyzer runs as an IPA pass on the gimple SSA representation.
It associates state machines with data, with transitions at certain
statements and edges.  It finds "interesting" interprocedural paths
through the user's code, in which bogus state transitions happen.

For example, given:

   free (ptr);
   free (ptr);

at the first call, "ptr" transitions to the "freed" state, and
at the second call the analyzer complains, since "ptr" is already in
the "freed" state (unless "ptr" is NULL, in which case it stays in
the NULL state for both calls).

Specific state machines include:
- a checker for malloc/free, for detecting double-free, resource leaks,
  use-after-free, etc (sm-malloc.cc), and
- a checker for stdio's FILE stream API (sm-file.cc)

There are also two state-machine-based checkers that are just
proof-of-concept at this stage:
- a checker for tracking exposure of sensitive data (e.g.
  writing passwords to log files aka CWE-532), and
- a checker for tracking "taint", where data potentially under an
  attacker's control is used without sanitization for things like
  array indices (CWE-129).
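For readers unfamiliar with CWE-129, the following C fragment shows the general shape of the problem and of a sanitized variant. The function names are made up for illustration, and whether the proof-of-concept taint checker flags exactly this code is an assumption.

```c
#define TABLE_SIZE 16

static int table[TABLE_SIZE];

/* Unsanitized: if IDX comes from an attacker (e.g. read from a file
   or socket), this can index outside the array.  It is shown only as
   the pattern a taint checker would want to flag.  */
static int
lookup_unchecked (int idx)
{
  return table[idx];
}

/* Sanitized: IDX is range-checked before use, so the tainted value
   can no longer select an out-of-bounds element.  */
static int
lookup_checked (int idx, int fallback)
{
  if (idx < 0 || idx >= TABLE_SIZE)
    return fallback;
  return table[idx];
}
```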

There's a separation between the state machines and the analysis
engine, so it ought to be relatively easy to add new warnings.

For any given diagnostic emitted by a state machine, the analysis engine
generates the simplest feasible interprocedural path of control flow for
triggering the diagnostic.


Diagnostic paths
================

The patch kit adds support to GCC's diagnostic subsystem for optionally
associating a "diagnostic_path" with a diagnostic.  A diagnostic path
describes a sequence of events predicted by the compiler that leads to the
problem occurring, with their locations in the user's source, and text
descriptions.

For example, the following warning has a 6-event interprocedural path:

malloc-ipa-8-unchecked.c: In function 'make_boxed_int':
malloc-ipa-8-unchecked.c:21:13: warning: dereference of possibly-NULL 'result' [CWE-690] [-Wanalyzer-possible-null-dereference]
   21 |   result->i = i;
      |   ~~~~~~~~~~^~~
  'make_boxed_int': events 1-2
    |
    |   18 | make_boxed_int (int i)
    |      | ^~~~~~~~~~~~~~
    |      | |
    |      | (1) entry to 'make_boxed_int'
    |   19 | {
    |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
    |      |                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                                    |
    |      |                                    (2) calling 'wrapped_malloc' from 'make_boxed_int'
    |
    +--> 'wrapped_malloc': events 3-4
           |
           |    7 | void *wrapped_malloc (size_t size)
           |      |       ^~~~~~~~~~~~~~
           |      |       |
           |      |       (3) entry to 'wrapped_malloc'
           |    8 | {
           |    9 |   return malloc (size);
           |      |          ~~~~~~~~~~~~~
           |      |          |
           |      |          (4) this call could return NULL
           |
    <------+
    |
  'make_boxed_int': events 5-6
    |
    |   20 |   boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
    |      |                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                                    |
    |      |                                    (5) possible return of NULL to 'make_boxed_int' from 'wrapped_malloc'
    |   21 |   result->i = i;
    |      |   ~~~~~~~~~~~~~
    |      |             |
    |      |             (6) 'result' could be NULL: unchecked value from (4)
    |

The diagnostic-printing code has consolidated the path into 3 runs of events
(where the events are near each other and within the same function), using
ASCII art to show the interprocedural call and return.
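For orientation, source along the following lines would give rise to a path like the one above. Only the lines quoted in the diagnostic are taken from it. The `boxed_int` definition, the return statement and the `make_boxed_int_checked` variant are assumptions added here, and the line numbers will not match the original test file.

```c
#include <stdlib.h>

/* Assumed definition; the diagnostic only shows that the type has
   a field named 'i'.  */
typedef struct boxed_int
{
  int i;
} boxed_int;

void *wrapped_malloc (size_t size)
{
  return malloc (size);   /* (4) this call could return NULL */
}

boxed_int *
make_boxed_int (int i)
{
  boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
  result->i = i;          /* (6) dereference without a NULL check */
  return result;
}

/* Hypothetical fixed variant: checking the result before the
   dereference removes the possibly-NULL path.  */
boxed_int *
make_boxed_int_checked (int i)
{
  boxed_int *result = (boxed_int *)wrapped_malloc (sizeof (boxed_int));
  if (!result)
    return NULL;
  result->i = i;
  return result;
}
```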

A colorized version of the above can be seen at:
  https://dmalcolm.fedorapeople.org/gcc/2019-11-13/test.html

Other examples can be seen at:
  https://dmalcolm.fedorapeople.org/gcc/2019-11-13/malloc-1.c.html
  https://dmalcolm.fedorapeople.org/gcc/2019-11-13/setjmp-4.c.html

An example of detecting a historical double-free CVE can be seen at:
  https://dmalcolm.fedorapeople.org/gcc/2019-11-13/CVE-2005-1689.html
(there are also some false positives in this report)


Diagnostic metadata
===================

The patch kit also adds the ability to associate additional metadata with
a diagnostic. The only such metadata added by the patch kit are CWE
classifications (for the new warnings), such as the CWE-690 in the warning
above, or CWE-401 in this example:

malloc-1.c: In function 'test_42a':
malloc-1.c:466:1: warning: leak of 'p' [CWE-401] [-Wanalyzer-malloc-leak]
  466 | }
      | ^
  'test_42a': events 1-2
   |
   |  463 |   void *p = malloc (1024);
   |      |             ^~~~~~~~~~~~~
   |      |             |
