PING: Re: [PATCH] c, analyzer: support named constants in analyzer [PR106302]

2022-11-07 Thread David Malcolm via Gcc-patches
I'd like to "ping" the C frontend parts of this patch:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604739.html

FWIW I've just posted the socket patch that I referred to, here:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605281.html
which depends on this patch.

Thanks
Dave

On Mon, 2022-10-31 at 15:07 -0400, David Malcolm wrote:
> The analyzer's file-descriptor state machine tracks the access mode
> of
> opened files, so that it can emit -Wanalyzer-fd-access-mode-mismatch.
> 
> To do this, its symbolic execution needs to "know" the values of the
> constants "O_RDONLY", "O_WRONLY", and "O_ACCMODE".  Currently
> analyzer/sm-fd.cc simply uses these values directly from the build-
> time
> header files, but these are the values on the host, not those from
> the
> target, which could be different (PR analyzer/106302).
> 
> In an earlier discussion of this issue:
>   https://gcc.gnu.org/pipermail/gcc/2022-June/238954.html
> we talked about adding a target hook for this.
> 
> However, I've also been experimenting with extending the fd state
> machine to track sockets (PR analyzer/106140).  For this, it's useful
> to
> "know" the values of the constants "SOCK_STREAM" and "SOCK_DGRAM".
> Unfortunately, these seem to have many arbitrary differences from
> target
> to target.
> 
> For example: Linux/glibc general has SOCK_STREAM == 1, SOCK_DGRAM ==
> 2,
> as does AIX, but annoyingly, e.g. Linux on MIPS has them the other
> way
> around.
> 
> It seems to me that as the analyzer grows more ambitious modeling of
> the
> behavior of APIs (perhaps via plugins) it's more likely that the
> analyzer will need to know the values of named constants, which might
> not even exist on the host.
> 
> For example, at LPC it was suggested to me that -fanalyzer could
> check
> rules about memory management inside the Linux kernel (probably via a
> plugin), but doing so involves a bunch of GFP_* flags (see PR
> 107472).
> 
> So rather than trying to capture all this knowledge in a target hook,
> this patch attempts to get at named constant values from the user's
> source code.
> 
> The patch adds an interface for frontends to call into the analyzer
> as
> the translation unit finishes.  The analyzer can then call back into
> the
> frontend to ask about the values of the named constants it cares
> about
> whilst the frontend's data structures are still around.
> 
> The patch implements this for the C frontend, which looks up the
> names
> by looking for named CONST_DECLs (which handles enum values). 
> Failing
> that, it attempts to look up the values of macros but only the
> simplest
> cases are supported (a non-traditional macro with a single CPP_NUMBER
> token).  It does this by building a buffer containing the macro
> definition and rerunning a lexer on it.
> 
> The analyzer gracefully handles the cases where named values aren't
> found (such as anything more complicated than described above).
> 
> The patch ports the analyzer to use this mechanism for "O_RDONLY",
> "O_WRONLY", and "O_ACCMODE".  I have successfully tested my socket
> patch
> to also use this for "SOCK_STREAM" and "SOCK_DGRAM", so the technique
> seems to work.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> Are the C frontend parts OK for trunk?
> 
> Thanks
> Dave
> 
> gcc/ChangeLog:
> PR analyzer/106302
> * Makefile.in (ANALYZER_OBJS): Add analyzer/analyzer-
> language.o.
> (GTFILES): Add analyzer/analyzer-language.cc.
> 
> gcc/analyzer/ChangeLog:
> PR analyzer/106302
> * analyzer-language.cc: New file.
> * analyzer-language.h: New file.
> * analyzer.h (get_stashed_constant_by_name): New decl.
> (log_stashed_constants): New decl.
> * engine.cc (impl_run_checkers): Call log_stashed_constants.
> * region-model-impl-calls.cc
> (region_model::impl_call_analyzer_dump_named_constant): New.
> * region-model.cc (region_model::on_stmt_pre): Handle
> __analyzer_dump_named_constant.
> * region-model.h
> (region_model::impl_call_analyzer_dump_named_constant): New
> decl.
> * sm-fd.cc (fd_state_machine::m_O_ACCMODE): New.
> (fd_state_machine::m_O_RDONLY): New.
> (fd_state_machine::m_O_WRONLY): New.
> (fd_state_machine::fd_state_machine): Initialize the new
> fields.
> (fd_state_machine::get_access_mode_from_flag): Use the new
> fields,
> rather than using the host values.
> 
> gcc/c/ChangeLog:
> PR analyzer/106302
> * c-parser.cc: Include "analyzer/analyzer-language.h" and
> "toplev.h".
> (class ana::c_translation_unit): New.
> (c_parser_translation_unit): Call
> ana::on_finish_translation_unit.
> 
> gcc/ChangeLog:
> * doc/analyzer.texi: Document __analyzer_dump_named_constant.
> 
> gcc/testsuite/ChangeLog:
> * gcc.dg/analyzer/analyzer-decls.h
> (__analyzer_dump_named_constant): New decl.
> * 

[PATCH] c, analyzer: support named constants in analyzer [PR106302]

2022-10-31 Thread David Malcolm via Gcc-patches
The analyzer's file-descriptor state machine tracks the access mode of
opened files, so that it can emit -Wanalyzer-fd-access-mode-mismatch.

To do this, its symbolic execution needs to "know" the values of the
constants "O_RDONLY", "O_WRONLY", and "O_ACCMODE".  Currently
analyzer/sm-fd.cc simply uses these values directly from the build-time
header files, but these are the values on the host, not those from the
target, which could be different (PR analyzer/106302).

In an earlier discussion of this issue:
  https://gcc.gnu.org/pipermail/gcc/2022-June/238954.html
we talked about adding a target hook for this.

However, I've also been experimenting with extending the fd state
machine to track sockets (PR analyzer/106140).  For this, it's useful to
"know" the values of the constants "SOCK_STREAM" and "SOCK_DGRAM".
Unfortunately, these seem to have many arbitrary differences from target
to target.

For example: Linux/glibc general has SOCK_STREAM == 1, SOCK_DGRAM == 2,
as does AIX, but annoyingly, e.g. Linux on MIPS has them the other way
around.

It seems to me that as the analyzer grows more ambitious modeling of the
behavior of APIs (perhaps via plugins) it's more likely that the
analyzer will need to know the values of named constants, which might
not even exist on the host.

For example, at LPC it was suggested to me that -fanalyzer could check
rules about memory management inside the Linux kernel (probably via a
plugin), but doing so involves a bunch of GFP_* flags (see PR 107472).

So rather than trying to capture all this knowledge in a target hook,
this patch attempts to get at named constant values from the user's
source code.

The patch adds an interface for frontends to call into the analyzer as
the translation unit finishes.  The analyzer can then call back into the
frontend to ask about the values of the named constants it cares about
whilst the frontend's data structures are still around.

The patch implements this for the C frontend, which looks up the names
by looking for named CONST_DECLs (which handles enum values).  Failing
that, it attempts to look up the values of macros but only the simplest
cases are supported (a non-traditional macro with a single CPP_NUMBER
token).  It does this by building a buffer containing the macro
definition and rerunning a lexer on it.

The analyzer gracefully handles the cases where named values aren't
found (such as anything more complicated than described above).

The patch ports the analyzer to use this mechanism for "O_RDONLY",
"O_WRONLY", and "O_ACCMODE".  I have successfully tested my socket patch
to also use this for "SOCK_STREAM" and "SOCK_DGRAM", so the technique
seems to work.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

Are the C frontend parts OK for trunk?

Thanks
Dave

gcc/ChangeLog:
PR analyzer/106302
* Makefile.in (ANALYZER_OBJS): Add analyzer/analyzer-language.o.
(GTFILES): Add analyzer/analyzer-language.cc.

gcc/analyzer/ChangeLog:
PR analyzer/106302
* analyzer-language.cc: New file.
* analyzer-language.h: New file.
* analyzer.h (get_stashed_constant_by_name): New decl.
(log_stashed_constants): New decl.
* engine.cc (impl_run_checkers): Call log_stashed_constants.
* region-model-impl-calls.cc
(region_model::impl_call_analyzer_dump_named_constant): New.
* region-model.cc (region_model::on_stmt_pre): Handle
__analyzer_dump_named_constant.
* region-model.h
(region_model::impl_call_analyzer_dump_named_constant): New decl.
* sm-fd.cc (fd_state_machine::m_O_ACCMODE): New.
(fd_state_machine::m_O_RDONLY): New.
(fd_state_machine::m_O_WRONLY): New.
(fd_state_machine::fd_state_machine): Initialize the new fields.
(fd_state_machine::get_access_mode_from_flag): Use the new fields,
rather than using the host values.

gcc/c/ChangeLog:
PR analyzer/106302
* c-parser.cc: Include "analyzer/analyzer-language.h" and "toplev.h".
(class ana::c_translation_unit): New.
(c_parser_translation_unit): Call ana::on_finish_translation_unit.

gcc/ChangeLog:
* doc/analyzer.texi: Document __analyzer_dump_named_constant.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/analyzer-decls.h
(__analyzer_dump_named_constant): New decl.
* gcc.dg/analyzer/fd-4.c (void): Likewise.
(O_ACCMODE): Define.
* gcc.dg/analyzer/fd-access-mode-enum.c: New test, based on .
* gcc.dg/analyzer/fd-5.c: ...this.  Rename to...
* gcc.dg/analyzer/fd-access-mode-macros.c: ...this.
(O_ACCMODE): Define.
* gcc.dg/analyzer/fd-access-mode-target-headers.c: New test, also
based on fd-5.c.
(test_sm_fd_constants): New.
* gcc.dg/analyzer/fd-dup-1.c (O_ACCMODE): Define.
* gcc.dg/analyzer/named-constants-via-enum.c: New test.
* gcc.dg/analyzer/named-constants-via-macros-2.c: New test.