[PATCH 2/2] testsuite: Add more allocation size tests for conjured svalues [PR110014]

2023-06-09 Thread Tim Lange
This patch adds the reproducers reported in PR 110014 as test cases. The
false positives in those cases are already fixed with PR 109577.

2023-06-09  Tim Lange  

PR analyzer/110014

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/pr110014.c: New tests.

---
 gcc/testsuite/gcc.dg/analyzer/pr110014.c | 25 
 1 file changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr110014.c

diff --git a/gcc/testsuite/gcc.dg/analyzer/pr110014.c 
b/gcc/testsuite/gcc.dg/analyzer/pr110014.c
new file mode 100644
index 000..d76b8781413
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr110014.c
@@ -0,0 +1,25 @@
+void *realloc (void *, unsigned long)
+  __attribute__((__nothrow__, __leaf__))
+  __attribute__((__warn_unused_result__)) __attribute__((__alloc_size__ (2)));
+
+long *
+slurp (long *buffer, unsigned long file_size)
+{
+  unsigned long cc;
+  if (!__builtin_add_overflow (file_size - file_size % sizeof (long),
+  2 * sizeof (long), ))
+buffer = realloc (buffer, cc);
+  return buffer;
+}
+
+long *
+slurp1 (long *buffer, unsigned long file_size)
+{
+  return realloc (buffer, file_size - file_size % sizeof (long));
+}
+
+long *
+slurp2 (long *buffer, unsigned long file_size)
+{
+  return realloc (buffer, (file_size / sizeof (long)) * sizeof (long));
+}
-- 
2.40.1



[PATCH 1/2] analyzer: Fix allocation size false positive on conjured svalue [PR109577]

2023-06-09 Thread Tim Lange
Currently, the analyzer tries to prove that the allocation size is a
multiple of the pointee's type size.  This patch reverses the behavior
to try to prove that the expression is not a multiple of the pointee's
type size.  With this change, each unhandled case should be gracefully
considered as correct.  This fixes the bug reported in PR 109577 by
Paul Eggert.

Regression-tested on Linux x86-64 with -m32 and -m64.

2023-06-09  Tim Lange  

PR analyzer/109577

gcc/analyzer/ChangeLog:

* constraint-manager.cc (class sval_finder): Visitor to find
childs in svalue trees.
(constraint_manager::sval_constrained_p): Add new function to
check whether a sval might be part of an constraint.
* constraint-manager.h: Add sval_constrained_p function.
* region-model.cc (class size_visitor): Reverse behavior to not
emit a warning on not explicitly considered cases.
(region_model::check_region_size):
Adapt to size_visitor changes.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/allocation-size-2.c: Change expected output
and add new test case.
* gcc.dg/analyzer/pr109577.c: New test.

---
 gcc/analyzer/constraint-manager.cc| 131 ++
 gcc/analyzer/constraint-manager.h |   1 +
 gcc/analyzer/region-model.cc  |  80 ---
 .../gcc.dg/analyzer/allocation-size-2.c   |  24 ++--
 gcc/testsuite/gcc.dg/analyzer/pr109577.c  |  16 +++
 5 files changed, 194 insertions(+), 58 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr109577.c

diff --git a/gcc/analyzer/constraint-manager.cc 
b/gcc/analyzer/constraint-manager.cc
index 2c9c435527e..24cd8960098 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -2218,6 +2218,137 @@ constraint_manager::get_equiv_class_by_svalue (const 
svalue *sval,
   return false;
 }
 
+/* Tries to find a svalue inside another svalue.  */
+
+class sval_finder : public visitor
+{
+public:
+  sval_finder (const svalue *query) : m_query (query)
+  {
+  }
+
+  bool found_query_p ()
+  {
+return m_found;
+  }
+
+  void visit_region_svalue (const region_svalue *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_constant_svalue (const constant_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_unknown_svalue (const unknown_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_poisoned_svalue (const poisoned_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_setjmp_svalue (const setjmp_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_initial_svalue (const initial_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_unaryop_svalue (const unaryop_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_binop_svalue (const binop_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_sub_svalue (const sub_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_repeated_svalue (const repeated_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_bits_within_svalue (const bits_within_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_unmergeable_svalue (const unmergeable_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_placeholder_svalue (const placeholder_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_widening_svalue (const widening_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_compound_svalue (const compound_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_conjured_svalue (const conjured_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_asm_output_svalue (const asm_output_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+  void visit_const_fn_result_svalue (const const_fn_result_svalue  *sval)
+  {
+m_found |= m_query == sval;
+  }
+
+private:
+  const svalue *m_query;
+  bool m_found;
+};
+
+/* Returns true if SVAL is constrained.  */
+
+bool
+constraint_manager::sval_constrained_p (const svalue *sval) const
+{
+  int i;
+  equiv_class *ec;
+  sval_finder finder (sval);
+  FOR_EACH_VEC_ELT (m_equiv_classes, i, ec)
+{
+  int j;
+  const svalue *iv;
+  FOR_EACH_VEC_ELT (ec->m_vars, j, iv)
+   {
+ iv->accept ();
+ if (finder.found_query_p ())
+   return true;
+   }
+}
+  return false;
+}
+
 /* Ensure that SVAL has an equivalence class within this constraint_manager;
return the ID of the class.  */
 
diff --git a/gcc/analyzer/constraint-manager.h 
b/gcc/analyzer/constraint-manager.h
index 3afbc7f848e..72753e43c96 100644
--- a/gcc/analyzer/constraint-manager.h
+++ b/gcc/analyzer/constraint-manager.h
@@ -459,6 +459,7 @@ public:
 
   bool get_equiv_class_by_svalue (const svalue

Re: Debugging C++ frontend using CLion IDE

2023-03-01 Thread Tim Lange

Hi Berke,

I had the same problem last year. Many IDEs don't really work for 
developing gcc. Most here probably use either emacs or vim. If you want 
to use an IDE, you might have to do some hacks.


The oldschool indentation style of gcc (mix of tab and spaces) is not 
widely supported. IDEs/Editors that support displaying this identation 
style are GNOME Builder (but that doesn't support the way gcc is 
built), Eclipse, VS Code (since Dec 22) and vim/emacs of course.
For VS Code, you need to apply 'unexpand' to the source files as it 
does not support converting 8 spaces to 1 tab automatically. There are 
plugins for Code to run customs commands on save.


To keep your IDE snappy, you should exclude everything that you won't 
need from the indexing, especially the test directory.


At last, for debugging, I had good experiences with launching a 
gdbserver, i.e. replacing "-wrapper gdb" with "-wrapper 
gdbserver,localhost:2345". Then you can connect your IDE to that 
gdbserver and fully use the IDE interface to debug. You can configure 
running gcc with a gdbserver as a pre-task to automate this.


- Tim

PS: When I tried CLion last year, I neither could get the build system 
working nor the indentation. So you might want to look at one of the 
other IDEs but I don't if something changed in the time.


On Mi, Mär 1 2023 at 20:59:17 +0300, Berke Yavas via Gcc 
 wrote:

Hi,

I am trying to set up my environment for the upcoming google summer of
code. One thing I haven't figured out yet, how can I debug C++ 
frontend(or
any other language frontend) with CLion. If anybody managed to do 
this(or

using another IDE), could you please share your settings with me?

Best regards,
Berke





Re: [PATCH] analyzer: consider empty ranges and zero byte accesses [PR106845]

2022-09-11 Thread Tim Lange
> ...it took me a moment to realize that the analyzer "sees" that this is
> "main", and thus buf_size is 0.
> 
> Interestingly, if I rename it to not be "main" (and thus buf_size could
> be non-zero), we still don't complain:
>   https://godbolt.org/z/PezfTo9Mz
> Presumably this is a known limitation of the symbolic bounds checking?

Yeah. I do only try structural equality for binaryop_svalues.  The
example does result in a call to eval_condition_without_cm with two
  unaryop_svalue(NOP_EXPR, initial_svalue ('buf_size'))
that have different types ('unsigned int' and 'sizetype').  Thus,
lhs == rhs is false and eval_condition_without_cm does return UNKNOWN.

Changing the type of buf_size to size_t removes the UNARYOP wrapping and
thus, emits a warning: https://godbolt.org/z/4sh7TM4v1 [0]

Otherwise, we could also do a call to structural_equality for
unaryop_svalue inside eval_condition_without_cm and ignore a type
mismatch for unaryop_svalues.  That way, the analyzer would complain about
your example.  Not 100% sure but I think it is okay to ignore the type
here for unaryop_svalues as long as the leafs match up.  If you agree,
I can prepare a patch [1].

[0] I've seen you pushed a patch that displays the capacity as a new
event at region_creation.  My patches did that by overwriting whats
printed using describe_region_creation_event.  Should I remove all
those now unneccessary describe_region_creation_event overloads?
[1] Below is how that would probably look like.

---
 gcc/analyzer/region-model.cc | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 82006405373..4a9f0ff1e86 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4190,6 +4190,24 @@ region_model::eval_condition_without_cm (const svalue 
*lhs,
}
 }
 
+  if (lhs->get_kind () == SK_UNARYOP)
+{
+  switch (op)
+   {
+   default:
+ break;
+   case EQ_EXPR:
+   case LE_EXPR:
+   case GE_EXPR:
+ {
+   tristate res = structural_equality (lhs, rhs);
+   if (res.is_true ())
+ return res;
+ }
+ break;
+   }
+}
+
   return tristate::TS_UNKNOWN;
 }
 
@@ -4307,9 +4325,7 @@ region_model::structural_equality (const svalue *a, const 
svalue *b) const
   {
const unaryop_svalue *un_a = as_a  (a);
if (const unaryop_svalue *un_b = dyn_cast  (b))
- return tristate (pending_diagnostic::same_tree_p (un_a->get_type (),
-   un_b->get_type ())
-  && un_a->get_op () == un_b->get_op ()
+ return tristate (un_a->get_op () == un_b->get_op ()
   && structural_equality (un_a->get_arg (),
   un_b->get_arg ()));
   }
-- 
2.37.3



[PATCH] analyzer: consider empty ranges and zero byte accesses [PR106845]

2022-09-10 Thread Tim Lange
Hi,

see my patch below for a fix of pr106845.  I decided to allow bit_ranges
and byte_ranges to have a size of zero and rather only add an assertion
to the functions that assume a non-zero size.  That way is more elegant in
the caller than restricting byte_range to only represent non-empty ranges.

- Tim

This patch adds handling of empty ranges in bit_range and byte_range and
adds an assertion to member functions that assume a positive size.
Further, the patch fixes an ICE caused by an empty byte_range passed to
byte_range::exceeds_p.

Regression-tested on Linux x86_64.

2022-09-10  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106845
* region-model.cc (region_model::check_region_bounds):
Bail out if 0 bytes were accessed.
* store.cc (byte_range::dump_to_pp):
Add special case for empty ranges.
(byte_range::exceeds_p): Restrict to non-empty ranges.
(byte_range::falls_short_of_p): Restrict to non-empty ranges.
* store.h (bit_range::empty_p): New function.
(bit_range::get_last_byte_offset): Restrict to non-empty ranges.
(byte_range::empty_p): New function.
(byte_range::get_last_byte_offset): Restrict to non-empty ranges.

gcc/testsuite/ChangeLog:

PR analyzer/106845
* gcc.dg/analyzer/out-of-bounds-zero.c: New test.
* gcc.dg/analyzer/pr106845.c: New test.

---
 gcc/analyzer/region-model.cc  |  3 +
 gcc/analyzer/store.cc | 12 +++-
 gcc/analyzer/store.h  | 12 
 .../gcc.dg/analyzer/out-of-bounds-zero.c  | 67 +++
 gcc/testsuite/gcc.dg/analyzer/pr106845.c  | 11 +++
 5 files changed, 103 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-zero.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr106845.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index d321e5b6479..82006405373 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1826,6 +1826,9 @@ region_model::check_region_bounds (const region *reg,
   /* Find out how many bytes were accessed.  */
   const svalue *num_bytes_sval = reg->get_byte_size_sval (m_mgr);
   tree num_bytes_tree = maybe_get_integer_cst_tree (num_bytes_sval);
+  /* Bail out if 0 bytes are accessed.  */
+  if (num_bytes_tree && zerop (num_bytes_tree))
+return;
 
   /* Get the capacity of the buffer.  */
   const svalue *capacity = get_capacity (base_reg);
diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index ec5232cb055..1857d95f0b6 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -380,7 +380,11 @@ bit_range::as_byte_range (byte_range *out) const
 void
 byte_range::dump_to_pp (pretty_printer *pp) const
 {
-  if (m_size_in_bytes == 1)
+  if (m_size_in_bytes == 0)
+{
+  pp_string (pp, "empty");
+}
+  else if (m_size_in_bytes == 1)
 {
   pp_string (pp, "byte ");
   pp_wide_int (pp, m_start_byte_offset, SIGNED);
@@ -455,7 +459,9 @@ bool
 byte_range::exceeds_p (const byte_range ,
   byte_range *out_overhanging_byte_range) const
 {
-  if (other.get_last_byte_offset () < get_last_byte_offset ())
+  gcc_assert (!empty_p ());
+
+  if (other.get_next_byte_offset () < get_next_byte_offset ())
 {
   /* THIS definitely exceeds OTHER.  */
   byte_offset_t start = MAX (get_start_byte_offset (),
@@ -477,6 +483,8 @@ bool
 byte_range::falls_short_of_p (byte_offset_t offset,
  byte_range *out_fall_short_bytes) const
 {
+  gcc_assert (!empty_p ());
+
   if (get_start_byte_offset () < offset)
 {
   /* THIS falls short of OFFSET.  */
diff --git a/gcc/analyzer/store.h b/gcc/analyzer/store.h
index ac8b6853f4b..d172ee756c8 100644
--- a/gcc/analyzer/store.h
+++ b/gcc/analyzer/store.h
@@ -237,6 +237,11 @@ struct bit_range
   void dump_to_pp (pretty_printer *pp) const;
   void dump () const;
 
+  bool empty_p () const
+  {
+return m_size_in_bits == 0;
+  }
+
   bit_offset_t get_start_bit_offset () const
   {
 return m_start_bit_offset;
@@ -247,6 +252,7 @@ struct bit_range
   }
   bit_offset_t get_last_bit_offset () const
   {
+gcc_assert (!empty_p ());
 return get_next_bit_offset () - 1;
   }
 
@@ -297,6 +303,11 @@ struct byte_range
   void dump_to_pp (pretty_printer *pp) const;
   void dump () const;
 
+  bool empty_p () const
+  {
+return m_size_in_bytes == 0;
+  }
+
   bool contains_p (byte_offset_t offset) const
   {
 return (offset >= get_start_byte_offset ()
@@ -329,6 +340,7 @@ struct byte_range
   }
   byte_offset_t get_last_byte_offset () const
   {
+gcc_assert (!empty_p ());
 return m_start_byte_offset + m_size_in_bytes - 1;
   }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-zero.c 
b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-zero.c
new file mode 100644
index 000..201ca00ebdb
--- /dev/n

[PATCH v2] analyzer: support for symbolic values in the out-of-bounds checker [PR106625]

2022-09-07 Thread Tim Lange
Hi Dave,

while re-reading that patch, I noticed a small mistake. I accidently did
not move the op == PLUS_EXPR or MULT_EXPR guard in symbolic_greater_than
when implementing the "eliminate operands on both sides" feature, which
lead to the old patch also eliminating operands on both sides if the
operator decreases the value, which is obviously wrong.

I moved the guard outside and added test coverage for this in
symbolic-gt-1.c. The patch passed the regrtests with the fix included.

I assume it is still okay for trunk?

- Tim

This patch adds support for reasoning about the inequality of two symbolic
values in the special case specifically suited for reasoning about
out-of-bounds past the end of the buffer. With this patch, the analyzer
catches off-by-one errors and more even when the offset and capacity is
symbolic.

Regrtested on Linux x86_64 and tested on coreutils, curl, httpd and
openssh as usual.

2022-09-07  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106625
* analyzer.h (region_offset): Eliminate m_is_symbolic member.
* region-model-impl-calls.cc (region_model::impl_call_realloc):
Refine implementation to be more precise.
* region-model.cc (class symbolic_past_the_end):
Abstract diagnostic class to complain about accesses past the end
with symbolic values.
(class symbolic_buffer_overflow):
Concrete diagnostic class to complain about buffer overflows with
symbolic values.
(class symbolic_buffer_overread):
Concrete diagnostic class to complain about buffer overreads with
symbolic values.
(region_model::check_symbolic_bounds): New function.
(maybe_get_integer_cst_tree): New helper function.
(region_model::check_region_bounds):
Add call to check_symbolic_bounds if offset is not concrete.
(region_model::eval_condition_without_cm):
Add support for EQ_EXPR and GT_EXPR with binaryop_svalues. 
(is_positive_svalue): New hleper function.
(region_model::symbolic_greater_than): 
(region_model::structural_equality): New function to compare
whether two svalues are structured the same, i.e. evaluate to
the same value.
(test_struct): Reflect changes to region::calc_offset.
(test_var): Likewise.
(test_array_2): Likewise and add selftest with symbolic i.
* region-model.h (class region_model): Add check_symbolic_bounds,
symbolic_greater_than and structural_equality.
* region.cc (region::get_offset):
Reflect changes to region::calc_offset.
(region::calc_offset):
Compute the symbolic offset if the offset is not concrete.
(region::get_relative_symbolic_offset): New function to return the
symbolic offset in bytes relative to its parent.
(field_region::get_relative_symbolic_offset): Likewise.
(element_region::get_relative_symbolic_offset): Likewise.
(offset_region::get_relative_symbolic_offset): Likewise.
(bit_range_region::get_relative_symbolic_offset): Likewise.
* region.h: Add get_relative_symbolic_offset.
* store.cc (binding_key::make):
Reflect changes to region::calc_offset.
(binding_map::apply_ctor_val_to_range): Likewise.
(binding_map::apply_ctor_pair_to_child_region): Likewise.
(binding_cluster::bind_compound_sval): Likewise.
(binding_cluster::get_any_binding): Likewise.
(binding_cluster::maybe_get_compound_binding): Likewise.

gcc/ChangeLog:

PR analyzer/106625
* doc/invoke.texi:
State that the checker also reasons about symbolic values.

gcc/testsuite/ChangeLog:

PR analyzer/106625
* gcc.dg/analyzer/data-model-1.c: Change expected result.
* gcc.dg/analyzer/out-of-bounds-5.c: New test.
* gcc.dg/analyzer/out-of-bounds-realloc-grow.c: New test.
* gcc.dg/analyzer/symbolic-gt-1.c: New test.

---
 gcc/analyzer/analyzer.h   |  23 +-
 gcc/analyzer/region-model-impl-calls.cc   |  39 +-
 gcc/analyzer/region-model.cc  | 469 --
 gcc/analyzer/region-model.h   |   9 +
 gcc/analyzer/region.cc| 131 -
 gcc/analyzer/region.h |  17 +-
 gcc/analyzer/store.cc |  18 +-
 gcc/doc/invoke.texi   |   8 +-
 gcc/testsuite/gcc.dg/analyzer/data-model-1.c  |   3 +-
 .../gcc.dg/analyzer/out-of-bounds-5.c | 156 ++
 .../analyzer/out-of-bounds-realloc-grow.c |  87 
 gcc/testsuite/gcc.dg/analyzer/symbolic-gt-1.c |  76 +++
 12 files changed, 941 insertions(+), 95 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-5.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-realloc-grow.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/symbolic-gt-1.c

diff --git a/gcc/analyzer/an

[PATCH][WIP?] analyzer: support for symbolic values in the out-of-bounds checker [PR106625]

2022-09-05 Thread Tim Lange
Hi,

below is my patch, adding support for reasoning about buffer overflows and
overreads with symbolic offsets and capacities.

I've already had one off-list feedback from Dave after sending him my
preliminary work. Below, I'll be also answering some of the questions that
came up during the first round.

To reason about out-of-bounds with symbolic values, I have decided to do
some simplifications:
* Only reason in bytes, i.e. loosing some bits on bit-field accesses and
  bit ranges in the conversion.
* Not trying to handle commutativeness [0].
* Require similar structure of the offset and the capacity svalue to
  complain about symbolic out-of-bounds.
* A symbolic leaf operand [1] is positive if its type is unsigned. I do
  ignore that the operand could be zero. It wouldn't make much sense
  to have an offset that is always zero, but is not inferable statically
  such that the approxmiation here would yield a false-positive. In order
  to fully prevent the false-positive, we would have to give up many true
  positives. Dave also thinks that is a reasonable approximation.

> Whats the slowdown of this patch?
I found another optimization that tries to skip the structural equality
check by trying referential equality (aka comparing pointers) first. That
seems to do the trick and at least in my single run of compiling
coreutils, curl, httpd and openssh with the current master and my patch
enabled, I've found little to no overhead, at most ~5s CPU time [2].

> Why was the realloc implementation changed?
With the patch, the analyzer can reason about simple inequalities of
svalues. The previous way of getting the smaller of the current buffer
size and the new buffer size was less accurate and lead to a false
positive because it chose the wrong svalue. The change fixes that by
using the same comparison function as the out-of-bounds checker. Also, I
changed it to return the OLD_SIZE_SVAL by default, because I think I had
a thinking error in my previous patch: Realloc implementations probably
only move the buffer if the buffer grows. That means the old buffer is
copied to the new one, only touching as many bytes as the old buffer had.
The rest of the bytes is left uninitialized.

I added [WIP?], because the regrtests are not yet finished but a similar
version did pass them, so I assume thats okay to post it now for review
and hand in the regrtests later [3].

- Tim

[0] I have tried that earlier but it turned out to be too slow.
[1] leaf == conjured, inital or constant svalue.
[2] Note that I didn't run multiple tests and the compile farm is not
isolated and I haven't done anything about caching etc. But at least
the results show that there is no heavy slowdown.
[3] The analyzer and analyzer-torture tests pass on my machine for C/C++.

This patch adds support for reasoning about the inequality of two symbolic
values in the special case specifically suited for reasoning about
out-of-bounds past the end of the buffer. With this patch, the analyzer
catches off-by-one errors and more even when the offset and capacity is
symbolic.

Tested on coreutils, curl, httpd and openssh.

gcc/analyzer/ChangeLog:

PR analyzer/106625
* analyzer.h (region_offset): Eliminate m_is_symbolic member.
* region-model-impl-calls.cc (region_model::impl_call_realloc):
Refine implementation to be more precise.
* region-model.cc (class symbolic_past_the_end):
Abstract diagnostic class to complain about accesses past the end
with symbolic values.
(class symbolic_buffer_overflow):
Concrete diagnostic class to complain about buffer overflows with
symbolic values.
(class symbolic_buffer_overread):
Concrete diagnostic class to complain about buffer overreads with
symbolic values.
(region_model::check_symbolic_bounds): New function.
(maybe_get_integer_cst_tree): New helper function.
(region_model::check_region_bounds):
Add call to check_symbolic_bounds if offset is not concrete.
(region_model::eval_condition_without_cm):
Add support for EQ_EXPR and GT_EXPR with binaryop_svalues. 
(is_positive_svalue): New hleper function.
(region_model::symbolic_greater_than): 
(region_model::structural_equality): New function to compare
whether two svalues are structured the same, i.e. evaluate to
the same value.
(test_struct): Reflect changes to region::calc_offset.
(test_var): Likewise.
(test_array_2): Likewise and add selftest with symbolic i.
* region-model.h (class region_model): Add check_symbolic_bounds,
symbolic_greater_than and structural_equality.
* region.cc (region::get_offset):
Reflect changes to region::calc_offset.
(region::calc_offset):
Compute the symbolic offset if the offset is not concrete.
(region::get_relative_symbolic_offset): New function to return the

[PATCH 2/2 v2] analyzer: strcpy semantics

2022-09-04 Thread Tim Lange
Hi Dave,

sorry about the strncpy thing, I should've been more careful. Below is the
patch with just the strcpy part.

- Tim

This patch adds modelling for the semantics of strcpy in the simple case
where the analyzer is able to infer a concrete string size.

Regrtested on Linux x86_64.

2022-09-04  Tim Lange  

gcc/analyzer/ChangeLog:

* region-model-impl-calls.cc (region_model::impl_call_strcpy):
Handle the constant string case.
* region-model.cc (region_model::get_string_size):
New function to get the string size from a region or svalue.
* region-model.h (class region_model): Add get_string_size.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/out-of-bounds-4.c: New test.
* gcc.dg/analyzer/strcpy-3.c: New test.

---
 gcc/analyzer/region-model-impl-calls.cc   | 16 -
 gcc/analyzer/region-model.cc  | 29 +
 gcc/analyzer/region-model.h   |  3 +
 .../gcc.dg/analyzer/out-of-bounds-4.c | 65 +++
 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c  | 23 +++
 5 files changed, 133 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c

diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index 8eebd122d42..3790eaf2d79 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -1019,13 +1019,23 @@ region_model::impl_call_strcpy (const call_details )
   const svalue *dest_sval = cd.get_arg_svalue (0);
   const region *dest_reg = deref_rvalue (dest_sval, cd.get_arg_tree (0),
 cd.get_ctxt ());
+  const svalue *src_sval = cd.get_arg_svalue (1);
+  const region *src_reg = deref_rvalue (src_sval, cd.get_arg_tree (1),
+   cd.get_ctxt ());
+  const svalue *src_contents_sval = get_store_value (src_reg,
+cd.get_ctxt ());
 
   cd.maybe_set_lhs (dest_sval);
 
-  check_region_for_write (dest_reg, cd.get_ctxt ());
+  /* Try to get the string size if SRC_REG is a string_region.  */
+  const svalue *copied_bytes_sval = get_string_size (src_reg);
+  /* Otherwise, check if the contents of SRC_REG is a string.  */
+  if (copied_bytes_sval->get_kind () == SK_UNKNOWN)
+copied_bytes_sval = get_string_size (src_contents_sval);
 
-  /* For now, just mark region's contents as unknown.  */
-  mark_region_as_unknown (dest_reg, cd.get_uncertainty ());
+  const region *sized_dest_reg
+= m_mgr->get_sized_region (dest_reg, NULL_TREE, copied_bytes_sval);
+  set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
 }
 
 /* Handle the on_call_pre part of "strlen".  */
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index ec29be259b5..4ec18c86774 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3218,6 +3218,35 @@ region_model::get_capacity (const region *reg) const
   return m_mgr->get_or_create_unknown_svalue (sizetype);
 }
 
+/* Return the string size, including the 0-terminator, if SVAL is a
+   constant_svalue holding a string.  Otherwise, return an unknown_svalue.  */
+
+const svalue *
+region_model::get_string_size (const svalue *sval) const
+{
+  tree cst = sval->maybe_get_constant ();
+  if (!cst || TREE_CODE (cst) != STRING_CST)
+return m_mgr->get_or_create_unknown_svalue (size_type_node);
+
+  tree out = build_int_cst (size_type_node, TREE_STRING_LENGTH (cst));
+  return m_mgr->get_or_create_constant_svalue (out);
+}
+
+/* Return the string size, including the 0-terminator, if REG is a
+   string_region.  Otherwise, return an unknown_svalue.  */
+
+const svalue *
+region_model::get_string_size (const region *reg) const
+{
+  const string_region *str_reg = dyn_cast  (reg);
+  if (!str_reg)
+return m_mgr->get_or_create_unknown_svalue (size_type_node);
+
+  tree cst = str_reg->get_string_cst ();
+  tree out = build_int_cst (size_type_node, TREE_STRING_LENGTH (cst));
+  return m_mgr->get_or_create_constant_svalue (out);
+}
+
 /* If CTXT is non-NULL, use it to warn about any problems accessing REG,
using DIR to determine if this access is a read or write.  */
 
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 7ce832f6ce4..a1f2165e145 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -793,6 +793,9 @@ class region_model
 
   const svalue *get_capacity (const region *reg) const;
 
+  const svalue *get_string_size (const svalue *sval) const;
+  const svalue *get_string_size (const region *reg) const;
+
   /* Implemented in sm-malloc.cc  */
   void on_realloc_with_move (const call_details ,
 const svalue *old_ptr_sval,
diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c 
b/gcc/testsuite/gcc.dg/analyzer/o

[PATCH 2/2] analyzer: strcpy and strncpy semantics

2022-09-02 Thread Tim Lange
Hi,

below is my patch for the strcpy and strncpy semantics inside the
analyzer, enabling the out-of-bounds checker to also complain about
overflows caused by those two functions.

As the plan is to reason about the inequality of symbolic values in the
future, I decided to use eval_condition to compare the number of bytes and
the string size for strncpy [0].

- Tim

[0] instead of only trying to handle cases where svalues are constant;
which was how I did it in an earlier draft discussed off-list.


This patch adds modelling for the semantics of strcpy and strncpy in the
simple case where the analyzer is able to reason about the inequality of
the size argument and the string size.

Regrtested on Linux x86_64.

2022-09-02  Tim Lange  

gcc/analyzer/ChangeLog:

* region-model-impl-calls.cc (region_model::impl_call_strncpy):
New function.
* region-model.cc (region_model::on_call_pre):
Add call to impl_call_strncpy.
(region_model::get_string_size): New function.
* region-model.h (class region_model):
Add impl_call_strncpy and get_string_size.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/out-of-bounds-4.c: New test.
* gcc.dg/analyzer/strcpy-3.c: New test.
* gcc.dg/analyzer/strncpy-1.c: New test.

---
 gcc/analyzer/region-model-impl-calls.cc   |  62 -
 gcc/analyzer/region-model.cc  |  33 +
 gcc/analyzer/region-model.h   |   4 +
 .../gcc.dg/analyzer/out-of-bounds-4.c | 122 ++
 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c  |  23 
 gcc/testsuite/gcc.dg/analyzer/strncpy-1.c |  23 
 6 files changed, 264 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strncpy-1.c

diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index 8eebd122d42..9f1ae020f4f 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -1019,13 +1019,69 @@ region_model::impl_call_strcpy (const call_details )
   const svalue *dest_sval = cd.get_arg_svalue (0);
   const region *dest_reg = deref_rvalue (dest_sval, cd.get_arg_tree (0),
 cd.get_ctxt ());
+  const svalue *src_sval = cd.get_arg_svalue (1);
+  const region *src_reg = deref_rvalue (src_sval, cd.get_arg_tree (1),
+   cd.get_ctxt ());
+  const svalue *src_contents_sval = get_store_value (src_reg,
+cd.get_ctxt ());
 
   cd.maybe_set_lhs (dest_sval);
 
-  check_region_for_write (dest_reg, cd.get_ctxt ());
+  /* Try to get the string size if SRC_REG is a string_region.  */
+  const svalue *copied_bytes_sval = get_string_size (src_reg);
+  /* Otherwise, check if the contents of SRC_REG is a string.  */
+  if (copied_bytes_sval->get_kind () == SK_UNKNOWN)
+copied_bytes_sval = get_string_size (src_contents_sval);
+
+  const region *sized_dest_reg
+= m_mgr->get_sized_region (dest_reg, NULL_TREE, copied_bytes_sval);
+  set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
+}
+
+/* Handle the on_call_pre part of "strncpy" and "__builtin_strncpy_chk".  */
 
-  /* For now, just mark region's contents as unknown.  */
-  mark_region_as_unknown (dest_reg, cd.get_uncertainty ());
+void
+region_model::impl_call_strncpy (const call_details )
+{
+  const svalue *dest_sval = cd.get_arg_svalue (0);
+  const region *dest_reg = deref_rvalue (dest_sval, cd.get_arg_tree (0),
+cd.get_ctxt ());
+  const svalue *src_sval = cd.get_arg_svalue (1);
+  const region *src_reg = deref_rvalue (src_sval, cd.get_arg_tree (1),
+   cd.get_ctxt ());
+  const svalue *src_contents_sval = get_store_value (src_reg,
+cd.get_ctxt ());
+  const svalue *num_bytes_sval = cd.get_arg_svalue (2);
+
+  cd.maybe_set_lhs (dest_sval);
+
+  const svalue *string_size_sval = get_string_size (src_reg);
+  if (string_size_sval->get_kind () == SK_UNKNOWN)
+string_size_sval = get_string_size (src_contents_sval);
+
+  /* strncpy copies until a zero terminator is reached or n bytes were copied.
+ Determine the lesser of both here.  */
+  tristate ts = eval_condition (string_size_sval, LT_EXPR, num_bytes_sval);
+  const svalue *copied_bytes_sval;
+  switch (ts.get_value ())
+{
+  case tristate::TS_TRUE:
+   copied_bytes_sval = string_size_sval;
+   break;
+  case tristate::TS_FALSE:
+   copied_bytes_sval = num_bytes_sval;
+   break;
+  case tristate::TS_UNKNOWN:
+   copied_bytes_sval
+ = m_mgr->get_or_create_unknown_svalue (size_type_node);
+   break;
+  default:
+   gcc_unreacha

[PATCH 1/2] analyzer: return a concrete offset for cast_regions

2022-09-02 Thread Tim Lange
This patch fixes a bug where maybe_fold_sub_svalue did not fold the
access of a single char from a string to a char when the offset was zero
because get_relative_concrete_offset did return false for cast_regions.

Regrtested on Linux x86_64.

2022-09-02  Tim Lange  

gcc/analyzer/ChangeLog:

* region.cc (cast_region::get_relative_concrete_offset):
New overloaded method.
* region.h: Add cast_region::get_relative_concrete_offset.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/fold-string-to-char.c: New test.

---
 gcc/analyzer/region.cc  | 10 ++
 gcc/analyzer/region.h   |  2 ++
 gcc/testsuite/gcc.dg/analyzer/fold-string-to-char.c |  8 
 3 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/fold-string-to-char.c

diff --git a/gcc/analyzer/region.cc b/gcc/analyzer/region.cc
index f4aba6b9c88..9c8279b130d 100644
--- a/gcc/analyzer/region.cc
+++ b/gcc/analyzer/region.cc
@@ -1556,6 +1556,16 @@ cast_region::dump_to_pp (pretty_printer *pp, bool 
simple) const
 }
 }
 
+/* Implementation of region::get_relative_concrete_offset vfunc
+   for cast_region.  */
+
+bool
+cast_region::get_relative_concrete_offset (bit_offset_t *out) const
+{
+  *out = (int) 0;
+  return true;
+}
+
 /* class heap_allocated_region : public region.  */
 
 /* Implementation of region::dump_to_pp vfunc for heap_allocated_region.  */
diff --git a/gcc/analyzer/region.h b/gcc/analyzer/region.h
index d37584b7285..34ce1fa1714 100644
--- a/gcc/analyzer/region.h
+++ b/gcc/analyzer/region.h
@@ -1087,6 +1087,8 @@ public:
   void accept (visitor *v) const final override;
   void dump_to_pp (pretty_printer *pp, bool simple) const final override;
 
+  bool get_relative_concrete_offset (bit_offset_t *out) const final override;
+
   const region *get_original_region () const { return m_original_region; }
 
 private:
diff --git a/gcc/testsuite/gcc.dg/analyzer/fold-string-to-char.c 
b/gcc/testsuite/gcc.dg/analyzer/fold-string-to-char.c
new file mode 100644
index 000..46139216bba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/fold-string-to-char.c
@@ -0,0 +1,8 @@
+#include "analyzer-decls.h"
+
+void test_1 (void)
+{
+  char str[] = "Hello";
+  char *ptr = str;
+  __analyzer_eval (ptr[0] == 'H'); /* { dg-warning "TRUE" } */
+}
-- 
2.37.2



Re: Usage of the C++ stdlib unordered_map in GCC

2022-08-31 Thread Tim Lange
On Mi, Aug 31 2022 at 10:35:08 -0400, Jason Merrill via Gcc 
 wrote:
Generally we want to use the GCC hash_map because it works with GCC 
garbage

collection (and PCH).  Is that not relevant to your patch?

Jason


The map is only part a short-lived visitor object inside the analyzer 
and is used to map svalues to other svalues (in most cases <15 kv 
pairs). I'm new to GCC, so I started of using what I knew. Only later, 
I have noticed that the unordered_map is used little to nowhere. It is 
not much effort to change it but I just wondered whether the usage is 
so low because GCC only lately switched to C++11 or other reasons. The 
responses answered my question, thanks.


- Tim




Usage of the C++ stdlib unordered_map in GCC

2022-08-30 Thread Tim Lange

Hello,

I was preparing a patch for GCC and used the unordered_map from the C++ 
stdlib in my patch. Later on, I noticed that it is used nowhere else 
inside GCC except for some files in the go frontend.


I wondered, now that building GCC requires a C++11 host compiler, 
whether there is a consensus on which data structure implementation is 
preferred. Should I rather use a hash_map instead of an unordered_map 
or is it on my side to decide which one I choose?


- Tim




[PATCH] analyzer: buffer overlap checker [PR105898]

2022-08-22 Thread Tim Lange
Hi,

below is the patch for the buffer overlap checker in the analyzer. It
contains a working buffer overlap warning as well as most of the code
needed to also warn on the general case aka when arguments alias with an
argument passed to restrict-qualified parameter.

The current C standard draft states that aliases to restrict-qualified
parameters are defined behavior if neither the alias nor the
restrict-qualified parameter is written. We've not yet come to a
conclusion how to handle this and thus, the function doing the general
case check is not used. A possible call site has a TODO stating that and
the current limitations are documented in invoke.texi.

- Tim

This patch adds a new checker to complain about overlapping buffers on
calls to memcpy and mempcpy.

Regression-tested on Linux x86_64 and tested as usual on coreutils, curl,
httpd and openssh.

2022-08-21  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/105898
* analyzer.opt: Add Wanalyzer-restrict.
* region-model-impl-calls.cc (region_model::impl_call_memcpy):
Add call to region_model::check_region_overlap.
(region_model::impl_call_mempcpy): New function.
* region-model.cc (class restrict_alias): Concrete diagnostic to
complain about the disregard of the restrict qualifier.
(class region_overlap): Concrete diagnostic to complain about
overlapping buffers.
(region_model::check_region_aliases): New function.
(region_model::check_region_overlap): New function.
(region_model::on_call_pre):
Add call to region_model::impl_call_mempcpy.
* region-model.h (class region_model):
Add check_region_aliases and check_region_overlap.
* region.cc (region::unwrap_cast): New helper function.
* region.h: Add region::unwrap_cast.
* svalue.cc (svalue::unwrap_cast): New helper function.
* svalue.h: Add svalue::unwrap_cast.

gcc/ChangeLog:

PR analyzer/105898
* doc/invoke.texi: Add Wanalyzer-restrict.

gcc/testsuite/ChangeLog:

PR analyzer/105898
* gcc.dg/analyzer/restrict-1.c: New test.

---
 gcc/analyzer/analyzer.opt  |   4 +
 gcc/analyzer/region-model-impl-calls.cc|  25 +-
 gcc/analyzer/region-model.cc   | 284 ++
 gcc/analyzer/region-model.h|   8 +
 gcc/analyzer/region.cc |  11 +
 gcc/analyzer/region.h  |   1 +
 gcc/analyzer/svalue.cc |  11 +
 gcc/analyzer/svalue.h  |   1 +
 gcc/doc/invoke.texi|  17 +
 gcc/testsuite/gcc.dg/analyzer/restrict-1.c | 413 +
 10 files changed, 774 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/restrict-1.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 437ea92e130..ae5ebdb0d41 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -142,6 +142,10 @@ Wanalyzer-putenv-of-auto-var
 Common Var(warn_analyzer_putenv_of_auto_var) Init(1) Warning
 Warn about code paths in which an on-stack buffer is passed to putenv.
 
+Wanalyzer-restrict
+Common Var(warn_analyzer_restrict) Init(1) Warning
+Warn about code paths in which an argument passed to a restrict-qualified 
parameter aliases with another argument.
+
 Wanalyzer-shift-count-negative
 Common Var(warn_analyzer_shift_count_negative) Init(1) Warning
 Warn about code paths in which a shift with negative count is attempted.
diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index 8eebd122d42..bc8e343643a 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -502,7 +502,6 @@ region_model::impl_call_malloc (const call_details )
 }
 
 /* Handle the on_call_pre part of "memcpy" and "__builtin_memcpy".  */
-// TODO: complain about overlapping src and dest.
 
 void
 region_model::impl_call_memcpy (const call_details )
@@ -516,6 +515,9 @@ region_model::impl_call_memcpy (const call_details )
   const region *src_reg = deref_rvalue (src_ptr_sval, cd.get_arg_tree (1),
cd.get_ctxt ());
 
+  check_region_overlap (src_reg, /* src_idx */ 1, dest_reg, /* dst_idx */ 0,
+   num_bytes_sval, cd);
+
   cd.maybe_set_lhs (dest_ptr_sval);
 
   const region *sized_src_reg
@@ -527,6 +529,27 @@ region_model::impl_call_memcpy (const call_details )
   set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
 }
 
+/* Handle the on_call_pre part of "mempcpy" and "__builtin_mempcpy".  */
+
+void
+region_model::impl_call_mempcpy (const call_details )
+{
+  const svalue *dest_ptr_sval = cd.get_arg_svalue (0);
+  const svalue *src_ptr_sval = cd.get_arg_svalue (1);
+  const svalue *num_bytes_sval = cd.get_arg_svalue (2);
+
+  const region *dest_reg = deref_r

[PATCH v2] analyzer: warn on the use of floating-points operands in the size argument [PR106181]

2022-08-18 Thread Tim Lange
Hi,

this is the revised version of my patch. I had trouble to get your
point regarding the float_visitor:

> If the constant is seen first, then the non-constant won't be favored
> (though perhaps binary ops get canonicalized so that constants are on
> the RHS?).

Only the assignment of m_result in visit_constant_svalue is guarded by
 !m_result, while the other two are not. So, there are two possibilities:
1. A constant is seen first and then assigned to m_result.
1.1. A non-constant float operand is seen later and
 overwrites m_result.
1.2. There's no non-constant float operand, thus the
 constant is the actual floating-point operand and
 is kept inside m_result.
2. A non-constant is seen first, then m_result might be
   overwritten with another non-constant later but never
   with a constant.
Do I have a flaw in my thinking? (But they do seem to get canonicalized,
so that shouldn't matter)

> How about:
>  -Wanalyzer-imprecise-float-arithmetic
>  -Wanalyzer-imprecise-fp-arithmetic
> instead?  (ideas welcome)

I've chosen the second. I mostly tried to avoid float because it is also
a reserved keyword in many languages and I wanted to avoid confusion
(might be overthinking that).

- Tim

This patch fixes the ICE reported in PR106181 and adds a new warning to
the analyzer complaining about the use of floating-point operands.

Regrtested on Linux x86_64.

2022-08-17  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106181
* analyzer.opt: Add Wanalyzer-imprecise-floating-point-arithmetic.
* region-model.cc (is_any_cast_p): Formatting.
(region_model::check_region_size): Ensure precondition.
(class imprecise_floating_point_arithmetic): New abstract
diagnostic class for all floating-point related warnings.
(class float_as_size_arg): Concrete diagnostic class to complain
about floating-point operands inside the size argument.
(class contains_floating_point_visitor):
New visitor to find floating-point operands inside svalues.
(region_model::check_dynamic_size_for_floats): New function.
(region_model::set_dynamic_extents):
Call to check_dynamic_size_for_floats.
* region-model.h (class region_model):
Add region_model::check_dynamic_size_for_floats.

gcc/ChangeLog:

PR analyzer/106181
* doc/invoke.texi: Add Wanalyzer-imprecise-fp-arithmetic.

gcc/testsuite/ChangeLog:

PR analyzer/106181
* gcc.dg/analyzer/allocation-size-1.c: New test.
* gcc.dg/analyzer/imprecise-floating-point-1.c: New test.
* gcc.dg/analyzer/pr106181.c: New test.

---
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/region-model.cc  | 179 +++---
 gcc/analyzer/region-model.h   |   2 +
 gcc/doc/invoke.texi   |  14 ++
 .../gcc.dg/analyzer/allocation-size-1.c   |  10 +
 .../analyzer/imprecise-floating-point-1.c |  74 
 gcc/testsuite/gcc.dg/analyzer/pr106181.c  |  11 ++
 7 files changed, 271 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/imprecise-floating-point-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr106181.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 61b58c575ff..437ea92e130 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -98,6 +98,10 @@ Wanalyzer-free-of-non-heap
 Common Var(warn_analyzer_free_of_non_heap) Init(1) Warning
 Warn about code paths in which a non-heap pointer is freed.
 
+Wanalyzer-imprecise-fp-arithmetic
+Common Var(warn_analyzer_imprecise_fp_arithmetic) Init(1) Warning
+Warn about code paths in which floating-point arithmetic is used in locations 
where precise computation is needed.
+
 Wanalyzer-jump-through-null
 Common Var(warn_analyzer_jump_through_null) Init(1) Warning
 Warn about code paths in which a NULL function pointer is called.
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index b5bc3efda32..ec29be259b5 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3301,7 +3301,8 @@ public:
 m.add_cwe (131);
 
 return warning_meta (rich_loc, m, get_controlling_option (),
-  "allocated buffer size is not a multiple of the pointee's size");
+"allocated buffer size is not a multiple"
+" of the pointee's size");
   }
 
   label_text
@@ -3396,21 +3397,20 @@ capacity_compatible_with_type (tree cst, tree 
pointee_size_tree)
 class size_visitor : public visitor
 {
 public:
-  size_visitor (tree size_cst, const svalue *sval, constraint_manager *cm)
-  : m_size_cst (size_cst), m_sval (sval), m_cm (cm)
+  size_visitor (tree size_cst, const svalue *root_sval, constraint_m

[PATCH] analyzer: warn on the use of floating points in the size argument [PR106181]

2022-08-15 Thread Tim Lange
This patch fixes the ICE reported in PR106181 and adds a new warning to
the analyzer complaining about the use of floating point operands.

I decided to move the warning for floats inside the size argument out of
the allocation size checker code and toward the allocation such that the
warning only appears once.
I'm not sure about the wording of the diagnostic, my current wording feels
a bit bulky. Here is how the diagnostics look like:

/path/to/main.c: In function ‘test_1’:
/path/to/main.c:10:14: warning: use of floating point arithmetic inside the 
size argument might yield unexpected results 
[-Wanalyzer-imprecise-floating-point-arithmetic]
   10 |   int *ptr = malloc (sizeof (int) * n); /* { dg-line test_1 } */
  |  ^
  ‘test_1’: event 1
|
|   10 |   int *ptr = malloc (sizeof (int) * n); /* { dg-line test_1 } */
|  |  ^
|  |  |
|  |  (1) operand ‘n’ is of type ‘float’
|
/path/to/main.c:10:14: note: only use operands of a type that represents whole 
numbers inside the size argument
/path/to/main.c: In function ‘test_2’:
/path/to/main.c:20:14: warning: use of floating point arithmetic inside the 
size argument might yield unexpected results 
[-Wanalyzer-imprecise-floating-point-arithmetic]
   20 |   int *ptr = malloc (n * 3.1); /* { dg-line test_2 } */
  |  ^~~~
  ‘test_2’: event 1
|
|   20 |   int *ptr = malloc (n * 3.1); /* { dg-line test_2 } */
|  |  ^~~~
|  |  |
|  |  (1) operand ‘3.1001e+0’ is of type 
‘double’
|
/path/to/main.c:20:14: note: only use operands of a type that represents whole 
numbers inside the size argument

Also, another point to discuss is the event note in case the expression is
wrapped in a variable, such as in test_3:
/path/to/main.c:30:10: warning: use of floating point arithmetic inside the 
size argument might yield unexpected results 
[-Wanalyzer-imprecise-floating-point-arithmetic]
   30 |   return malloc (size); /* { dg-line test_3 } */
  |  ^
  ‘test_3’: events 1-2
|
|   37 | void test_3 (float f)
|  |  ^~
|  |  |
|  |  (1) entry to ‘test_3’
|   38 | {
|   39 |   void *ptr = alloc_me (f); /* { dg-message "calling 'alloc_me' 
from 'test_3'" } */
|  |   
|  |   |
|  |   (2) calling ‘alloc_me’ from ‘test_3’
|
+--> ‘alloc_me’: events 3-4
   |
   |   28 | void *alloc_me (size_t size)
   |  |   ^~~~
   |  |   |
   |  |   (3) entry to ‘alloc_me’
   |   29 | {
   |   30 |   return malloc (size); /* { dg-line test_3 } */
   |  |  ~
   |  |  |
   |  |  (4) operand ‘f’ is of type ‘float’
   |

I'm not sure if it is easily discoverable that event (4) does refer to
'size'. I thought about also printing get_representative_tree (capacity)
in the event but that would clutter the event message if the argument
does hold the full expression. I don't have any strong feelings about the
decision here but if I had to decide I'd leave it as is (especially
because the warning is probably quite unusual).
The index of the argument would also be a possibility, but that would get
tricky for calloc.

Regrtested on Linux x86_64, ran the analyzer & analyzer-torture tests with
the -m32 option enabled and had no false positives on coreutils, httpd,
openssh and curl.

2022-08-15  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106181
* analyzer.opt: Add Wanalyzer-imprecise-floating-point-arithmetic.
* region-model-impl-calls.cc (region_model::impl_call_alloca):
Add call to region_model::check_region_capacity_for_floats.
(region_model::impl_call_calloc):
Add call to region_model::check_region_capacity_for_floats.
(region_model::impl_call_malloc):
Add call to region_model::check_region_capacity_for_floats.
* region-model.cc (is_any_cast_p): Formatting.
(region_model::check_region_size): Ensure precondition.
(class imprecise_floating_point_arithmetic): New abstract
diagnostic class for all floating point related warnings.
(class float_as_size_arg): Concrete diagnostic class to complain
about floating point operands inside the size argument.
(class contains_floating_point_visitor):
New visitor to find floating point operands inside svalues.
(region_model::check_region_capacity_for_floats):
New function.
* region-model.h (class region_model):
Add region_model::check_region_capacity_for_floats.

gcc/ChangeLog:

PR analyzer/106181
* doc/invoke.

[committed] testsuite: Disable out-of-bounds checker in analyzer/torture/pr93451.c

2022-08-13 Thread Tim Lange
This patch disables Wanalyzer-out-of-bounds for analyzer/torture/pr93451.c
and makes the test case pass when compiled with -m32.

The emitted warning is a true positive but only occurs if
sizeof (long int) is less than sizeof (double). I've already discussed a
similar case with Dave in the context of pr96764.c and we came to the
conclusion that we just disable the checker in such cases.

Committed under the "obvious fix" rule.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/torture/pr93451.c:
Disable Wanalyzer-out-of-bounds.

---
 gcc/testsuite/gcc.dg/analyzer/torture/pr93451.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/pr93451.c 
b/gcc/testsuite/gcc.dg/analyzer/torture/pr93451.c
index 5908bc4b69f..daac745d504 100644
--- a/gcc/testsuite/gcc.dg/analyzer/torture/pr93451.c
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/pr93451.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-Wno-analyzer-out-of-bounds" } */
+
 void
 mt (double);
 
-- 
2.37.1



[PATCH 2/2 v2] analyzer: out-of-bounds checker [PR106000]

2022-08-11 Thread Tim Lange
This patch adds an experimental out-of-bounds checker to the analyzer.

The checker was tested on coreutils, curl, httpd and openssh. It is mostly
accurate but does produce false-positives on yacc-generated files and
sometimes when the analyzer misses an invariant. These cases will be
documented in bugzilla.
Regression-tested on Linux x86-64, further ran the analyzer tests with
the -m32 option.

2022-08-11  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106000
* analyzer.opt: Add Wanalyzer-out-of-bounds.
* region-model.cc (class out_of_bounds): Diagnostics base class
for all out-of-bounds diagnostics.
(class past_the_end): Base class derived from out_of_bounds for
the buffer_overflow and buffer_overread diagnostics.
(class buffer_overflow): Buffer overflow diagnostics.
(class buffer_overread): Buffer overread diagnostics.
(class buffer_underflow): Buffer underflow diagnostics.
(class buffer_underread): Buffer overread diagnostics.
(region_model::check_region_bounds): New function to check region
bounds for out-of-bounds accesses.
(region_model::check_region_access):
Add call to check_region_bounds.
(region_model::get_representative_tree): New function that accepts
a region instead of an svalue.
* region-model.h (class region_model):
Add region_model::check_region_bounds.
* region.cc (region::symbolic_p): New predicate.
(offset_region::get_byte_size_sval): Only return the remaining
byte size on offset_regions.
* region.h: Add region::symbolic_p.
* store.cc (byte_range::intersects_p):
Add new function equivalent to bit_range::intersects_p.
(byte_range::exceeds_p): New function.
(byte_range::falls_short_of_p): New function.
* store.h (struct byte_range): Add byte_range::intersects_p,
byte_range::exceeds_p and byte_range::falls_short_of_p.

gcc/ChangeLog:

PR analyzer/106000
* doc/invoke.texi: Add Wanalyzer-out-of-bounds.

gcc/testsuite/ChangeLog:

PR analyzer/106000
* g++.dg/analyzer/pr100244.C: Disable out-of-bounds warning.
* gcc.dg/analyzer/allocation-size-3.c:
Disable out-of-bounds warning.
* gcc.dg/analyzer/memcpy-2.c: Disable out-of-bounds warning.
* gcc.dg/analyzer/pr101962.c: Add dg-warning.
* gcc.dg/analyzer/pr96764.c: Disable out-of-bounds warning.
* gcc.dg/analyzer/pr97029.c:
Add dummy buffer to prevent an out-of-bounds warning.
* gcc.dg/analyzer/test-setjmp.h:
Add dummy buffer to prevent an out-of-bounds warning.
* gcc.dg/analyzer/zlib-3.c: Add dg-bogus.
* g++.dg/analyzer/out-of-bounds-placement-new.C: New test.
* gcc.dg/analyzer/out-of-bounds-1.c: New test.
* gcc.dg/analyzer/out-of-bounds-2.c: New test.
* gcc.dg/analyzer/out-of-bounds-3.c: New test.
* gcc.dg/analyzer/out-of-bounds-container_of.c: New test.
* gcc.dg/analyzer/out-of-bounds-coreutils.c: New test.
* gcc.dg/analyzer/out-of-bounds-curl.c: New test.

---
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/region-model.cc  | 422 ++
 gcc/analyzer/region-model.h   |   3 +
 gcc/analyzer/region.cc|  32 ++
 gcc/analyzer/region.h |   4 +
 gcc/analyzer/store.cc |  67 +++
 gcc/analyzer/store.h  |   9 +
 gcc/doc/invoke.texi   |  15 +
 .../analyzer/out-of-bounds-placement-new.C|  19 +
 gcc/testsuite/g++.dg/analyzer/pr100244.C  |   5 +-
 .../gcc.dg/analyzer/allocation-size-3.c   |   2 +
 gcc/testsuite/gcc.dg/analyzer/memcpy-2.c  |   2 +-
 .../gcc.dg/analyzer/out-of-bounds-1.c | 120 +
 .../gcc.dg/analyzer/out-of-bounds-2.c |  83 
 .../gcc.dg/analyzer/out-of-bounds-3.c |  91 
 .../analyzer/out-of-bounds-container_of.c |  51 +++
 .../gcc.dg/analyzer/out-of-bounds-coreutils.c |  29 ++
 .../gcc.dg/analyzer/out-of-bounds-curl.c  |  41 ++
 gcc/testsuite/gcc.dg/analyzer/pr101962.c  |   6 +-
 gcc/testsuite/gcc.dg/analyzer/pr96764.c   |   2 +
 gcc/testsuite/gcc.dg/analyzer/pr97029.c   |   4 +-
 gcc/testsuite/gcc.dg/analyzer/test-setjmp.h   |   4 +-
 gcc/testsuite/gcc.dg/analyzer/zlib-3.c|   4 +-
 23 files changed, 1012 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/out-of-bounds-placement-new.C
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-container_of.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-coreutils.c
 create mode 100644 gcc

[PATCH 1/2 v2] analyzer: consider that realloc could shrink the buffer [PR106539]

2022-08-11 Thread Tim Lange
This patch adds the "shrinks buffer" case to the success_with_move
modelling of realloc.

Regression-tested on Linux x86-64, further ran the analyzer tests with
the -m32 option.

2022-08-11  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106539
* region-model-impl-calls.cc (region_model::impl_call_realloc):
Use the result of get_copied_size as the size for the
sized_regions in realloc.
(success_with_move::get_copied_size): New function.

gcc/testsuite/ChangeLog:

PR analyzer/106539
* gcc.dg/analyzer/pr106539.c: New test.
* gcc.dg/analyzer/realloc-5.c: New test.

---
 gcc/analyzer/region-model-impl-calls.cc   | 48 ---
 gcc/testsuite/gcc.dg/analyzer/pr106539.c  | 15 +++
 gcc/testsuite/gcc.dg/analyzer/realloc-5.c | 45 +
 3 files changed, 102 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr106539.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/realloc-5.c

diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index 8c38e9206fa..fa0ec88b1f4 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -732,15 +732,17 @@ region_model::impl_call_realloc (const call_details )
  const svalue *old_size_sval = model->get_dynamic_extents (freed_reg);
  if (old_size_sval)
{
- const region *sized_old_reg
+ const svalue *copied_size_sval
+   = get_copied_size (old_size_sval, new_size_sval);
+ const region *copied_old_reg
= model->m_mgr->get_sized_region (freed_reg, NULL,
- old_size_sval);
+ copied_size_sval);
  const svalue *buffer_content_sval
-   = model->get_store_value (sized_old_reg, cd.get_ctxt ());
- const region *sized_new_reg
+   = model->get_store_value (copied_old_reg, cd.get_ctxt ());
+ const region *copied_new_reg
= model->m_mgr->get_sized_region (new_reg, NULL,
- old_size_sval);
- model->set_value (sized_new_reg, buffer_content_sval,
+ copied_size_sval);
+ model->set_value (copied_new_reg, buffer_content_sval,
cd.get_ctxt ());
}
  else
@@ -774,6 +776,40 @@ region_model::impl_call_realloc (const call_details )
   else
return true;
 }
+
+  private:
+/* Return the lesser of OLD_SIZE_SVAL and NEW_SIZE_SVAL.
+   If either one is symbolic, the symbolic svalue is returned.  */
+const svalue *get_copied_size (const svalue *old_size_sval,
+  const svalue *new_size_sval) const
+{
+  tree old_size_cst = old_size_sval->maybe_get_constant ();
+  tree new_size_cst = new_size_sval->maybe_get_constant ();
+
+  if (old_size_cst && new_size_cst)
+   {
+ /* Both are constants and comparable.  */
+ tree cmp = fold_binary (LT_EXPR, boolean_type_node,
+ old_size_cst, new_size_cst);
+
+ if (cmp == boolean_true_node)
+   return old_size_sval;
+ else
+   return new_size_sval;
+   }
+  else if (new_size_cst)
+   {
+ /* OLD_SIZE_SVAL is symbolic, so return that.  */
+ return old_size_sval;
+   }
+  else
+   {
+ /* NEW_SIZE_SVAL is symbolic or both are symbolic.
+Return NEW_SIZE_SVAL, because implementations of realloc
+probably only moves the buffer if the new size is larger.  */
+ return new_size_sval;
+   }
+}
   };
 
   /* Body of region_model::impl_call_realloc.  */
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr106539.c 
b/gcc/testsuite/gcc.dg/analyzer/pr106539.c
new file mode 100644
index 000..fd270868e36
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr106539.c
@@ -0,0 +1,15 @@
+#include 
+
+void *test (void)
+{
+  void **p = (void **)malloc (sizeof (void *) * 2);
+  if (!p)
+return NULL;
+  p[0] = malloc(10);
+  p[1] = malloc(20); /* { dg-message "allocated here" }  */
+  void *q = realloc (p, sizeof (void *)); /* { dg-message "when 'realloc' 
succeeds, moving buffer" } */
+  if (!q)
+  /* { dg-warning "leak of ''" "leak of unknown" { target *-*-* } .-1 
} */
+return p;
+  return q;
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/realloc-5.c 
b/gcc/testsuite/gcc.dg/analyzer/realloc-5.c
new file mode 100644
index 000..2efe3371e12
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/realloc-5.c
@@ -0,0 +1,45 @@
+#include "analyzer-decls.h"
+
+typedef __SIZE_TYPE__ size_t;
+
+#define NULL ((void *)0)
+
+extern v

[PATCH 2/2] analyzer: out-of-bounds checker [PR106000]

2022-08-09 Thread Tim Lange
This patch adds an experimental out-of-bounds checker to the analyzer.

The checker was tested on coreutils, curl, httpd and openssh. It is mostly
accurate but does produce false-positives on yacc-generated files and
sometimes when the analyzer misses an invariant. These cases will be
documented in bugzilla.
(Regrtests still running with the latest changes, will report back later.)

2022-08-09  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106000
* analyzer.opt: Add Wanalyzer-out-of-bounds.
* region-model.cc (class out_of_bounds): Diagnostics base class
for all out-of-bounds diagnostics.
(class past_the_end): Base class derived from out_of_bounds for
the buffer_overflow and buffer_overread diagnostics.
(class buffer_overflow): Buffer overflow diagnostics.
(class buffer_overread): Buffer overread diagnostics.
(class buffer_underflow): Buffer underflow diagnostics.
(class buffer_underread): Buffer overread diagnostics.
(region_model::check_region_bounds): New function to check region
bounds for out-of-bounds accesses.
(region_model::check_region_access):
Add call to check_region_bounds.
(region_model::get_representative_tree): New function that accepts
a region instead of an svalue.
* region-model.h (class region_model):
Add region_model::check_region_bounds.
* region.cc (region::symbolic_p): New predicate. 
(offset_region::get_byte_size_sval): Only return the remaining
byte size on offset_regions.
* region.h: Add region::symbolic_p.
* store.cc (byte_range::intersects_p):
Add new function equivalent to bit_range::intersects_p.
(byte_range::exceeds_p): New function.
(byte_range::falls_short_of_p): New function.
* store.h (struct byte_range): Add byte_range::intersects_p,
byte_range::exceeds_p and byte_range::falls_short_of_p.

gcc/ChangeLog:

PR analyzer/106000
* doc/invoke.texi: Add Wanalyzer-out-of-bounds.

gcc/testsuite/ChangeLog:

PR analyzer/106000
* gcc.dg/analyzer/allocation-size-3.c:
Disable out-of-bounds warning.
* gcc.dg/analyzer/memcpy-2.c: Disable out-of-bounds warning.
* gcc.dg/analyzer/pr101962.c: Add dg-warning.
* gcc.dg/analyzer/pr97029.c:
Add dummy buffer to prevent an out-of-bounds warning.
* gcc.dg/analyzer/test-setjmp.h:
Add dummy buffer to prevent an out-of-bounds warning.
* gcc.dg/analyzer/zlib-3.c: Add dg-bogus.
* gcc.dg/analyzer/out-of-bounds-1.c: New test.
* gcc.dg/analyzer/out-of-bounds-2.c: New test.
* gcc.dg/analyzer/out-of-bounds-3.c: New test.
* gcc.dg/analyzer/out-of-bounds-container_of.c: New test.
* gcc.dg/analyzer/out-of-bounds-coreutils.c: New test.
* gcc.dg/analyzer/out-of-bounds-curl.c: New test.

---
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/region-model.cc  | 410 ++
 gcc/analyzer/region-model.h   |   3 +
 gcc/analyzer/region.cc|  32 ++
 gcc/analyzer/region.h |   4 +
 gcc/analyzer/store.cc |  67 +++
 gcc/analyzer/store.h  |   9 +
 gcc/doc/invoke.texi   |  12 +
 .../gcc.dg/analyzer/allocation-size-3.c   |   2 +
 gcc/testsuite/gcc.dg/analyzer/memcpy-2.c  |   2 +-
 .../gcc.dg/analyzer/out-of-bounds-1.c | 119 +
 .../gcc.dg/analyzer/out-of-bounds-2.c |  83 
 .../gcc.dg/analyzer/out-of-bounds-3.c |  91 
 .../analyzer/out-of-bounds-container_of.c |  51 +++
 .../gcc.dg/analyzer/out-of-bounds-coreutils.c |  29 ++
 .../gcc.dg/analyzer/out-of-bounds-curl.c  |  41 ++
 gcc/testsuite/gcc.dg/analyzer/pr101962.c  |   5 +-
 gcc/testsuite/gcc.dg/analyzer/pr97029.c   |   4 +-
 gcc/testsuite/gcc.dg/analyzer/test-setjmp.h   |   4 +-
 gcc/testsuite/gcc.dg/analyzer/zlib-3.c|   4 +-
 20 files changed, 970 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-container_of.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-coreutils.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-curl.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 5021376b6fb..8e73af60ceb 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -158,6 +158,10 @@ Wanalyzer-tainted-size
 Common Var(warn_analyzer_tainted_size) Init(1) Warning
 Warn about code paths in which an unsanitized value is used as a size.
 
+Wanalyzer-out-of-bounds
+Common Var(warn_analyzer_out_of_bounds) Init(1

[PATCH 1/2] analyzer: consider that realloc could shrink the buffer [PR106539]

2022-08-09 Thread Tim Lange
This patch adds the "shrinks buffer" case to the success_with_move
modelling of realloc.

2022-08-09  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106539
* region-model-impl-calls.cc (region_model::impl_call_realloc):
Add get_copied_size function and pass the result as the size of the
new sized_region.

---
 gcc/analyzer/region-model-impl-calls.cc | 37 -
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index 8c38e9206fa..50a19a52a21 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -737,9 +737,11 @@ region_model::impl_call_realloc (const call_details )
  old_size_sval);
  const svalue *buffer_content_sval
= model->get_store_value (sized_old_reg, cd.get_ctxt ());
+ const svalue *copied_size_sval
+   = get_copied_size (old_size_sval, new_size_sval);
  const region *sized_new_reg
= model->m_mgr->get_sized_region (new_reg, NULL,
- old_size_sval);
+ copied_size_sval);
  model->set_value (sized_new_reg, buffer_content_sval,
cd.get_ctxt ());
}
@@ -774,6 +776,39 @@ region_model::impl_call_realloc (const call_details )
   else
return true;
 }
+
+  private:
+/* Return the size svalue for the new region allocated by realloc.  */
+const svalue *get_copied_size (const svalue *old_size_sval,
+  const svalue *new_size_sval) const
+{
+  tree old_size_cst = old_size_sval->maybe_get_constant ();
+  tree new_size_cst = new_size_sval->maybe_get_constant ();
+
+  if (old_size_cst && new_size_cst)
+   {
+ /* Both are constants and comparable.  */
+ tree cmp = fold_binary (LT_EXPR, boolean_type_node,
+ old_size_cst, new_size_cst);
+
+ if (cmp == boolean_true_node)
+   return old_size_sval;
+ else
+   return new_size_sval;
+   }
+  else if (new_size_cst)
+   {
+ /* OLD_SIZE_SVAL is symbolic, so return that.  */
+ return old_size_sval;
+   }
+  else
+   {
+ /* NEW_SIZE_SVAL is symbolic or both are symbolic.
+Return NEW_SIZE_SVAL, because implementations of realloc
+probably only moves the buffer if the new size is larger.  */
+ return new_size_sval;
+   }
+}
   };
 
   /* Body of region_model::impl_call_realloc.  */
-- 
2.37.1



GCC warns on defined behavior with Wrestrict?

2022-07-29 Thread Tim Lange
Hi everyone,

while testing a new buffer overlap and restrict checker in the analyzer,
it emitted a warning inside coreutils. During the discussion [0], Paul
Eggert posted a link to the current draft of the next C standard [1]
with new examples for the definition of 'restrict'. Especially example 3
in 6.7.3.1 is interesting:

  void h(int n, int * restrict p, int * restrict q, int * restrict r)
  {
int i;
for (i = 0; i < n; i++)
  p[i] = q[i] + r[i];
  }

The draft says that a call h(100, a, b, b) is *defined* behavior because
'b' is never modified inside the method. That brings up the question how
GCC should handle this. At the moment, Wrestrict (and my new
Wanalyzer-restrict) warns on defined behavior:

  /path/to/main.c:70:13: warning: passing argument 3 to ‘restrict’-qualified 
parameter aliases with argument 4 [-Wrestrict]
 70 |   h(100, a, b, b);
| ^  ~

But finding out that 'q' and 'r' are never modified needs a pass over
the callee. Further, when the callee implementation isn't available at
the point Wrestrict runs, we couldn't emit a warning at all if we don't
want any false positive.

I thought maybe a tradeoff would be to keep warning without checking for
writes on parameters in the callee but additionally issue a fix-it hint
that if the memory location the parameter points to is never written, to
add the points-to const type qualifier.
The addition of the const qualifier simplifies deducing that the
memory location the parameter point to is never written for Wrestrict
and already silences the warning.

What do you think?

- Tim

[0] https://lists.gnu.org/archive/html/bug-gnulib/2022-07/msg00066.html
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912


Re: Setting up editors for the GNU/GCC coding style?

2022-07-28 Thread Tim Lange




On Thu, Jul 28 2022 at 02:46:58 PM -0400, David Malcolm via Gcc 
 wrote:

Is there documentation on setting up text editors to work with our
coding style?  A lot of the next generation of developers aren't using
vi or emacs; they's using VS Code, CLion, and other editors.  Does
anyone have docs on e.g. how to set up VS Code, CLion, etc (IntelliJ 
?)

to work well on GCC's own code base.  FWIW I use Emacs; I've dabbed
with VS Code but haven't used it "for real".


I did prepare my first patch(es) with vscode. For debugging, I set up 
vscode to launch gcc with gdbserver as wrapper and then let the vscode 
debugger to connect to the gdbserver. At first, I tried to get the gnu 
coding style to work in the hacky way by using tabSize=8 and rebinding 
tab to 2 spaces but later ditched that because it bothered me more than 
doing just spaces and replacing 8 spaces with 1 tab before sending the 
patch. That still wastes time because all files that I didn't touch 
look ugly unless I temporarily change the tabSize and some comments 
don't use tabs so I can't just replace all 8 spaces with 1 tab. For 
reference, my config files for gcc are available at [0].


- Tim

[0] https://gist.github.com/timll/1c4c542c7c98e3610c14aec19cdf7e91



I'm hoping to add this to my newbies guide.

Thanks
Dave






[PATCH] Fix handling of zero capacity regions in -Wanalyzer-allocation-size [PR106394]

2022-07-22 Thread Tim Lange
This patch unifies the handling of zero capacity regions for structs
and other types in the allocation size checker.
Regression-tested on x86_64 Linux.

2022-07-22  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106394
* region-model.cc (capacity_compatible_with_type): Always return true
if alloc_size is zero.

gcc/testsuite/ChangeLog:

PR analyzer/106394
* gcc.dg/analyzer/pr106394.c: New test.

---
 gcc/analyzer/region-model.cc |  2 +-
 gcc/testsuite/gcc.dg/analyzer/pr106394.c | 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr106394.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 8b7b4e1f697..e01c30407c4 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -2956,7 +2956,7 @@ capacity_compatible_with_type (tree cst, tree 
pointee_size_tree,
   unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);

   if (is_struct)
-return alloc_size >= pointee_size;
+return alloc_size == 0 || alloc_size >= pointee_size;
   return alloc_size % pointee_size == 0;
 }

diff --git a/gcc/testsuite/gcc.dg/analyzer/pr106394.c 
b/gcc/testsuite/gcc.dg/analyzer/pr106394.c
new file mode 100644
index 000..96bb175fc14
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr106394.c
@@ -0,0 +1,19 @@
+struct msm_gpu {
+  // [...snip...]
+  const struct msm_gpu_perfcntr *perfcntrs;
+  // [...snip...]
+};
+
+struct msm_gpu_perfcntr {
+  // [...snip...]
+  const char *name;
+};
+
+static const struct msm_gpu_perfcntr perfcntrs[] = {};
+
+struct msm_gpu *test(struct msm_gpu *gpu) {
+  // [...snip...]
+  gpu->perfcntrs = perfcntrs;
+  // [...snip...]
+  return gpu;
+}
--
2.36.1


Re: Floating-point allocation sizes? (was Re: [PATCH] analyzer: Fix handling of non-ints inside allocation size checker [PR106181])

2022-07-05 Thread Tim Lange




On Tue, Jul 5 2022 at 05:37:46 PM -0400, David Malcolm 
 wrote:

On Tue, 2022-07-05 at 21:49 +0200, Tim Lange wrote:
 This patch fixes the ICE reported in PR106181 by Arseny Solokha. 
With
 this patch, the allocation size checker tries to handle 
floating-point

 operands of allocation size arguments. Constant sizes get implicitly
 converted and symbolic sizes are handled as long as the 
floating-point
 operand could also be represented as a positive integer. In all 
other

 cases and on unhandled constants, the checker falls back to not
 emitting a warning.
 Also, I unified the logic on zero byte allocations.


Hi Tim

Thanks for the patch.

We definitely don't want to crash, but my "gut reaction" to the
testsuite examples was that we ought to be warning on them - using
floating point when calculating an allocation size seems like asking
for trouble.

In particular test_16's:
  int32_t *ptr = malloc (n * 3.1);
feels to me like it deserves a warning.  I suppose it could be valid 
if

n is a multiple of 40 (so that the buffer is a multiple of 31 * 4 and
thus a multiple of 4), for small enough n that we don't lose 
precision,

but that code seems very questionable - the comment says "just assume
that the programmer knows what they are doing", but I think anyone
using -fanalyzer is opting-in to have more fussy checking and would
probably want to be warned about such code.
While fixing that case, I thought what sane person would think of using 
floating-point arithmetic here. The main reason why I chose to give up 
here instead of complain was because the checker can't know the result 
and it is strange enough such that it might be deliberately.
In that sense, we could also talk about allocating 0 bytes. What 
happens there seems to be undefined behavior and 
implementation-specific. I've again decided to say that allocating 0 
bytes is strange enough that it must be deliberately. The same standard 
you've linked also has a article about that case [0]. If all readers 
can't thing of any use case, I can certainly rework that patch to warn 
on such allocations.


- Tim

[0] 
https://wiki.sei.cmu.edu/confluence/display/c/MEM04-C.+Beware+of+zero-length+allocations


I also wondered what happens on NAN, with e.g.

#include 

void test_nan (void)
{
  int *p = malloc (NAN * sizeof (int));
}

but we issue -Woverflow on that.

I'm thinking that perhaps we should have a new warning for floating
point buffer size calculations, though I'm not yet sure exactly how it
should work and how fussy it should be (e.g. complain about floating
point calculations vs complain about *any* floating point used as a
buffer size, etc).

Does anyone know of real world code that uses floating point in 
buffer-

size calculations?  (updating Subject accordingly)  Is there code out
there that does this?  It seems broken to me, but maybe there's a 
valid

use-case that I can't think of.

The closest such rule I can think of is CERT-C's
"FLP02-C. Avoid using floating-point numbers when precise computation
is needed":
https://wiki.sei.cmu.edu/confluence/display/c/FLP02-C.+Avoid+using+floating-point+numbers+when+precise+computation+is+needed


Dave




 Regression-tested on x86_64 linux.

 2022-07-05  Tim Lange  

 gcc/analyzer/ChangeLog:

 PR analyzer/106181
 * region-model.cc (capacity_compatible_with_type):
 Can handle non-integer constants now.
 (region_model::check_region_size): Adapted to the new 
signature

 of
 capacity_compatible_with_type.

 gcc/testsuite/ChangeLog:

 PR analyzer/106181
 * gcc.dg/analyzer/allocation-size-1.c: New tests.
 * gcc.dg/analyzer/allocation-size-2.c: New tests.
 * gcc.dg/analyzer/pr106181.c: New test.

 ---
  gcc/analyzer/region-model.cc  | 44 
---

  .../gcc.dg/analyzer/allocation-size-1.c   | 29 +++-
  .../gcc.dg/analyzer/allocation-size-2.c   | 22 ++
  gcc/testsuite/gcc.dg/analyzer/pr106181.c  |  7 +++
  4 files changed, 95 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr106181.c

 diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-
 model.cc
 index 5d939327e01..e097ecb3c07 100644
 --- a/gcc/analyzer/region-model.cc
 +++ b/gcc/analyzer/region-model.cc
 @@ -2904,13 +2904,45 @@ private:

  static bool
  capacity_compatible_with_type (tree cst, tree pointee_size_tree,
 -  bool is_struct)
 +  bool is_struct, bool floor_real)
  {
 -  gcc_assert (TREE_CODE (cst) == INTEGER_CST);
gcc_assert (TREE_CODE (pointee_size_tree) == INTEGER_CST);
 -
unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW
 (pointee_size_tree);
 -  unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
 +
 +  unsigned HOST_WIDE_INT alloc_size;
 +  switch (TREE_CODE (cst))
 +{
 +default:
 +  /* Assume all unhandled operands are compatible.  */
 

[PATCH] analyzer: Fix handling of non-ints inside allocation size checker [PR106181]

2022-07-05 Thread Tim Lange
This patch fixes the ICE reported in PR106181 by Arseny Solokha. With
this patch, the allocation size checker tries to handle floating-point
operands of allocation size arguments. Constant sizes get implicitly
converted and symbolic sizes are handled as long as the floating-point
operand could also be represented as a positive integer. In all other
cases and on unhandled constants, the checker falls back to not
emitting a warning.
Also, I unified the logic on zero byte allocations.

Regression-tested on x86_64 linux.

2022-07-05  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106181
* region-model.cc (capacity_compatible_with_type):
Can handle non-integer constants now.
(region_model::check_region_size): Adapted to the new signature of
capacity_compatible_with_type.

gcc/testsuite/ChangeLog:

PR analyzer/106181
* gcc.dg/analyzer/allocation-size-1.c: New tests.
* gcc.dg/analyzer/allocation-size-2.c: New tests.
* gcc.dg/analyzer/pr106181.c: New test.

---
 gcc/analyzer/region-model.cc  | 44 ---
 .../gcc.dg/analyzer/allocation-size-1.c   | 29 +++-
 .../gcc.dg/analyzer/allocation-size-2.c   | 22 ++
 gcc/testsuite/gcc.dg/analyzer/pr106181.c  |  7 +++
 4 files changed, 95 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr106181.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 5d939327e01..e097ecb3c07 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -2904,13 +2904,45 @@ private:
 
 static bool
 capacity_compatible_with_type (tree cst, tree pointee_size_tree,
-  bool is_struct)
+  bool is_struct, bool floor_real)
 {
-  gcc_assert (TREE_CODE (cst) == INTEGER_CST);
   gcc_assert (TREE_CODE (pointee_size_tree) == INTEGER_CST);
-
   unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW (pointee_size_tree);
-  unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
+
+  unsigned HOST_WIDE_INT alloc_size;
+  switch (TREE_CODE (cst))
+{
+default:
+  /* Assume all unhandled operands are compatible.  */
+  return true;
+case INTEGER_CST:
+  alloc_size = TREE_INT_CST_LOW (cst);
+  break;
+case REAL_CST:
+  {
+   const REAL_VALUE_TYPE *rv = TREE_REAL_CST_PTR (cst);
+   if (floor_real)
+ {
+   /* If the size is constant real at compile-time,
+  we can model the conversion.  */
+   alloc_size = real_to_integer (rv);
+ }
+   else
+ {
+   /* On expressions where the value of one operator isn't
+  representable as an integer or is negative, we give up and
+  just assume that the programmer knows what they are doing.  */
+   HOST_WIDE_INT i;
+   if (real_isneg (rv) || !real_isinteger (rv, ))
+ return true;
+   alloc_size = i;
+ }
+  }
+  break;
+}
+
+  if (alloc_size == 0)
+return true;
 
   if (is_struct)
 return alloc_size >= pointee_size;
@@ -2920,7 +2952,7 @@ capacity_compatible_with_type (tree cst, tree 
pointee_size_tree,
 static bool
 capacity_compatible_with_type (tree cst, tree pointee_size_tree)
 {
-  return capacity_compatible_with_type (cst, pointee_size_tree, false);
+  return capacity_compatible_with_type (cst, pointee_size_tree, false, false);
 }
 
 /* Checks whether SVAL could be a multiple of SIZE_CST.
@@ -3145,7 +3177,7 @@ region_model::check_region_size (const region *lhs_reg, 
const svalue *rhs_sval,
= as_a  (capacity);
tree cst_cap = cst_cap_sval->get_constant ();
if (!capacity_compatible_with_type (cst_cap, pointee_size_tree,
-   is_struct))
+   is_struct, true))
  ctxt->warn (new dubious_allocation_size (lhs_reg, rhs_reg,
   cst_cap));
   }
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c 
b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
index 4a78a81d054..1a1c8e75c98 100644
--- a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
@@ -114,4 +114,31 @@ void test_10 (int32_t n)
 {
   char *ptr = malloc (7 * n);
   free (ptr);
-}
\ No newline at end of file
+}
+
+void test_11 ()
+{
+  int32_t *ptr = malloc (3.0); /* { dg-line malloc11 } */
+  free (ptr);  
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's 
size \\\[CWE-131\\\]" "warning" { target *-*-* } malloc11 } */
+  /* { dg-message "'int32_t \\*' (\\\{aka '(long )?int \\*'\\\})? here; 
'sizeof \\(int32_t (\\\{aka (long )?int\\\})?\\)' is '4'" "note" { target *-*-* 
} malloc11 } */
+}
+
+void test_12 ()
+{
+  int32_t *ptr = malloc (4.0);
+  free (ptr);
+}
+

[PATCH] analyzer: Use fixed-width types in allocation size tests

2022-07-02 Thread Tim Lange
Hi,

thanks for the mail! Embarrassingly, I did not account for the different
sizes types may have on different systems. I've switched all testcases
to the fixed-width types of stdint.h. 

Does this patch need an approval?

- Tim

The patch changes the types inside the tests for the allocation size
checker to fixed-width types of stdint.h to account for different
architectures with different type widths.

2022-07-03  Tim Lange  

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/allocation-size-1.c: Use fixed-length types.
* gcc.dg/analyzer/allocation-size-2.c: Likewise.
* gcc.dg/analyzer/allocation-size-3.c: Likewise.
* gcc.dg/analyzer/allocation-size-4.c: Likewise.
* gcc.dg/analyzer/allocation-size-5.c: Likewise.

---
 .../gcc.dg/analyzer/allocation-size-1.c   | 53 +++---
 .../gcc.dg/analyzer/allocation-size-2.c   | 73 ++-
 .../gcc.dg/analyzer/allocation-size-3.c   | 19 ++---
 .../gcc.dg/analyzer/allocation-size-4.c   | 13 ++--
 .../gcc.dg/analyzer/allocation-size-5.c   | 23 +++---
 5 files changed, 93 insertions(+), 88 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c 
b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
index 4fc2bf75d6c..e6d021a128f 100644
--- a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
@@ -1,79 +1,80 @@
 #include 
 #include 
+#include 
 
 /* Tests with constant buffer sizes.  */
 
 void test_1 (void)
 {
-  short *ptr = malloc (21 * sizeof (short));
+  int16_t *ptr = malloc (21 * sizeof (int16_t));
   free (ptr);
 }
 
 void test_2 (void)
 {
-  int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc2 } */
+  int32_t *ptr = malloc (21 * sizeof (int16_t)); /* { dg-line malloc2 } */
   free (ptr);
 
   /* { dg-warning "allocated buffer size is not a multiple of the pointee's 
size \\\[CWE-131\\\]" "warning" { target *-*-* } malloc2 } */
   /* { dg-message "\\d+ bytes" "note" { target *-*-* } malloc2 } */
-  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { 
target *-*-* } malloc2 } */
+  /* { dg-message "'int32_t \\*' (\\\{aka 'int \\*'\\\})? here; 'sizeof 
\\(int32_t (\\\{aka int\\\})\\)' is '\\d+'" "note" { target *-*-* } malloc2 } */
 }
 
 void test_3 (void)
 {
-  void *ptr = malloc (21 * sizeof (short));
-  short *sptr = (short *)ptr;
+  void *ptr = malloc (21 * sizeof (int16_t));
+  int16_t *sptr = (int16_t *)ptr;
   free (sptr);
 }
 
 void test_4 (void)
 {
-  void *ptr = malloc (21 * sizeof (short)); /* { dg-message "\\d+ bytes" } */
-  int *iptr = (int *)ptr; /* { dg-line assign4 } */
+  void *ptr = malloc (21 * sizeof (int16_t)); /* { dg-message "\\d+ bytes" } */
+  int32_t *iptr = (int32_t *)ptr; /* { dg-line assign4 } */
   free (iptr);
 
   /* { dg-warning "allocated buffer size is not a multiple of the pointee's 
size \\\[CWE-131\\\]" "warning" { target *-*-* } assign4 } */
-  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { 
target *-*-* } assign4 } */
+  /* { dg-message "'int32_t \\*' (\\\{aka 'int \\*'\\\})? here; 'sizeof 
\\(int32_t (\\\{aka int\\\})\\)' is '\\d+'" "note" { target *-*-* } assign4 } */
 }
 
 void test_5 (void)
 {
-  int user_input;
+  int32_t user_input;
   scanf("%i", _input);
-  int n;
+  int32_t n;
   if (user_input == 0)
-n = 21 * sizeof (short);
+n = 21 * sizeof (int16_t);
   else
-n = 42 * sizeof (short);
+n = 42 * sizeof (int16_t);
   void *ptr = malloc (n);
-  short *sptr = (short *)ptr;
+  int16_t *sptr = (int16_t *)ptr;
   free (sptr);
 }
 
 void test_6 (void)
 {
-  int user_input;
+  int32_t user_input;
   scanf("%i", _input);
-  int n;
+  int32_t n;
   if (user_input == 0)
-n = 21 * sizeof (short);
+n = 21 * sizeof (int16_t);
   else
-n = 42 * sizeof (short);
+n = 42 * sizeof (int16_t);
   void *ptr = malloc (n); /* { dg-message "" "note" } */
   /* ^^^ on widening_svalues no expr is returned
  by get_representative_tree at the moment.  */ 
-  int *iptr = (int *)ptr; /* { dg-line assign6 } */
+  int32_t *iptr = (int32_t *)ptr; /* { dg-line assign6 } */
   free (iptr);
 
   /* { dg-warning "allocated buffer size is not a multiple of the pointee's 
size \\\[CWE-131\\\]" "warning" { target *-*-* } assign6 } */
-  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { 
target *-*-* } assign6 } */
+  /* { dg-message "'int32_t \\*' (\\\{aka 'int \\*'\\\})? here; 'sizeof 
\\(int32_t (\\\{aka int\\\})\\)' is '\\d+'" "note" { target *-*-* } assign6 } */
 }
 
 void test_7 (void)
 {
-  int user_input;
+  int32_t user_input;
   scanf("%i", _input);
-  int n;
+  int32_t

[PATCH] MAINTAINERS: Add myself to write after approval and DCO

2022-07-02 Thread Tim Lange
Hi everyone,

I've added myself to write after approval and DCO section.

- Tim

2022-07-02  Tim Lange  

ChangeLog:

* MAINTAINERS: Add myself.
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3c448ba9eb6..17bebefa2db 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -495,6 +495,7 @@ Razya Ladelsky  

 Thierry Lafage 
 Rask Ingemann Lambertsen   
 Jerome Lambourg
+Tim Lange  
 Asher Langton  
 Chris Lattner  
 Terry Laurenzo 
@@ -716,6 +717,7 @@ information.
 
 
 Matthias Kretz 
+Tim Lange  
 Jeff Law   
 Jeff Law   
 Gaius Mulley   
-- 
2.36.1



[PATCH v3] analyzer: add allocation size checker [PR105900]

2022-06-30 Thread Tim Lange
Hi,

here's the updated patch that should address all the comments from the v2.

- Tim


This patch adds an checker that warns about code paths in which a buffer is
assigned to a incompatible type, i.e. when the allocated buffer size is not a
multiple of the pointee's size.

2022-07-30  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/105900
* analyzer.opt: Added Wanalyzer-allocation-size.
* checker-path.cc (region_creation_event::get_desc): Added call to new
virtual function pending_diagnostic::describe_region_creation_event.
* checker-path.h: Added region_creation_event::get_desc.
* diagnostic-manager.cc (diagnostic_manager::add_event_on_final_node):
New function.
* diagnostic-manager.h:
Added diagnostic_manager::add_event_on_final_node.
* pending-diagnostic.h (struct region_creation): New event_desc struct.
(pending_diagnostic::describe_region_creation_event): Added virtual
function to overwrite description of a region creation.
* region-model.cc (class dubious_allocation_size): New class.
(capacity_compatible_with_type): New helper function.
(class size_visitor): New class.
(struct_or_union_with_inheritance_p): New helper function.
(is_any_cast_p): New helper function.
(region_model::check_region_size): New function.
(region_model::set_value): Added call to
region_model::check_region_size.
* region-model.h (class region_model): New function check_region_size.
* svalue.cc (region_svalue::accept): Changed to post-order traversal.
(initial_svalue::accept): Likewise.
(unaryop_svalue::accept): Likewise.
(binop_svalue::accept): Likewise.
(sub_svalue::accept): Likewise.
(repeated_svalue::accept): Likewise.
(bits_within_svalue::accept): Likewise.
(widening_svalue::accept): Likewise.
(unmergeable_svalue::accept): Likewise.
(compound_svalue::accept): Likewise.
(conjured_svalue::accept): Likewise.
(asm_output_svalue::accept): Likewise.
(const_fn_result_svalue::accept): Likewise.

gcc/ChangeLog:

PR analyzer/105900
* doc/invoke.texi: Added Wanalyzer-allocation-size.

gcc/testsuite/ChangeLog:

PR analyzer/105900
* gcc.dg/analyzer/pr96639.c: Changed buffer size to omit warning.
* gcc.dg/analyzer/allocation-size-1.c: New test.
* gcc.dg/analyzer/allocation-size-2.c: New test.
* gcc.dg/analyzer/allocation-size-3.c: New test.
* gcc.dg/analyzer/allocation-size-4.c: New test.
* gcc.dg/analyzer/allocation-size-5.c: New test.

Signed-off-by: Tim Lange 
---
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/checker-path.cc  |  11 +-
 gcc/analyzer/checker-path.h   |   2 +-
 gcc/analyzer/diagnostic-manager.cc|  61 +++
 gcc/analyzer/diagnostic-manager.h |   4 +
 gcc/analyzer/pending-diagnostic.h |  20 +
 gcc/analyzer/region-model.cc  | 370 ++
 gcc/analyzer/region-model.h   |   2 +
 gcc/analyzer/svalue.cc|  26 +-
 gcc/doc/invoke.texi   |  14 +
 .../gcc.dg/analyzer/allocation-size-1.c   | 116 ++
 .../gcc.dg/analyzer/allocation-size-2.c   | 155 
 .../gcc.dg/analyzer/allocation-size-3.c   |  45 +++
 .../gcc.dg/analyzer/allocation-size-4.c   |  60 +++
 .../gcc.dg/analyzer/allocation-size-5.c   |  36 ++
 gcc/testsuite/gcc.dg/analyzer/pr96639.c   |   2 +-
 16 files changed, 912 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 4aea52d3a87..1d612246a30 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -54,6 +54,10 @@ The minimum number of supernodes within a function for the 
analyzer to consider
 Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump) Init(200) 
Param
 The maximum depth of exploded nodes that should appear in a dot dump before 
switching to a less verbose format.
 
+Wanalyzer-allocation-size
+Common Var(warn_analyzer_allocation_size) Init(1) Warning
+Warn about code paths in which a pointer to a buffer is assigned to an 
incompatible type.
+
 Wanalyzer-double-fclose
 Common Var(warn_analyzer_double_fclose) Init(1) Warning
 Warn about code paths in which a stdio FILE can be closed more than once.
diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc
index 0133dc94137..953e192cd55 100644
--- a/gcc/analyzer/checker-path.cc

Re: [PATCH v2] analyzer: add allocation size checker

2022-06-30 Thread Tim Lange
On Wed Jun 29, 2022 at 7:39 PM CEST, David Malcolm wrote:
> On Wed, 2022-06-29 at 17:39 +0200, Tim Lange wrote:
>
> > Hi,
>
> Thanks for the updated patch.
>
> Overall, looks nearly ready; various nits inline below, throughout...
>
> > 
> > I've addressed most of the points from the review.
> > * The allocation size warning warns whenever region_model::get_capacity 
> > returns
> > something useful, i.e. also on statically-allocated regions.
>
> Thanks.  Looks like you added test coverage for this in allocation-
> size-5.c
>
> > * I added a new virtual function to the pending-diagnostic class, so
> that it
> > is possible to emit a custom region creation description.
> > * The test cases should have a better coverage now.
> > * Conservative struct handling
> > 
> > The warning now looks like this:
> > /path/to/main.c:9:8: warning: allocated buffer size is not a multiple of 
> > the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> > 9 |   int *iptr = ptr;
> >   |^~~~
> >   ‘main’: events 1-2
> > |
> > |8 |   void *ptr = malloc((long unsigned int)n * sizeof(short));
> > |  |   ^
> > |  |   |
> > |  |   (1) allocated ‘(long unsigned int)n * 2’ bytes 
> > here
> > |9 |   int *iptr = ptr;
> > |  |
> > |  ||
> > |  |(2) assigned to ‘int *’ here; ‘sizeof(int)’ is ‘4’
> > |
>
> Looks great.
>
> > 
> > /path/to/main.c:15:15: warning: allocated buffer size is not a multiple of 
> > the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> >15 |   int *ptrw = malloc (sizeof (short));
> >   |   ^~~
> >   ‘main’: events 1-2
> > |
> > |   15 |   int *ptrw = malloc (sizeof (short));
> > |  |   ^~~
> > |  |   |
> > |  |   (1) allocated ‘2’ bytes here
>
> Looks a bit weird to be quoting a number here; maybe whenever the
> expression is just a constant, print it unquoted?  (though that could
> be fiddly to implement, so can be ignored if it turns out to be) .

Isn't the 'q' in '%qE' responsible for quoting. Using '%E' instead if
m_expr is an INTEGER_CST works.

Otherwise, I've left the quoting on the "'sizeof (int)' is '4'" note.
I do think thata looks better than without.

/path/to/main.c:16:15: warning: allocated buffer size is not a multiple of the 
pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   16 |   int *ptrw = malloc (21 * sizeof (short));
  |   ^~~~
  ‘main’: events 1-2
|
|   16 |   int *ptrw = malloc (21 * sizeof (short));
|  |   ^~~~
|  |   |
|  |   (1) allocated 42 bytes here
|  |   (2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’
|

>
>
> > |  |   (2) assigned to ‘int *’ here; ‘sizeof (int)’ is 
> > ‘4’
> > |
> > The only thing I couldn't address was moving the second event toward the 
> > lhs or
> > assign token here. I tracked it down till get_stmt_location where it seems 
> > that
> > the rhs is actually the location of the statement. Is there any way to get 
> > (2)
> > to be focused on the lhs?
>
> Annoyingly, we've lost a lot of location information by the time the
> analyzer runs.
>
> In theory we could special-case for when we have the def-stmt of the
> SSA_NAME that's that default (i.e. initial) value of a VAR_DECL, and if
> we see the write is there, we could use the DECL_SOUCE_LOCATION of the
> VAR_DECL for the write, so that we'd get:
>
> |   15 |   int *ptrw = malloc (sizeof (short));
> |  |^~~~   ^~~
> |  ||  |
> |  ||  (1) allocated ‘2’ bytes here
> |  |(2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’
> |
>
> which is perhaps slightly more readable.  I'm not sure it's worth it
> though.

Hm, okay. I've left that out for now.

>
> > 
> > Otherwise, the patch compiled coreutils, openssh, curl and httpd without any
> > false-positive (but none of them contained a bug found by the checker 
> > either).
>
> Great.
>
> > `make check-gcc RUNTESTFLAGS="analyzer.exp"` tests pass and as I just 
> > worked on
> > the event splitting, the regression tests are yet to run.
> > 
> > - Tim
> > 
> > 
> 

[PATCH v2] analyzer: add allocation size checker

2022-06-29 Thread Tim Lange
Hi,

I've addressed most of the points from the review.
* The allocation size warning warns whenever region_model::get_capacity returns
something useful, i.e. also on statically-allocated regions.
* I added a new virtual function to the pending-diagnostic class, so that it
is possible to emit a custom region creation description.
* The test cases should have a better coverage now.
* Conservative struct handling

The warning now looks like this:
/path/to/main.c:9:8: warning: allocated buffer size is not a multiple of the 
pointee's size [CWE-131] [-Wanalyzer-allocation-size]
9 |   int *iptr = ptr;
  |^~~~
  ‘main’: events 1-2
|
|8 |   void *ptr = malloc((long unsigned int)n * sizeof(short));
|  |   ^
|  |   |
|  |   (1) allocated ‘(long unsigned int)n * 2’ bytes here
|9 |   int *iptr = ptr;
|  |
|  ||
|  |(2) assigned to ‘int *’ here; ‘sizeof(int)’ is ‘4’
|

/path/to/main.c:15:15: warning: allocated buffer size is not a multiple of the 
pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   15 |   int *ptrw = malloc (sizeof (short));
  |   ^~~
  ‘main’: events 1-2
|
|   15 |   int *ptrw = malloc (sizeof (short));
|  |   ^~~
|  |   |
|  |   (1) allocated ‘2’ bytes here
|  |   (2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’
|
The only thing I couldn't address was moving the second event toward the lhs or
assign token here. I tracked it down till get_stmt_location where it seems that
the rhs is actually the location of the statement. Is there any way to get (2)
to be focused on the lhs?

Otherwise, the patch compiled coreutils, openssh, curl and httpd without any
false-positive (but none of them contained a bug found by the checker either). 
`make check-gcc RUNTESTFLAGS="analyzer.exp"` tests pass and as I just worked on
the event splitting, the regression tests are yet to run.

- Tim


This patch adds an checker that warns about code paths in which a buffer is
assigned to a incompatible type, i.e. when the allocated buffer size is not a
multiple of the pointee's size.

gcc/analyzer/ChangeLog:

* analyzer.opt: Added Wanalyzer-allocation-size.
* checker-path.cc (region_creation_event::get_desc): Added call to new
virtual function pending_diagnostic::describe_region_creation_event.
* checker-path.h: Added region_creation_event::get_desc.
* diagnostic-manager.cc (diagnostic_manager::add_event_on_final_node):
New function.
* diagnostic-manager.h:
Added diagnostic_manager::add_event_on_final_node.
* pending-diagnostic.h (struct region_creation): New event_desc struct.
(pending_diagnostic::describe_region_creation_event): Added virtual
function to overwrite description of a region creation.
* region-model.cc (class dubious_allocation_size): New class.
(capacity_compatible_with_type): New helper function.
(class size_visitor): New class.
(struct_or_union_with_inheritance_p): New helper function.
(is_any_cast_p): New helper function.
(region_model::check_region_size): New function.
(region_model::set_value): Added call to
region_model::check_region_size.
* region-model.h (class region_model): New function check_region_size.
* svalue.cc (region_svalue::accept): Changed to post-order traversal.
(initial_svalue::accept): Likewise.
(unaryop_svalue::accept): Likewise.
(binop_svalue::accept): Likewise.
(sub_svalue::accept): Likewise.
(repeated_svalue::accept): Likewise.
(bits_within_svalue::accept): Likewise.
(widening_svalue::accept): Likewise.
(unmergeable_svalue::accept): Likewise.
(compound_svalue::accept): Likewise.
(conjured_svalue::accept): Likewise.
(asm_output_svalue::accept): Likewise.
(const_fn_result_svalue::accept): Likewise.

gcc/ChangeLog:

* doc/invoke.texi: Added Wanalyzer-allocation-size.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/pr96639.c: Changed buffer size to omit warning.
* gcc.dg/analyzer/allocation-size-1.c: New test.
* gcc.dg/analyzer/allocation-size-2.c: New test.
* gcc.dg/analyzer/allocation-size-3.c: New test.
* gcc.dg/analyzer/allocation-size-4.c: New test.
* gcc.dg/analyzer/allocation-size-5.c: New test.

Signed-off-by: Tim Lange 
---
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/checker-path.cc  |  11 +-
 gcc/analyzer/checker-path.h   |   2 +-
 gcc/analyzer/diagnostic-manager.cc|  61 
 gcc/analyzer/diagnostic-manager.h |   4 +
 gcc/analyzer/pending-di

Re: [RFC] analyzer: allocation size warning

2022-06-22 Thread Tim Lange
The checker reaches region-model.cc#3083 in my patch with the
  impl_region_model_context
on the 'after' node of create_buffer() but then discards the warning inside
impl_region_model_context::warn because m_stmt is null. Even if m_stmt were
not be NULL at the 'after' node, my warning would be emitted before the
return edge was taken and thus be wrongly indented like shown below:
/path/to/.c:10:16: warning: Allocated buffer size is not a multiple of the 
pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   10 | int *buf = create_buffer(42);
  |^
  ‘main’: events 1-2
|
|9 |   int main (int argc, char **argv) {
|  |   ^~~~
|  |   |
|  |   (1) entry to ‘main’
|   10 | int *buf = create_buffer(42);
|  |~
|  ||
|  |(2) calling ‘create_buffer’ from ‘main’
|
+--> ‘create_buffer’: events 3-4
   |
   |4 |   void *create_buffer(int n)
   |  | ^
   |  | |
   |  | (3) entry to ‘create_buffer’
   |5 |   {
   |6 | return malloc(n);
   |  |~
   |  ||
   |  |(4) allocated 42 bytes here
   |
 ‘main’: event 5
   |
   |   10 | int *buf = create_buffer(42);
   |  |^
   |  ||
   |  |(5) Assigned to ‘int *’ here
   |
   
The correct warning should be:
/path/to/.c:10:16: warning: Allocated buffer size is not a multiple of the 
pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   10 | int *buf = create_buffer(42);
  |^
  ‘main’: events 1-2
|
|9 |   int main (int argc, char **argv) {
|  |   ^~~~
|  |   |
|  |   (1) entry to ‘main’
|   10 | int *buf = create_buffer(42);
|  |~
|  ||
|  |(2) calling ‘create_buffer’ from ‘main’
|
+> ‘create_buffer’: events 3-4
   |
   |4 |   void *create_buffer(int n)
   |  | ^
   |  | |
   |  | (3) entry to ‘create_buffer’
   |5 |   {
   |6 | return malloc(n);
   |  |~
   |  ||
   |  |(4) allocated 42 bytes here
   |
‘main’: event 5 <--+
   |
   |   10 | int *buf = create_buffer(42);
   |  |^
   |  ||
   |  |(5) Assigned to ‘int *’ here
   |
For that, the return edge has to be visited to be part of the emission_path.
This is currently not the case as the assignment of the  to
the caller lhs is handled inside pop_frame, which is transitively called
from program_state::on_edge of the 'after' node of the callee.
I tried to defer the set_value(caller lhs, ) call to the
'before' node after the return edge but failed to do elegantly. My last try
is in the patch commented out with // FIXME.
My main problem is that I can not pop the frame and later get the
return value easily. Deferring the whole pop_frame to the before node
breaks the assumptions inside exploded_graph::get_or_create_node.

I don't know what's the best/elegant way of solving this. Is a solution to
attach the return svalue to the return edge and then use it later in the
PK_BEFORE_SUPERNODE?

Signed-off-by: Tim Lange 
---
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/checker-path.cc  |  12 +-
 gcc/analyzer/checker-path.h   |   2 +-
 gcc/analyzer/engine.cc|  12 +
 gcc/analyzer/pending-diagnostic.h |  21 ++
 gcc/analyzer/region-model.cc  | 322 ++
 gcc/analyzer/region-model.h   |   4 +
 .../gcc.dg/analyzer/allocation-size-1.c   |  63 
 .../gcc.dg/analyzer/allocation-size-2.c   |  44 +++
 .../gcc.dg/analyzer/allocation-size-3.c   |  48 +++
 .../gcc.dg/analyzer/allocation-size-4.c   |  92 +
 gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c |   2 +
 gcc/testsuite/gcc.dg/analyzer/malloc-4.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/pr96639.c   |   2 +
 14 files changed, 627 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocati

Re: [RFC] analyzer: allocation size warning

2022-06-21 Thread Tim Lange
On Sat Jun 18, 2022 at 12:13 AM CEST, David Malcolm wrote:
> On Fri, 2022-06-17 at 22:23 +0200, Tim Lange wrote:
> > On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote:
> > > On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote:
>
> [...snip...]
>
> > > 
> > 
> > I have resent the patch using git send-email as a reply to my original
> > message. 
> > The new message looks properly formatted in the archive:
> >     https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html
>
> Thanks; that's *much* more readable.
>
>
> [...snip...]
>
> > > 
> > > 
> > > 
> > > On symbolic buffer sizes:
> > > warning: Allocated buffer size is not a multiple of the pointee's
> > > size 
> > > [CWE-131] [-Wanalyzer-allocation-size]
> > >    33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 }
> > > */
> > >   | ^~~~
> > >   ‘test_3’: event 1
> > >     |
> > >     | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3
> > > } 
> > > */
> > >     | | ^~~~
> > >     | | |
> > >     | | (1) Allocation is incompatible with ‘int *’; either the 
> > > allocated size is bogus or the type on the left-hand side is wrong
> > >     |
> > > 
> > > 
> > > Is there location information for both the malloc and for the
> > > assignment, here?
> > 
> > I'm not sure whether I understand your question but the warning is 
> > emitted at the gcall* with a ssa var lhs and the call_fndecl on the
> > rhs.
> > I think that is enough to split that up into "(1) n + sizeof(int) 
> > allocated here" and "(2) Allocation at (1) is incompatible with..."? 
>
> Probably, yes.
>
> FWIW I wrote some more notes about the events in my reply to to your
> reply to Prathamesh, here:
>   https://gcc.gnu.org/pipermail/gcc/2022-June/238917.html
>
> [...snip...]
>
> > > 
> > > There are some things to discuss from my side:
> > > * The tests with the "toy re-implementation of CPython's object 
> > > model"[2] fail due to a extra warning emitted. Because the analyzer
> > > can't know the calculation actually results in a correct buffer size 
> > > when viewed as a string_obj later on, it emits a warning, e.g. at
> > > line 
> > > 61 in data-model-5.c. The only mitigation would be to disable the 
> > > warning for structs entirely. Now, the question is to rather have
> > > noise
> > > on these cases or disable the warning for structs entirely?
> > > 
> > > Can you post the full warning please?
> > 
> > /path/to/data-model-5.c: In function ‘alloc_obj’:
> > /path/to/data-model-5.c:61:31: warning: Allocated buffer size is not a
> > multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> >    61 |   base_obj *obj = (base_obj *)malloc (sz);
> >   |   ^~~
> >   ‘new_string_obj’: events 1-2
> >     |
> >     |   69 | base_obj *new_string_obj (const char *str)
> >     |  |   ^~
> >     |  |   |
> >     |  |   (1) entry to ‘new_string_obj’
> >     |..
> >     |   75 | = (string_obj *)alloc_obj (_type, sizeof
> > (string_obj) + len + 1);
> >     |  |
> > 
> >     |  | |
> >     |  | (2) calling ‘alloc_obj’ from
> > ‘new_string_obj’
> >     |
> >     +--> ‘alloc_obj’: events 3-4
> >    |
> >    |   59 | base_obj *alloc_obj (type_obj *ob_type, size_t sz)
> >    |  |   ^
> >    |  |   |
> >    |  |   (3) entry to ‘alloc_obj’
> >    |   60 | {
> >    |   61 |   base_obj *obj = (base_obj *)malloc (sz);
> >    |  |   ~~~
> >    |  |   |
> >    |  |   (4) Allocation is
> > incompatible with ‘base_obj *’; either the allocated size is bogus or
> > the type on the left-hand side is wrong
> >    |
> > 
> > > 
> > > These testcases exhibit a common way of faking inheritance in C, and
> > > I
> > > think it ought to be possible to support this in the warning.
> > > 
> > > I thing what's happ

Re: [RFC] analyzer: allocation size warning

2022-06-17 Thread Tim Lange
On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote:
> On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote:
> > Hi everyone,
> 
> Hi Tim.
> 
> Thanks for the patch.
> 
> Various comments inline below, throughout...
> 
> > 
> > tracked in PR105900 [0], I'd like to add support for a new warning on
> > dubious allocation sizes. The new checker emits a warning when the 
> > allocation size is not a multiple of the type's size. With the checker,
> > following mistakes are detected:
> >   int *arr = malloc(3); // forgot to multiply by sizeof
> >   arr[0] = ...;
> >   arr[1] = ...;
> > or
> >   int *buf = malloc (n + sizeof(int)); // probably should be * instead 
> > of +
> > Because it is implemented inside the analyzer, it also emits warnings
> > when the buffer is first of type void* and later on casted to something
> > else. Though, this also inherits a limitation. The checker can not 
> > distinguish 2 * sizeof(short) from sizeof(int) because sizeof is 
> > resolved and constants are folded at the point when the analyzer runs. 
> > As a mitigation, I plan to implement a check in the frontend that emits
> > a warning if sizeof(lhs pointee type) is not part of the malloc 
> > argument.
> > 
> > I'm looking for a first feedback on the phrasing of the diagnostics as 
> > well on the preliminary patch [1].
> > 
> > On constant buffer sizes, the warnings looks like this:
> > warning: Allocated buffer size is not a multiple of the pointee's size 
> > [CWE-131] [-Wanalyzer-allocation-size]
> >    22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
> >   | ^
> >   ‘test_2’: event 1
> >     |
> >     | 22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 }
> > */
> >     | | ^
> >     | | |
> >     | | (1) Casting a 14 byte buffer to ‘int *’ leaves 2 trailing 
> > bytes; either the allocated size is bogus or the type on the left-hand 
> > side is wrong
> >     |
> 
> Something strange seems to have happened with the indentation in your
> email; the code in the patch seems to me to be strangely indented, and
> looking at the archive here:
>   https://gcc.gnu.org/pipermail/gcc/2022-June/238907.html
> I see the same thing, so I think it's a problem with what the mailing
> list received, rather than just in my mail client.  Maybe something 
> 
> FWIW I normally use "git send-email" to send patches.
> 
> The underlinings in the above look strange; I see this in your email:

I have resent the patch using git send-email as a reply to my original message. 
The new message looks properly formatted in the archive:
https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html

> 
> warning: Allocated buffer size is not a multiple of the pointee's size 
> [CWE-131] [-Wanalyzer-allocation-size]
>22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
>   | ^
>   ‘test_2’: event 1
> |
> | 22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } 
> */
> | | ^
> | | |
> | | (1) Casting a 14 byte buffer to ‘int *’ leaves 2 trailing 
> bytes; either the allocated size is bogus or the type on the left-hand 
> side is wrong
> |
> 
> Should it have been (omitting the dg-line directives for clarity):
> 
> warning: Allocated buffer size is not a multiple of the pointee's size  
> [CWE-131] [-Wanalyzer-allocation-size]
>22 | int *ptr = malloc (10 + sizeof(int));
>   |^
>   ‘test_2’: event 1
> |
> | 22 | int *ptr = malloc (10 + sizeof(int));
> ||^
> |||
> ||(1) Casting a 14 byte buffer to ‘int *’ leaves 2 
> trailing bytes; either the allocated size is bogus or the type on the 
> left-hand side is wrong
> |
> 
> ?
> 
> It looks like something somewhere has collapsed repeated whitespace in
> the message down to single spaces, which has broken the ASCII art in
> your examples, and the indentation in your code.
> 
> 
> It would probably be helpful for the message to tell the user what
> sizeof(*ptr) is,  sizeof(int) in this case (much more helpful when it's
> a struct)
> 
> Maybe something alike:
> 
> note: a buffer of 14 bytes is allocated...
> note: ...but sizeof (int) is 4 bytes...
> note: ...leaving 2 trailing bytes for an array of 3 'int's (which would
> occupy 12 bytes)
> 
> or somesuch???
> 
> I'm brainstorming here, my ideas above aren't necessarily goo

Re: [RFC] analyzer: allocation size warning

2022-06-17 Thread Tim Lange




On Fr, Jun 17 2022 at 22:45:42 +0530, Prathamesh Kulkarni 
 wrote:

On Fri, 17 Jun 2022 at 21:25, Tim Lange  wrote:


 Hi everyone,

Hi Tim,
Thanks for posting the POC patch!
Just a couple of comments (inline)

Hi Prathamesh,
thanks for looking at it.


 tracked in PR105900 [0], I'd like to add support for a new warning 
on

 dubious allocation sizes. The new checker emits a warning when the
 allocation size is not a multiple of the type's size. With the 
checker,

 following mistakes are detected:
   int *arr = malloc(3); // forgot to multiply by sizeof
   arr[0] = ...;
   arr[1] = ...;
 or
   int *buf = malloc (n + sizeof(int)); // probably should be * 
instead

 of +
 Because it is implemented inside the analyzer, it also emits 
warnings
 when the buffer is first of type void* and later on casted to 
something

 else. Though, this also inherits a limitation. The checker can not
 distinguish 2 * sizeof(short) from sizeof(int) because sizeof is
 resolved and constants are folded at the point when the analyzer 
runs.
 As a mitigation, I plan to implement a check in the frontend that 
emits

 a warning if sizeof(lhs pointee type) is not part of the malloc
 argument.

IMHO, warning if sizeof(lhs pointee_type) is not present inside
malloc, might not be a good idea because it
would reject valid calls to malloc.
For eg:
(1)
size_t size = sizeof(int);
int *p = malloc (size);

(2)
void *p = malloc (sizeof(int));
int *q = p;
Hm, that's right. Maybe only warn when there is a sizeof(type) in the 
argument and the lhs pointee_type != type (except for void*, maybe 
char* and "inherited" structs)?


 I'm looking for a first feedback on the phrasing of the diagnostics 
as

 well on the preliminary patch [1].

 On constant buffer sizes, the warnings looks like this:
 warning: Allocated buffer size is not a multiple of the pointee's 
size

 [CWE-131] [-Wanalyzer-allocation-size]
22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 
} */

   | ^
   ‘test_2’: event 1
 |
 | 22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line 
malloc2 }

 */
 | | ^
 | | |
 | | (1) Casting a 14 byte buffer to ‘int *’ leaves 2 
trailing
 bytes; either the allocated size is bogus or the type on the 
left-hand

 side is wrong
 |

 On symbolic buffer sizes:
 warning: Allocated buffer size is not a multiple of the pointee's 
size

 [CWE-131] [-Wanalyzer-allocation-size]
33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 } 
*/

   | ^~~~
   ‘test_3’: event 1
 |
 | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line 
malloc3 }

 */
 | | ^~~~
 | | |
 | | (1) Allocation is incompatible with ‘int *’; either the
 allocated size is bogus or the type on the left-hand side is wrong
 |

Won't the warning be incorrect if 'n' is a multiple of sizeof(int) ?
I assume by symbolic buffer size, 'n' is not known at compile time.
* VLAs are resolved to n * sizeof(type) when the analyzer runs and work 
fine.
* Flows with if (cond) n = ...; else n = ...; are tracked by the 
analyzer with a widening_svalue and can be handled (While thinking 
about this answer, I noticed my patch is missing this case. Thanks!)
* In case of more complicated flows, the analyzer's buffer size 
tracking resorts to unknown_svalue. If any variable in an expression is 
unknown, no warning will be emitted.
* Generally, when requesting memory for a variable type, accepting an 
arbitrary number doesn't sound right. I do warn, e.g. if 'n' is a 
conjured_svalue (e.g. a from scanf call).


I think only the last case could in theory be a false-positive. I've 
noticed that this is the case when 'n' is guarded by an if making sure 
n is only a multiple of sizeof(type). In theory, I can fix this case 
too as the analysis is path-sensitive.
Do you know of some other case where 'n' might be an unknown value 
neither guarded an if condition nor resorted to 'unknown' by a 
complicated flow but still correct?


- Tim


Thanks,
Prathamesh


 And this is how a simple flow looks like:
 warning: Allocated buffer size is not a multiple of the pointee's 
size

 [CWE-131] [-Wanalyzer-allocation-size]
39 | int *iptr = (int *)ptr; /* { dg-line assign } */
   | ^~~~
   ‘test_4’: events 1-2
 |
 | 38 | void *ptr = malloc (n * sizeof (short)); /* { dg-message 
} */

 | | ^~~
 | | |
 | | (1) allocated here
 | 39 | int *iptr = (int *)ptr; /* { dg-line assign } */
 | | 
 | | |
 | | (2) ‘ptr’ is incompatible with ‘int *’; either the
 allocated size at (1) is bogus or the type on the left-hand side is
 wrong
 |

 There are some things to discuss from my side:
 * The tests with the "toy re-implementation of CPython's object
 model"[2] fail due to a extra warning emitted. Because the analyzer
 can't know the calculation actually results in a corre

[RFC] analyzer: add allocation size warning

2022-06-17 Thread Tim Lange
I think my mail client did apply auto-wrap and reduced multiple spaces to a 
single one while doing so. Here again, the full patch as well as the ASCII
diagnostics. This should look better now.

On constant size allocations:
/path/to/allocation-size-3.c:22:14: warning: Allocated buffer size is not a 
multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   22 |   int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
  |  ^
  ‘test_2’: event 1
|
|   22 |   int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
|  |  ^
|  |  |
|  |  (1) Casting a 14 byte buffer to ‘int *’ leaves 2 
trailing bytes; either the allocated size is bogus or the type on the left-hand 
side is wrong
|

On symbolic buffer sizes:
/path/to/allocation-size-3.c:33:14: warning: Allocated buffer size is not a 
multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   33 |   int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
  |  ^
  ‘test_3’: event 1
|
|   33 |   int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
|  |  ^
|  |  |
|  |  (1) Allocation is incompatible with ‘int *’; either 
the allocated size is bogus or the type on the left-hand side is wrong
|
   
And this is how a simple flow looks like:
warning: Allocated buffer size is not a multiple of the pointee's size 
[CWE-131] [-Wanalyzer-allocation-size]
/path/to/allocation-size-2.c:39:8: warning: Allocated buffer size is not a 
multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   39 |   int *iptr = (int *)ptr; /* { dg-line assign } */
  |^~~~
  ‘test_4’: events 1-2
|
|   38 |   void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
|  |   ^~~
|  |   |
|  |   (1) allocated here
|   39 |   int *iptr = (int *)ptr; /* { dg-line assign } */
|  |
|  ||
|  |(2) ‘ptr’ is incompatible with ‘int *’; either the 
allocated size at (1) is bogus or the type on the left-hand side is wrong
|

This patch adds an allocation size checker to the analyzer.
The checker warns when the tracked buffer size is not a multiple of the 
left-hand side pointee's type. This resolves PR analyzer/105900.

The patch is not yet fully tested.

gcc/analyzer/ChangeLog:

   * analyzer.opt: Add Wanalyzer-allocation-size.
   * sm-malloc.cc (class dubious_allocation_size): New pending_diagnostic 
subclass.
   (capacity_compatible_with_type): New.
   (const_operand_in_sval_p): New.
   (struct_or_union_with_inheritance_p): New.
   (check_capacity): New.
   (malloc_state_machine::on_stmt): Add calls to on_pointer_assignment.
   (malloc_state_machine::on_allocator_call): Add node to parameters and 
call to on_pointer_assignment.
   (malloc_state_machine::on_pointer_assignment): New.

gcc/testsuite/ChangeLog:

   * gcc.dg/analyzer/attr-malloc-6.c: Disabled Wanalyzer-allocation-size 
and added default implementation for FILE.
   * gcc.dg/analyzer/capacity-1.c: Added dg directives.
   * gcc.dg/analyzer/malloc-4.c: Disabled Wanalyzer-allocation-size.
   * gcc.dg/analyzer/pr96639.c: Disabled Wanalyzer-allocation-size and 
added default implementation for foo and bar.
   * gcc.dg/analyzer/allocation-size-1.c: New test.
   * gcc.dg/analyzer/allocation-size-2.c: New test.
   * gcc.dg/analyzer/allocation-size-3.c: New test.
   * gcc.dg/analyzer/allocation-size-4.c: New test.

Signed-off-by: Tim Lange 
---
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/sm-malloc.cc | 363 +-
 .../gcc.dg/analyzer/allocation-size-1.c   |  54 +++
 .../gcc.dg/analyzer/allocation-size-2.c   |  44 +++
 .../gcc.dg/analyzer/allocation-size-3.c   |  48 +++
 .../gcc.dg/analyzer/allocation-size-4.c   |  39 ++
 gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c |   2 +
 gcc/testsuite/gcc.dg/analyzer/capacity-1.c|   5 +-
 gcc/testsuite/gcc.dg/analyzer/malloc-4.c  |   6 +-
 gcc/testsuite/gcc.dg/analyzer/pr96639.c   |   2 +
 10 files changed, 559 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 4aea52d3a87..f213989e0bb 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -78,6 +78,10 @@ Wanalyzer-malloc-leak
 Common Var

[RFC] analyzer: allocation size warning

2022-06-17 Thread Tim Lange
size): New 
pending_diagnostic subclass.

   (capacity_compatible_with_type): New.
   (const_operand_in_sval_p): New.
   (struct_or_union_with_inheritance_p): New.
   (check_capacity): New.
   (malloc_state_machine::on_stmt): Add calls to 
on_pointer_assignment.
   (malloc_state_machine::on_allocator_call): Add node to 
parameters and call to on_pointer_assignment.

   (malloc_state_machine::on_pointer_assignment): New.

gcc/testsuite/ChangeLog:

   * gcc.dg/analyzer/attr-malloc-6.c: Disabled 
Wanalyzer-allocation-size and added default implementation for FILE.

   * gcc.dg/analyzer/capacity-1.c: Added dg directives.
   * gcc.dg/analyzer/malloc-4.c: Disabled 
Wanalyzer-allocation-size.
   * gcc.dg/analyzer/pr96639.c: Disabled Wanalyzer-allocation-size 
and added default implementation for foo and bar.

   * gcc.dg/analyzer/allocation-size-1.c: New test.
   * gcc.dg/analyzer/allocation-size-2.c: New test.
   * gcc.dg/analyzer/allocation-size-3.c: New test.
   * gcc.dg/analyzer/allocation-size-4.c: New test.

Signed-off-by: Tim Lange 
---
gcc/analyzer/analyzer.opt | 4 +
gcc/analyzer/sm-malloc.cc | 363 +-
.../gcc.dg/analyzer/allocation-size-1.c | 54 +++
.../gcc.dg/analyzer/allocation-size-2.c | 44 +++
.../gcc.dg/analyzer/allocation-size-3.c | 48 +++
.../gcc.dg/analyzer/allocation-size-4.c | 39 ++
gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c | 2 +
gcc/testsuite/gcc.dg/analyzer/capacity-1.c | 5 +-
gcc/testsuite/gcc.dg/analyzer/malloc-4.c | 6 +-
gcc/testsuite/gcc.dg/analyzer/pr96639.c | 2 +
10 files changed, 559 insertions(+), 8 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 4aea52d3a87..f213989e0bb 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -78,6 +78,10 @@ Wanalyzer-malloc-leak
Common Var(warn_analyzer_malloc_leak) Init(1) Warning
Warn about code paths in which a heap-allocated pointer leaks.

+Wanalyzer-allocation-size
+Common Var(warn_analyzer_allocation_size) Init(1) Warning
+Warn about code paths in which a buffer is assigned to a incompatible 
type.

+
Wanalyzer-mismatching-deallocation
Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
Warn about code paths in which the wrong deallocation function is 
called.

diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
index 3bd40425919..790c9f0e57d 100644
--- a/gcc/analyzer/sm-malloc.cc
+++ b/gcc/analyzer/sm-malloc.cc
@@ -46,6 +46,8 @@ along with GCC; see the file COPYING3. If not see
#include "attribs.h"
#include "analyzer/function-set.h"
#include "analyzer/program-state.h"
+#include "print-tree.h"
+#include "gimple-pretty-print.h"

#if ENABLE_ANALYZER

@@ -428,6 +430,7 @@ private:
  get_or_create_deallocator (tree deallocator_fndecl);

  void on_allocator_call (sm_context *sm_ctxt,
+ const supernode *node,
 const gcall *call,
 const deallocator_set *deallocators,
 bool returns_nonnull = false) const;
@@ -444,6 +447,16 @@ private:
  void on_realloc_call (sm_context *sm_ctxt,
   const supernode *node,
   const gcall *call) const;
+ void on_pointer_assignment(sm_context *sm_ctxt,
+ const supernode *node,
+ const gassign *assign_stmt,
+ tree lhs,
+ tree rhs) const;
+ void on_pointer_assignment(sm_context *sm_ctxt,
+ const supernode *node,
+ const gcall *call,
+ tree lhs,
+ tree rhs) const;
  void on_zero_assignment (sm_context *sm_ctxt,
  const gimple *stmt,
  tree lhs) const;
@@ -1432,6 +1445,117 @@ private:
  const char *m_funcname;
};

+/* Concrete subclass for casts of pointers that lead to trailing 
bytes. */

+
+class dubious_allocation_size : public malloc_diagnostic
+{
+public:
+ dubious_allocation_size (const malloc_state_machine , tree lhs, 
tree rhs,

+ tree size_tree, unsigned HOST_WIDE_INT size_diff)
+ : malloc_diagnostic(sm, rhs), 
m_type(dubious_allocation_type::CONSTANT_SIZE),

+ m_lhs(lhs), m_size_tree(size_tree), m_size_diff(size_diff)
+ {}
+
+ dubious_allocation_size (const malloc_state_machine , tree lhs, 
tree rhs,

+ tree size_tree)
+ : malloc_diagnostic(sm, rhs), 
m_type(dubious_allocation_type::MISSING_OPERAND),

+ m_lhs(lhs), m_size_tree(size_tree), m_size_diff(0)
+ {}
+
+ const char *get_kind () const final override
+ {
+ return "dubious_allocation_size";
+ }
+
+ int get_controlling_option () const final override
+ {
+ return OPT_Wanalyzer_allocation_size;
+ }
+
+ bool subclass_equal_p (const pending_diagnostic _other) const
+ final override
+ {
+ const dubious_allocation_size  = (const dubious_allocation_size 
&)base_other;

+ return malloc_diagnostic::subclass_equal_p(other)
+ && m_type == other.m_type
+ &a

Re: fanalyzer: debugging zero state machine

2022-06-12 Thread Tim Lange




On Do, Jun 9 2022 at 13:40:06 -0400, David Malcolm 
 wrote:

On Thu, 2022-06-09 at 16:49 +0200, Tim Lange wrote:


  > On Mi, Jun 8 2022 at 11:12:52 -0400, David Malcolm
  wrote:
  > > On Wed, 2022-06-08 at 01:42 +0200, Tim Lange wrote:
  > >
  > > Hi Dave,


Hi Tim; various responses inline below...


  > >
  > > I did spent some time to think about the zero state machine. I
 first
  > > thought about distinguishing between "assigned zero" and "EQ 0
  > > condition on the path" for cases where e.g. unreachable() is
 used
 to
  > > say that some variable will never be zero according to the
 programmer.
  > > In that case the dev might not want zero warnings from
 conditions
  > > outside the if body itself for dev build where unreachable
 expands
 to
  > > some panic exit. But as the condition constraint is not pruned
 when the
  > > state machine is distinguishing the states, I'm not sure how to
 know
  > > whether the analysis already left the if body?
  >
  > The analyzer works on the gimple-ssa representation, which uses
 basic
  > blocks in a CFG, rather than an AST, so the only remants we have
 of
  > scope is in "clobber" statements (a special kind of assignment
 stmt),
  > which the gimplify adds as variables go out of scope.
 If the constraints only lived until the immediate dominator of `if
 (cond)`, I could easily distinguish:
 1. if (x == 0) && still inside the if => zero
 2. if (x == 0) && outside if => maybe zero
 but as this seems to be not the case, if I want to distinguish 1. &
 2.,
 I'd have to find another way.
  >
  > For pruning, the analyzer's state_machine class has a
 "can_purge_p"
  > virtual function:
  >
  > /* Return true if it safe to discard the given state (to help
  > when simplifying state objects).
  > States that need leak detection should return false. */
  > virtual bool can_purge_p (state_t s) const = 0;
  >
  > which should return true for a "zeroness" state machine, in that
 we
  > always consider pruning states for svalues that aren't needed
 anymore
  > along a path.
 Is implemented and returns true.
  >
  > Is there some other kind of state explosion you're running into?
 It's
  > hard to figure this out further without seeing code.
 No, my code is by far not that mature to be tested. I just had in my
 head that I wanted to find out if I can distinguish the two cases.
  >
  >
  > > Also, while trying out different things, it seems simple
 assignments on
  > > phi like here
  > > int x;
  > > if (argc == 1) {
  > > x = 1; // x_5
  > > } else {
  > > x = 0; // x_4
  > > }
  > > printf("%i", 5 / x); // x_2
  > > automatically work such that x_2 already inherits the state 
from

  > > x_4/x_5 without me doing anything inside my sm's on_phi
 function.
 Same
  > > for the simple y = 0; x = y; case. Where does this happen 
inside

 the
  > > code?
  >
  > With the caveat that I'm seeing your code, what's probably
 happening
 is
  > that we have either:
  >
  > BB (a):
  > x_5 = 1;
  > goto BB (c);
  >
  > BB (b):
  > x_4 = 0;
  > goto BB (c);
  >
  > BB (c):
  > x_2 = PHI (x_5 from (a), x_4 from (b));
 I compiled it with -g, so this one is like the dumped gimple.
  >
  > or (at higher optimization levels):
  >
  > BB (a):
  > goto BB (c);
  >
  > BB (b):
  > goto BB (c);
  >
  > BB (c):
  > x_2 = PHI (1 from (a), 0 from (b));
  >
  > and I think that at the phi node we have 
region_model::handle_phi,
  > which is setting x_2 to either the constant 1 or the constant 0 
in

 the
  > store, and is calling the on_phi vfunc, leading to on_phi being
 called
  > for all state machines.
 Thanks, that is the case. The set_value inside handle_phi seems to
 this
 for me.
  >
  > BTW, are you implementing an override for this vfunc:
  > virtual state_machine::state_t get_default_state (const svalue *)
  > const;
  >
  > to capture the inherently known zeroness/nonzeroness of
 constant_svalue
  > instances? That would make those constants have that state.
 Yeah, I saw that on your nullness check. I tried it on a small
 example
 with and without, but didn't noticed a difference in warnings 
(except

 for not having zero(x_4) inside the supergraph.dot). So if I
 understood
 this right, this is just to have one state less for that
 variable/value[0]?


The states are stored in sm_state_map using a mapping from svalue to
state.

Given e.g. a parameter, this will be "initial_svalue (parm)", but in
the above case where x_4 either has value 1 or has value 0, it's not
storing a state for x_4; it's storing a state for 0 or a state for 1.
So without implementing th

fanalyzer: debugging zero state machine

2022-06-09 Thread Tim Lange



> On Mi, Jun 8 2022 at 11:12:52 -0400, David Malcolm 
 wrote:

> > On Wed, 2022-06-08 at 01:42 +0200, Tim Lange wrote:
> >
> > Hi Dave,
> >
> > I did spent some time to think about the zero state machine. I 
first

> > thought about distinguishing between "assigned zero" and "EQ 0
> > condition on the path" for cases where e.g. unreachable() is used 
to
> > say that some variable will never be zero according to the 
programmer.

> > In that case the dev might not want zero warnings from conditions
> > outside the if body itself for dev build where unreachable expands 
to
> > some panic exit. But as the condition constraint is not pruned 
when the
> > state machine is distinguishing the states, I'm not sure how to 
know

> > whether the analysis already left the if body?
>
> The analyzer works on the gimple-ssa representation, which uses basic
> blocks in a CFG, rather than an AST, so the only remants we have of
> scope is in "clobber" statements (a special kind of assignment stmt),
> which the gimplify adds as variables go out of scope.
If the constraints only lived until the immediate dominator of `if 
(cond)`, I could easily distinguish:

1. if (x == 0) && still inside the if => zero
2. if (x == 0) && outside if => maybe zero
but as this seems to be not the case, if I want to distinguish 1. & 2., 
I'd have to find another way.

>
> For pruning, the analyzer's state_machine class has a "can_purge_p"
> virtual function:
>
> /* Return true if it safe to discard the given state (to help
> when simplifying state objects).
> States that need leak detection should return false. */
> virtual bool can_purge_p (state_t s) const = 0;
>
> which should return true for a "zeroness" state machine, in that we
> always consider pruning states for svalues that aren't needed anymore
> along a path.
Is implemented and returns true.
>
> Is there some other kind of state explosion you're running into? It's
> hard to figure this out further without seeing code.
No, my code is by far not that mature to be tested. I just had in my 
head that I wanted to find out if I can distinguish the two cases.

>
>
> > Also, while trying out different things, it seems simple 
assignments on

> > phi like here
> > int x;
> > if (argc == 1) {
> > x = 1; // x_5
> > } else {
> > x = 0; // x_4
> > }
> > printf("%i", 5 / x); // x_2
> > automatically work such that x_2 already inherits the state from
> > x_4/x_5 without me doing anything inside my sm's on_phi function. 
Same
> > for the simple y = 0; x = y; case. Where does this happen inside 
the

> > code?
>
> With the caveat that I'm seeing your code, what's probably happening 
is

> that we have either:
>
> BB (a):
> x_5 = 1;
> goto BB (c);
>
> BB (b):
> x_4 = 0;
> goto BB (c);
>
> BB (c):
> x_2 = PHI (x_5 from (a), x_4 from (b));
I compiled it with -g, so this one is like the dumped gimple.
>
> or (at higher optimization levels):
>
> BB (a):
> goto BB (c);
>
> BB (b):
> goto BB (c);
>
> BB (c):
> x_2 = PHI (1 from (a), 0 from (b));
>
> and I think that at the phi node we have region_model::handle_phi,
> which is setting x_2 to either the constant 1 or the constant 0 in 
the
> store, and is calling the on_phi vfunc, leading to on_phi being 
called

> for all state machines.
Thanks, that is the case. The set_value inside handle_phi seems to this 
for me.

>
> BTW, are you implementing an override for this vfunc:
> virtual state_machine::state_t get_default_state (const svalue *)
> const;
>
> to capture the inherently known zeroness/nonzeroness of 
constant_svalue

> instances? That would make those constants have that state.
Yeah, I saw that on your nullness check. I tried it on a small example 
with and without, but didn't noticed a difference in warnings (except 
for not having zero(x_4) inside the supergraph.dot). So if I understood 
this right, this is just to have one state less for that 
variable/value[0]?


If that is right, is it also favorable to "merge" the stop state and 
non_zero state inside the zero state machine because - for now - there 
is no warning planned on non-zero values?


[0] less states are favorable because then the analyzer maybe has less 
different enodes to visit and thus less runtime(?)

>
> Thanks
> Dave
>
> > - Tim

Also another question unrelated to the ones before. I do have a weird 
bug in my zero sm[1] but I'm unsure where my sm is flawed. Take for 
example, the probably simplest case:

 int x = 0;h
 printf("%d", 42 / x);
If I use inform to emit a notice for my state machine result, it seems 
to be corre

GSoC

2022-06-09 Thread Tim Lange

Hi everyone,

my name is Tim and I'm also working on the static analyzer this summer. 
Some of you might already noticed my nooby questions in the IRC ;).
Specifically, I'll be working on extending the analyzer with several 
smaller warnings that the clang analyzer already has. David created a 
meta-bug[0] with the results of the discussion between him and me about 
the gap and what seems to be useful.


I won't do all of those but rather look how many of them I'm able to 
get done until September. I will begin with a Cast Size warning. This 
emits a warning when the tracked allocation size is not a multiple of 
the pointee's size, e.g., when casting malloc(10) to int*.


Furthermore, in preparation for the official coding phase, I played 
around a bit with a state machine that tracks whether an int is zero or 
not. So this is probably my next candidate after cast size.


- Tim

[0] 
https://gcc.gnu.org/bugzilla/showdependencytree.cgi?id=105887_resolved=1





GSoC: Extending the Static Analysis Pass

2022-04-03 Thread Tim Lange

Hi everyone,
Hi David,

I'm interested in extending the static analysis pass as a GSoC project. 
Short introduction of me: I'm Tim, currently doing my master in 
computer science with focus on IT security at TU Darmstadt. I already 
worked with IFDS as part of my bachelor thesis and took both program 
analysis courses in my masters.


Specifically, I thought about extending the analyzer to check new 
things i.e. the POSIX file-descriptor project. I would prefer a 
medium-sized project, do you think this is doable?


Also, I've read a bit through the internal documentation and got a 
question. The documentation mostly mentions the Reps paper as the 
source for the exploded supergraph. But the paper suggests more such as 
having extensional summaries that lead to the same context sensitivity 
as unbounded call-strings. In contrast, the documentation also talks 
about being call-strings limited and searching for complex-enough 
methods to be worth summarizing.
Do you only use the exploded supergraph idea but not the rest? 
Otherwise why do you use call-strings and search for methods worth 
summarizing?


Best regards
Tim