from:"Siddhesh Poyarekar"





On 2024-02-16 14:47, Qing Zhao wrote:

gcc/c-family/ChangeLog:

* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
---
  gcc/c-family/c-ubsan.cc   | 42 +
  .../ubsan/flex-array-counted-by-bounds-2.c| 45 ++
  .../ubsan/flex-array-counted-by-bounds-3.c| 34 ++
  .../ubsan/flex-array-counted-by-bounds.c  | 46 +++
  4 files changed, 167 insertions(+)
  create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
  create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-3.c
  create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 940982819ddf..164b29845b3a 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -376,6 +376,40 @@ ubsan_instrument_return (location_t loc)
return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, 
data));
  }
  
+/* Get the tree that represented the number of counted_by, i.e, the maximum

+   number of the elements of the object that the call to .ACCESS_WITH_SIZE
+   points to, this number will be the bound of the corresponding array.  */
+static tree
+get_bound_from_access_with_size (tree call)
+{
+  if (!is_access_with_size_p (call))
+return NULL_TREE;
+
+  tree ref_to_size = CALL_EXPR_ARG (call, 1);
+  unsigned int type_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));


Again for consistency, this should probably be class_of_size.


+  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
+  build_int_cst (ptr_type_node, 0));
+  /* If size is negative value, treat it as zero.  */
+  if (!TYPE_UNSIGNED (type))
+  {
+tree cond = fold_build2 (LT_EXPR, boolean_type_node,
+unshare_expr (size), build_zero_cst (type));
+size = fold_build3 (COND_EXPR, type, cond,
+   build_zero_cst (type), size);
+  }
+
+  /* Only when type_of_size is 1,i.e, the number of the elements of
+ the object type, return the size.  */
+  if (type_of_size != 1)
+return NULL_TREE;
+  else
+size = fold_convert (sizetype, size);
+
+  return size;
+}
+
+
  /* Instrument array bounds for ARRAY_REFs.  We create special builtin,
 that gets expanded in the sanopt pass, and make an array dimension
 of it.  ARRAY is the array, *INDEX is an index to the array.
@@ -401,6 +435,14 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
  && COMPLETE_TYPE_P (type)
  && integer_zerop (TYPE_SIZE (type)))
bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1);
+  else if (INDIRECT_REF_P (array)
+  && is_access_with_size_p ((TREE_OPERAND (array, 0
+   {
+ bound = get_bound_from_access_with_size ((TREE_OPERAND (array, 0)));
+ bound = fold_build2 (MINUS_EXPR, TREE_TYPE (bound),
+  bound,
+  build_int_cst (TREE_TYPE (bound), 1));
+   }


This will wrap if bound == 0, maybe that needs to be special-cased.  And 
maybe also add a test for it below.



else
return NULL_TREE;
  }
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c 
b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
new file mode 100644
index ..148934975ee5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
@@ -0,0 +1,45 @@
+/* test the attribute counted_by and its usage in
+   bounds sanitizer combined with VLA.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-output "index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 20 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'int 
\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+
+
+#include 
+
+void __attribute__((__noinline__)) setup_and_test_vla (int n, int m)
+{
+   struct foo {
+   int n;
+   int p[][n] __attribute__((counted_by(n)));
+   } *f;
+
+   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n]));
+   f->n = m;
+   f->p[m][n-1]=1;
+   return;
+}
+
+void __attribute__((__noinline__)) setup_and_test_vla_1 (int n1, int n2, int m)
+{
+  struct foo {
+int n;
+int p[][n2][n1] __attribute__((counted_by(n)));
+  } *f;
+
+  f

Re: [PATCH v6 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.





On 2024-02-16 14:47, Qing Zhao wrote:

gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
* gcc.dg/flex-array-counted-by-5.c: New test.
---
  .../gcc.dg/builtin-object-size-common.h   |  11 ++
  .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
  .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
  .../gcc.dg/flex-array-counted-by-5.c  |  48 +
  gcc/tree-object-size.cc   |  59 ++
  5 files changed, 359 insertions(+)
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h 
b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
__builtin_abort ();   \
  return 0;   \
} while (0)
+
+#define EXPECT(p, _v) do {   \
+  size_t v = _v; \
+  if (p == v)\
+__builtin_printf ("ok:  %s == %zd\n", #p, p);  \
+  else   \
+{\
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);  
\
+  FAIL ();   \
+}\
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..0066c32ca808
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f; 
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
++ attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+  array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..3ce7f3545549
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size mismatched with the value of counted_by attribute?
+we should always use the latest value that is hold by the counted_by
+field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, Due to type casting, the

Re: [PATCH v6 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.





On 2024-02-16 14:47, Qing Zhao wrote:

Including the following changes:
* The definition of the new internal function .ACCESS_WITH_SIZE
   in internal-fn.def.
* C FE converts every reference to a FAM with a "counted_by" attribute
   to a call to the internal function .ACCESS_WITH_SIZE.
   (build_component_ref in c_typeck.cc)

   This includes the case when the object is statically allocated and
   initialized.
   In order to make this working, the routines initializer_constant_valid_p_1
   and output_constant in varasm.cc are updated to handle calls to
   .ACCESS_WITH_SIZE.
   (initializer_constant_valid_p_1 and output_constant in varasm.c)

   However, for the reference inside "offsetof", the "counted_by" attribute is
   ignored since it's not useful at all.
   (c_parser_postfix_expression in c/c-parser.cc)

   In addtion to "offsetof", for the reference inside operator "typeof" and
   "alignof", we ignore counted_by attribute too.

   When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
   replace the call with its first argument.

* Convert every call to .ACCESS_WITH_SIZE to its first argument.
   (expand_ACCESS_WITH_SIZE in internal-fn.cc)
* Adjust alias analysis to exclude the new internal from clobbering anything.
   (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in 
tree-ssa-alias.cc)
* Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE when
   it's LHS is eliminated as dead code.
   (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
* Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
   get the reference from the call to .ACCESS_WITH_SIZE.
   (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)

gcc/c/ChangeLog:

* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
attribute when build_component_ref inside offsetof operator.
* c-tree.h (build_component_ref): Add one more parameter.
* c-typeck.cc (build_counted_by_ref): New function.
(build_access_with_size_for_counted_by): New function.
(build_component_ref): Check the counted-by attribute and build
call to .ACCESS_WITH_SIZE.
(build_unary_op): When building ADDR_EXPR for
 .ACCESS_WITH_SIZE, use its first argument.
 (lvalue_p): Accept call to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
IFN_ACCESS_WITH_SIZE.
(call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
to .ACCESS_WITH_SIZE when its LHS is dead.
* tree.cc (process_call_operands): Adjust side effect for function
.ACCESS_WITH_SIZE.
(is_access_with_size_p): New function.
(get_ref_from_access_with_size): New function.
* tree.h (is_access_with_size_p): New prototype.
(get_ref_from_access_with_size): New prototype.
* varasm.cc (initializer_constant_valid_p_1): Handle call to
.ACCESS_WITH_SIZE.
(output_constant): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-2.c: New test.
---
  gcc/c/c-parser.cc |  10 +-
  gcc/c/c-tree.h|   2 +-
  gcc/c/c-typeck.cc | 128 +-
  gcc/internal-fn.cc|  36 +
  gcc/internal-fn.def   |   4 +
  .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
  gcc/tree-ssa-alias.cc |   2 +
  gcc/tree-ssa-dce.cc   |   5 +-
  gcc/tree.cc   |  25 +++-
  gcc/tree.h|   8 ++
  gcc/varasm.cc |  10 ++
  11 files changed, 332 insertions(+), 10 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c31349dae2ff..a6ed5ac43bb1 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
if (c_parser_next_token_is (parser, CPP_NAME))
  {
c_token *comp_tok = c_parser_peek_token (parser);
+   /* Ignore the counted_by attribute for reference inside
+  offsetof since the information is not useful at all.  */
offsetof_ref
  = build_component_ref (loc, offsetof_ref, comp_tok->value,
-comp_tok->location, UNKNOWN_LOCATION);
+comp_tok->location, UNKNOWN_LOCATION,
+false);
c_parser_consume_token (parser);

Re: [PATCH v6 1/5] Provide counted_by attribute to flexible array member field (PR108896)


On 2024-02-16 14:47, Qing Zhao wrote:

'counted_by (COUNT)'
  The 'counted_by' attribute may be attached to the C99 flexible
  array member of a structure.  It indicates that the number of the
  elements of the array is given by the field named "COUNT" in the
  same structure as the flexible array member.  GCC uses this
  information to improve the results of the array bound sanitizer and
  the '__builtin_dynamic_object_size'.

  For instance, the following code:

   struct P {
 size_t count;
 char other;
 char array[] __attribute__ ((counted_by (count)));
   } *p;

  specifies that the 'array' is a flexible array member whose number
  of elements is given by the field 'count' in the same structure.

  The field that represents the number of the elements should have an
  integer type.  Otherwise, the compiler will report a warning and
  ignore the attribute.

  When the field that represents the number of the elements is assigned a
  negative integer value, the compiler will treat the value as zero.

  An explicit 'counted_by' annotation defines a relationship between
  two objects, 'p->array' and 'p->count', and there are the following
  requirementthat on the relationship between this pair:

 * 'p->count' should be initialized before the first reference to
   'p->array';

 * 'p->array' has _at least_ 'p->count' number of elements
   available all the time.  This relationship must hold even
   after any of these related objects are updated during the
   program.

  It's the user's responsibility to make sure the above requirements
  to be kept all the time.  Otherwise the compiler will report
  warnings, at the same time, the results of the array bound
  sanitizer and the '__builtin_dynamic_object_size' is undefined.

  One important feature of the attribute is, a reference to the
  flexible array member field will use the latest value assigned to
  the field that represents the number of the elements before that
  reference.  For example,

 p->count = val1;
 p->array[20] = 0;  // ref1 to p->array
 p->count = val2;
 p->array[30] = 0;  // ref2 to p->array

  in the above, 'ref1' will use 'val1' as the number of the elements
  in 'p->array', and 'ref2' will use 'val2' as the number of elements
  in 'p->array'.


I can't approve of course, but here's a review of the code that should 
hopefully make it easier for the C frontend maintainers.




gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.
* c-tree.h (lookup_field): New prototype.
* c-typeck.cc (lookup_field): Expose as extern function.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
---
  gcc/c-family/c-attribs.cc| 54 -
  gcc/c-family/c-common.cc | 13 +++
  gcc/c-family/c-common.h  |  1 +
  gcc/c/c-decl.cc  | 85 
  gcc/c/c-tree.h   |  1 +
  gcc/c/c-typeck.cc|  3 +-
  gcc/doc/extend.texi  | 64 +++
  gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 +
  8 files changed, 241 insertions(+), 20 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 40a0cf90295d..4395c0656b14 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -105,6 +105,8 @@ static tree handle_warn_if_not_aligned_attribute (tree *, 
tree, tree,
  int, bool *);
  static tree handle_strict_flex_array_attribute (tree *, tree, tree,
 int, bool *);
+static tree handle_counted_by_attribute (tree *, tree, tree,
+  int, bool *);
  static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
  static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ;

Re: [RFC] GCC Security policy

2024-02-13 Thread Siddhesh Poyarekar


On 2024-02-12 10:00, Richard Biener wrote:

GCC driver support is then to the extent identifying the inputs and the outputs.


We already have -MM to generate a list of non-system dependencies, so 
gcc should be able to pass on inputs to the tool, which could then map 
those files (and the system headers directory) into the sandbox before 
invocation.  The output file could perhaps be enforced as having to be a 
new one, i.e. fail if the target is already found.



I'm not sure a generic utility can achieve this unless the outputs need to be
retrieved from somewhere else (not "usual" place when invoking un-sandboxed).

Even the driver doesn't necessarily know all files read/written.

So I suppose better defining of the actual goal is in order.


gcc -sandboxed -O2 -c t.ii -fdump-tree-all


what should this do?  IMO invoked tools (gas, cc1plus) should be restricted
to access the input files.  Ideally the dumps would appear where they
appear when not sandboxed but clearly overwriting existing files would be
problematic, writing new files not so much, but only to the standard (or
specified) auxiliary output file paths.


Couldn't we get away with not having to handle dump files?  They don't 
seem to be sensitive targets.


Thanks,
Sid

Re: [RFC] GCC Security policy

2024-02-12 Thread Siddhesh Poyarekar


On 2024-02-12 08:16, Martin Jambor wrote:

This probably ties in somewhat with an idea David Malcolm had riffed on
with me earlier, of caching files for diagnostics.  If we could unify
file accesses somehow, we could make this happen, i.e. open/read files
as root and then do all execution as non-root.

Sandboxing will have similar requirements, i.e. map in input files and
an output file handle upfront and then unshare() into a sandbox to do
the actual compilation.  This will make sure that at least the
processing of inputs does not affect the system on which the compilation
is being run.


Right.  As we often just download some (sometimes large) pre-processed
source from Bugzilla and then happily run GCC on it on our computers,
this feature might be actually useful for us (still, we'd probably need
a more concrete description of what we want, would e.g. using "-wrapper
gdb,--args" work in such a sandbox?).  I agree that for some even
semi-complex builds, a more general sandboxing solution is probably
better.


Joseph seems to be leaning towards nudging people to a general 
sandboxing solution too.  The question then is whether this takes the 
shape of a utility in, e.g. contrib that builds such a sandbox or simply 
a wiki page.


Thanks,
Sid

Re: [RFC] GCC Security policy

2024-02-12 Thread Siddhesh Poyarekar


On 2024-02-09 15:06, Joseph Myers wrote:

Ideally dependencies would be properly set up so that everything is built
in the original build, and ideally there would be no need to relink at
install time (I'm not sure of the exact circumstances in which it might be
needed, or on what OSes to e.g. encode the right library paths in final
installed executables).  In practice I think it's common for some building
to take place at install time.

There is a more general principle here of composability: it's not helpful
for being able to write scripts or makefiles combining invocations of
different utilities and have them behave predictably if some of those
utilities start making judgements about whether it's a good idea to run
them in a particular environment rather than just doing their job
independent of irrelevant aspects of the environment.  The semantics of
invoking "gcc" have nothing to do with whether it's run as root; it should
never need to look up what user it's running as at all.  (And it's
probably also a bad idea for lots of separate utilities to gain their own
ways to run in a restricted environment, for similar reasons; rather than
teaching "gcc" a way to create a restricted environment itself, ensure
there are easy-to-use more general utilities for running arbitrary
programs on untrusted input in a contained environment.)


I see your point.  The way you put it, there's no GCC project here at 
all then.


Sid

Re: [RFC] GCC Security policy

2024-02-09 Thread Siddhesh Poyarekar


On 2024-02-09 12:14, Joseph Myers wrote:

On Fri, 9 Feb 2024, Siddhesh Poyarekar wrote:


For privilege management we could add a --allow-root driver flag that allows
gcc to run as root.  Without the flag one could either outright refuse to run
or drop privileges and run.  Dropping privileges will be a bit tricky to
implement because it would need a user to drop privileges to and then there
would be the question of how to manage file access to read the compiler input
and write out the compiler output.  If there's no such user, gcc could refuse
to run as root by default.  I wonder though if from a security posture
perspective it makes sense to simply discourage running as root all the time
and not bother trying to make it work with dropped privileges and all that.
Of course it would mean that this would be less of a "project"; it'll be a
simple enough patch to refuse to run until --allow-root is specified.


I think disallowing running as root would be a big problem in practice -
the typical problem case is when people build software as non-root and run
"make install" as root, and for some reason "make install" wants to
(re)build or (re)link something.


Isn't that a problematic practice though?  Or maybe have those 
invocations be separated out as CC_ROOT?


Thanks,
Sid

Re: [RFC] GCC Security policy

2024-02-09 Thread Siddhesh Poyarekar


On 2024-02-09 10:38, Martin Jambor wrote:

If anyone is interested in scoping this and then mentoring this as a
Google Summer of Code project this year then now is the right time to
speak up!


I can help with mentoring and reviews, although I'll need someone to 
assist with actual approvals.


There are two distinct sets of ideas to explore, one is privilege 
management and the other sandboxing.


For privilege management we could add a --allow-root driver flag that 
allows gcc to run as root.  Without the flag one could either outright 
refuse to run or drop privileges and run.  Dropping privileges will be a 
bit tricky to implement because it would need a user to drop privileges 
to and then there would be the question of how to manage file access to 
read the compiler input and write out the compiler output.  If there's 
no such user, gcc could refuse to run as root by default.  I wonder 
though if from a security posture perspective it makes sense to simply 
discourage running as root all the time and not bother trying to make it 
work with dropped privileges and all that.  Of course it would mean that 
this would be less of a "project"; it'll be a simple enough patch to 
refuse to run until --allow-root is specified.


This probably ties in somewhat with an idea David Malcolm had riffed on 
with me earlier, of caching files for diagnostics.  If we could unify 
file accesses somehow, we could make this happen, i.e. open/read files 
as root and then do all execution as non-root.


Sandboxing will have similar requirements, i.e. map in input files and 
an output file handle upfront and then unshare() into a sandbox to do 
the actual compilation.  This will make sure that at least the 
processing of inputs does not affect the system on which the compilation 
is being run.


Sid

Re: [PATCH] SECURITY.txt: Drop "exploitable" in reference to hardening issues

2024-01-09 Thread Siddhesh Poyarekar


On 2023-12-18 09:35, Siddhesh Poyarekar wrote:
The "exploitable vulnerability" may lead to a misunderstanding that 
missed hardening issues are considered vulnerabilities, just that 
they're not exploitable.  This is not true, since while hardening bugs 
may be security-relevant, the absence of hardening does not make a 
program any more vulnerable to exploits than without.


Drop the "exploitable" word to make it clear that missed hardening is 
not considered a vulnerability.


Ping, may I commit this if there are no objections?

Thanks,
Sid



diff --git a/SECURITY.txt b/SECURITY.txt
index b3e2bbfda90..126603d4c22 100644
--- a/SECURITY.txt
+++ b/SECURITY.txt
@@ -155,10 +155,10 @@ Security features implemented in GCC
  GCC implements a number of security features that reduce the impact
  of security issues in applications, such as -fstack-protector,
  -fstack-clash-protection, _FORTIFY_SOURCE and so on.  A failure of
-    these features to function perfectly in all situations is not an
-    exploitable vulnerability in itself since it does not affect the
-    correctness of programs.  Further, they're dependent on heuristics
-    and may not always have full coverage for protection.
+    these features to function perfectly in all situations is not a
+    vulnerability in itself since it does not affect the correctness of
+    programs.  Further, they're dependent on heuristics and may not
+    always have full coverage for protection.

  Similarly, GCC may transform code in a way that the correctness of
  the expressed algorithm is preserved, but supplementary properties

Re: [PATCH] tree-object-size: Clean up unknown propagation

2023-12-20 Thread Siddhesh Poyarekar


On 2023-12-20 00:23, Jeff Law wrote:



On 12/19/23 10:21, Siddhesh Poyarekar wrote:

Narrow down scope of the unknowns bitmap so that it is only accessible
within the reexamination process.  This also removes any role of unknown
propagation from object_sizes_set, thus simplifying that code path a
bit.

gcc/ChangeLog:

* tree-object-size.cc (object_size_info): Remove UNKNOWNS.
Drop all references to it.
(object_sizes_set): Move unknowns propagation code to...
(gimplify_size_expressions): ... here.  Also free reexamine
bitmap.
(propagate_unknowns): New parameter UNKNOWNS.  Update callers.

Signed-off-by: Siddhesh Poyarekar 
---

This is a follow-up cleanup to pr#113012, but not required to fix that
bug.  Bootstrapped on x86_64 and with config=ubsan.
OK, assuming it also went through a regression test or does so before 
committing.


Thanks, yes, I also compared test results during the x86_64 bootstrap, 
likewise for i686 after I had posted this.


Sid

Re: [PATCH] tree-object-size: Always set computed bit for bdos [PR113012]

2023-12-19 Thread Siddhesh Poyarekar


On 2023-12-19 17:57, Jakub Jelinek wrote:

On Mon, Dec 18, 2023 at 11:42:52AM -0500, Siddhesh Poyarekar wrote:

It is always safe to set the computed bit for dynamic object sizes at
the end of collect_object_sizes_for because even in case of a dependency
loop encountered in nested calls, we have an SSA temporary to actually
finish the object size expression.  The reexamine pass for dynamic
object sizes is only for propagation of unknowns and gimplification of
the size expressions, not for loop resolution as in the case of static
object sizes.

gcc/ChangeLog:

PR tree-optimization/113012
* gcc.dg/ubsan/pr113012.c: New test case.

gcc/testsuite/ChangeLog:

PR tree-optimization/113012
* tree-object-size.cc (compute_builtin_object_size): Expand
comment for dynamic object sizes.
(collect_object_sizes_for): Always set COMPUTED bitmap for
dynamic object sizes.


You have the gcc/ChangeLog and gcc/testsuite/ChangeLog hunks swapped,
I think this wouldn't get through pre-commit hook.


Oops, fixed.


--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -1185,10 +1185,12 @@ compute_builtin_object_size (tree ptr, int 
object_size_type,
  osi.tos = NULL;
}
  
-  /* First pass: walk UD chains, compute object sizes that

-can be computed.  osi.reexamine bitmap at the end will
-contain what variables were found in dependency cycles
-and therefore need to be reexamined.  */
+  /* First pass: walk UD chains, compute object sizes that can be computed.
+osi.reexamine bitmap at the end will contain what variables that need


This wording is weird.
Perhaps ... will contain versions of SSA_NAMEs that need
to be reexamined. ?
Because varno seems to be SSA_NAME_VERSION.


Ack, it's unrelated to my change, but fixed since I touched the comment 
block.



@@ -1823,11 +1825,16 @@ collect_object_sizes_for (struct object_size_info *osi, 
tree var)
gcc_unreachable ();
  }
  
-  if (! reexamine || object_sizes_unknown_p (object_size_type, varno))

+  /* Dynamic sizes use placeholder temps to return an answer, so it is always
+   * safe to set COMPUTED for them.  */


We don't use this style of comments, please replace the * at the start of
second line with a space.


Oops, fixed.


+  if ((object_size_type & OST_DYNAMIC)
+  || !reexamine || object_sizes_unknown_p (object_size_type, varno))
  {
bitmap_set_bit (computed[object_size_type], varno);
if (!(object_size_type & OST_DYNAMIC))
bitmap_clear_bit (osi->reexamine, varno);
+  else if (reexamine)
+   bitmap_set_bit (osi->reexamine, varno);
  }
else
  {


Otherwise LGTM, but please wait at least a few weeks before backporting to
13.


OK, I'll push with fixes to trunk in a bit and then push to 13 next year.

Thanks,
Sid

[PATCH] tree-object-size: Clean up unknown propagation

2023-12-19 Thread Siddhesh Poyarekar

Narrow down scope of the unknowns bitmap so that it is only accessible
within the reexamination process.  This also removes any role of unknown
propagation from object_sizes_set, thus simplifying that code path a
bit.

gcc/ChangeLog:

* tree-object-size.cc (object_size_info): Remove UNKNOWNS.
Drop all references to it.
(object_sizes_set): Move unknowns propagation code to...
(gimplify_size_expressions): ... here.  Also free reexamine
bitmap.
(propagate_unknowns): New parameter UNKNOWNS.  Update callers.

Signed-off-by: Siddhesh Poyarekar 
---

This is a follow-up cleanup to pr#113012, but not required to fix that
bug.  Bootstrapped on x86_64 and with config=ubsan.

 gcc/tree-object-size.cc | 65 +
 1 file changed, 34 insertions(+), 31 deletions(-)

diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 434b2fc0bf5..08a3b7f5d94 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -43,7 +43,7 @@ struct object_size_info
   int object_size_type;
   unsigned char pass;
   bool changed;
-  bitmap visited, reexamine, unknowns;
+  bitmap visited, reexamine;
   unsigned int *depths;
   unsigned int *stack, *tos;
 };
@@ -264,19 +264,8 @@ object_sizes_set (struct object_size_info *osi, unsigned 
varno, tree val,
 {
   if (bitmap_bit_p (osi->reexamine, varno))
{
- if (size_unknown_p (val, object_size_type))
-   {
- oldval = object_sizes_get (osi, varno);
- old_wholeval = object_sizes_get (osi, varno, true);
- bitmap_set_bit (osi->unknowns, SSA_NAME_VERSION (oldval));
- bitmap_set_bit (osi->unknowns, SSA_NAME_VERSION (old_wholeval));
- bitmap_clear_bit (osi->reexamine, varno);
-   }
- else
-   {
- val = bundle_sizes (oldval, val);
- wholeval = bundle_sizes (old_wholeval, wholeval);
-   }
+ val = bundle_sizes (oldval, val);
+ wholeval = bundle_sizes (old_wholeval, wholeval);
}
   else
{
@@ -958,25 +947,26 @@ emit_phi_nodes (gimple *stmt, tree size, tree wholesize)
size_unknown, as noted in UNKNOWNS.  */
 
 static tree
-propagate_unknowns (object_size_info *osi, tree expr)
+propagate_unknowns (object_size_info *osi, tree expr, bitmap unknowns)
 {
   int object_size_type = osi->object_size_type;
 
   switch (TREE_CODE (expr))
 {
 case SSA_NAME:
-  if (bitmap_bit_p (osi->unknowns, SSA_NAME_VERSION (expr)))
+  if (bitmap_bit_p (unknowns, SSA_NAME_VERSION (expr)))
return size_unknown (object_size_type);
   return expr;
 
 case MIN_EXPR:
 case MAX_EXPR:
{
- tree res = propagate_unknowns (osi, TREE_OPERAND (expr, 0));
+ tree res = propagate_unknowns (osi, TREE_OPERAND (expr, 0),
+unknowns);
  if (size_unknown_p (res, object_size_type))
return res;
 
- res = propagate_unknowns (osi, TREE_OPERAND (expr, 1));
+ res = propagate_unknowns (osi, TREE_OPERAND (expr, 1), unknowns);
  if (size_unknown_p (res, object_size_type))
return res;
 
@@ -984,7 +974,8 @@ propagate_unknowns (object_size_info *osi, tree expr)
}
 case MODIFY_EXPR:
{
- tree res = propagate_unknowns (osi, TREE_OPERAND (expr, 1));
+ tree res = propagate_unknowns (osi, TREE_OPERAND (expr, 1),
+unknowns);
  if (size_unknown_p (res, object_size_type))
return res;
  return expr;
@@ -992,7 +983,8 @@ propagate_unknowns (object_size_info *osi, tree expr)
 case TREE_VEC:
   for (int i = 0; i < TREE_VEC_LENGTH (expr); i++)
{
- tree res = propagate_unknowns (osi, TREE_VEC_ELT (expr, i));
+ tree res = propagate_unknowns (osi, TREE_VEC_ELT (expr, i),
+unknowns);
  if (size_unknown_p (res, object_size_type))
return res;
}
@@ -1000,7 +992,8 @@ propagate_unknowns (object_size_info *osi, tree expr)
 case PLUS_EXPR:
 case MINUS_EXPR:
{
- tree res = propagate_unknowns (osi, TREE_OPERAND (expr, 0));
+ tree res = propagate_unknowns (osi, TREE_OPERAND (expr, 0),
+unknowns);
  if (size_unknown_p (res, object_size_type))
return res;
 
@@ -1025,6 +1018,7 @@ gimplify_size_expressions (object_size_info *osi)
   /* Step 1: Propagate unknowns into expressions.  */
   bitmap reexamine = BITMAP_ALLOC (NULL);
   bitmap_copy (reexamine, osi->reexamine);
+  bitmap unknowns = BITMAP_ALLOC (NULL);
   do
 {
   changed = false;
@@ -1032,14 +1026,23 @@ gimplify_size_expressions (object_size_info *osi)
{
  object_size cur = object_sizes_get_raw (osi, i);
 
- if (size_unknown_p (propagate

[PATCH] tree-object-size: Always set computed bit for bdos [PR113012]

2023-12-18 Thread Siddhesh Poyarekar

It is always safe to set the computed bit for dynamic object sizes at
the end of collect_object_sizes_for because even in case of a dependency
loop encountered in nested calls, we have an SSA temporary to actually
finish the object size expression.  The reexamine pass for dynamic
object sizes is only for propagation of unknowns and gimplification of
the size expressions, not for loop resolution as in the case of static
object sizes.

gcc/ChangeLog:

PR tree-optimization/113012
* gcc.dg/ubsan/pr113012.c: New test case.

gcc/testsuite/ChangeLog:

PR tree-optimization/113012
* tree-object-size.cc (compute_builtin_object_size): Expand
comment for dynamic object sizes.
(collect_object_sizes_for): Always set COMPUTED bitmap for
dynamic object sizes.

Signed-off-by: Siddhesh Poyarekar 
---
Testing:

- Bootstrapped x86_64 and config=ubsan
- Tested i686

OK for trunk and backport to gcc 13 branch?

 gcc/testsuite/gcc.dg/ubsan/pr113012.c | 17 +
 gcc/tree-object-size.cc   | 17 -
 2 files changed, 29 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/pr113012.c

diff --git a/gcc/testsuite/gcc.dg/ubsan/pr113012.c 
b/gcc/testsuite/gcc.dg/ubsan/pr113012.c
new file mode 100644
index 000..4fc38cd1171
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/pr113012.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined" } */
+
+int *
+foo (int x, int y, int z, int w)
+{
+  int *p = __builtin_malloc (z * sizeof (int));
+  int *q = p - 1;
+  while (--x > 0)
+{
+  if (w + 1 > y)
+   q = p - 1;
+  ++*q;
+  ++q;
+}
+  return p;
+}
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 28f27adf9ca..434b2fc0bf5 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -1185,10 +1185,12 @@ compute_builtin_object_size (tree ptr, int 
object_size_type,
  osi.tos = NULL;
}
 
-  /* First pass: walk UD chains, compute object sizes that
-can be computed.  osi.reexamine bitmap at the end will
-contain what variables were found in dependency cycles
-and therefore need to be reexamined.  */
+  /* First pass: walk UD chains, compute object sizes that can be computed.
+osi.reexamine bitmap at the end will contain what variables that need
+to be reexamined.  For both static and dynamic size computation,
+reexamination is for propagation across dependency loops. The dynamic
+case has the additional use case where the computed expression needs
+to be gimplified.  */
   osi.pass = 0;
   osi.changed = false;
   collect_object_sizes_for (, ptr);
@@ -1823,11 +1825,16 @@ collect_object_sizes_for (struct object_size_info *osi, 
tree var)
   gcc_unreachable ();
 }
 
-  if (! reexamine || object_sizes_unknown_p (object_size_type, varno))
+  /* Dynamic sizes use placeholder temps to return an answer, so it is always
+   * safe to set COMPUTED for them.  */
+  if ((object_size_type & OST_DYNAMIC)
+  || !reexamine || object_sizes_unknown_p (object_size_type, varno))
 {
   bitmap_set_bit (computed[object_size_type], varno);
   if (!(object_size_type & OST_DYNAMIC))
bitmap_clear_bit (osi->reexamine, varno);
+  else if (reexamine)
+   bitmap_set_bit (osi->reexamine, varno);
 }
   else
 {
-- 
2.43.0

[PATCH] SECURITY.txt: Drop "exploitable" in reference to hardening issues

2023-12-18 Thread Siddhesh Poyarekar

The "exploitable vulnerability" may lead to a misunderstanding that 
missed hardening issues are considered vulnerabilities, just that 
they're not exploitable.  This is not true, since while hardening bugs 
may be security-relevant, the absence of hardening does not make a 
program any more vulnerable to exploits than without.


Drop the "exploitable" word to make it clear that missed hardening is 
not considered a vulnerability.


diff --git a/SECURITY.txt b/SECURITY.txt
index b3e2bbfda90..126603d4c22 100644
--- a/SECURITY.txt
+++ b/SECURITY.txt
@@ -155,10 +155,10 @@ Security features implemented in GCC
 GCC implements a number of security features that reduce the impact
 of security issues in applications, such as -fstack-protector,
 -fstack-clash-protection, _FORTIFY_SOURCE and so on.  A failure of
-these features to function perfectly in all situations is not an
-exploitable vulnerability in itself since it does not affect the
-correctness of programs.  Further, they're dependent on heuristics
-and may not always have full coverage for protection.
+these features to function perfectly in all situations is not a
+vulnerability in itself since it does not affect the correctness of
+programs.  Further, they're dependent on heuristics and may not
+always have full coverage for protection.

 Similarly, GCC may transform code in a way that the correctness of
 the expressed algorithm is preserved, but supplementary properties

Re: [gcc15] nested functions in C

2023-12-07 Thread Siddhesh Poyarekar


On 2023-12-07 10:42, Eric Botcazou wrote:

I think from a language standpoint, the general idea that nested
functions are just any functions inside functions (which is how the C
nested functions essentially behave) is too broad and they should be
restricted to minimal implementations that, e.g. don't have side-effects
or if they do, there's explicit syntactic sugar to make it clearer.


That sounds totally arbitrary though.  Algol-derived languages have had nested
subprograms for ages, e.g. Pascal or Ada, and they can be very useful.


I'll admit that it is a subjective preference and is probably not in the 
spirit of traditional C.


Sid

Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Siddhesh Poyarekar


On 2023-12-06 09:41, Jakub Jelinek wrote:

On Wed, Dec 06, 2023 at 09:30:32AM -0500, Siddhesh Poyarekar wrote:

We have the -Wmemset-transposed-args warning, couldn't we
have a similar one for calloc, and perhaps do it solely in
the case where one uses sizeof of the type used in the cast
pointer?
So warn for
(struct S *) calloc (sizeof (struct S), 1)
or
(struct S *) calloc (sizeof (struct S), n)
but not for
(struct S *) calloc (4, 15)
or
(struct S *) calloc (sizeof (struct T), 1)
or similar?  Of course check for compatible types of TYPE_MAIN_VARIANTs.


+1, this could be an analyzer warning, since in practice it is just a code
cleanliness issue.


We don't do such things in the analyzer, nor it is possible, by the time
analyzer sees the IL all the sizeofs etc. are folded.  Analyzer is for
expensive to compute warnings, code style warnings are normally implemented
in the FEs.


Thanks, understood.  A separate FE warning is fine as well.

Thanks,
Sid

Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Siddhesh Poyarekar


On 2023-12-06 09:21, Jakub Jelinek wrote:

On Wed, Dec 06, 2023 at 02:34:10PM +0100, Martin Uecker wrote:

Further I think
"size less than or equal to the size requested"
is quite ambiguous in the calloc case, isn't the size requested in the
calloc case actually nmemb * size rather than just size?


This is unclear but it can be understood this way.
This was also Joseph's point.

I am happy to submit a patch that changes the code so
that the swapped arguments to calloc do not cause a warning
anymore.


That would be my preference because then the allocation size is
correct and it is purely a style warning.
It doesn't follow how the warning is described:
"Warn about calls to allocation functions decorated with attribute
@code{alloc_size} that specify insufficient size for the target type of
the pointer the result is assigned to"
when the size is certainly sufficient.

But wonder what others think about it.


+1, from a libc perspective, the transposed arguments don't make a 
difference, a typical allocator will produce sufficiently sized 
allocation for the calloc call.



BTW, shouldn't the warning be for C++ as well?  Sure, I know,
people use operator new more often, but still, the 
allocators are used in there as well.

We have the -Wmemset-transposed-args warning, couldn't we
have a similar one for calloc, and perhaps do it solely in
the case where one uses sizeof of the type used in the cast
pointer?
So warn for
(struct S *) calloc (sizeof (struct S), 1)
or
(struct S *) calloc (sizeof (struct S), n)
but not for
(struct S *) calloc (4, 15)
or
(struct S *) calloc (sizeof (struct T), 1)
or similar?  Of course check for compatible types of TYPE_MAIN_VARIANTs.


+1, this could be an analyzer warning, since in practice it is just a 
code cleanliness issue.


Thanks,
Sid

Re: [gcc15] nested functions in C

2023-12-05 Thread Siddhesh Poyarekar


On 2023-12-04 16:31, Martin Uecker wrote:

If (assuming from them being called lambdas) they are primarily for
small functions without side-effects then it's already a significantly
stronger specification than what we have right now with C nested
functions.  That would end up enforcing what you demonstrate as the good
way to use nested functions.


The proposal we have seen for C23 (which was not accepted into
C23 mostly due to timing and lack of implementation experience)
were similar to C++'s lambdas and did not have any such restriction.


Oh well :/


If nested functions are eventually going to make it into the C standard
then effort is probably better spent in porting the C nested functions
to use descriptors instead of executable stacks or heaps.


I submitted a patch for this a long time ago which was based
on the code for Ada that uses a bit in the pointer to differentiate
between conventional pointers and descriptors.

I would now prefer an approach that uses a qualifier on the
function type to indicate that the static chain has to be
set. A pointer to such a qualified function would a descriptor
that consists of the address and the value for the static chain.

This would be useful for many things.


Ack, this probably becomes a gcc15 thing then, given that stage 1 has 
closed.  Are you planning to revive your work and repost?


Thanks,
Sid

Re: [gcc15] nested functions in C


On 2023-12-04 13:51, Jakub Jelinek wrote:

Why?  The syntax doesn't seem to be something unexpected, and as C doesn't
have lambdas, one can use the nested functions instead.
The only problem is if you need to pass function pointers somewhere else
(and target doesn't have function descriptors or something similar), if it
is only done to make code more readable compared to say use of macros, I
think the nested functions are better, one doesn't have to worry about
multiple evaluations of argument side-effects etc.  And if everything is
inlined and SRA optimized, there is no extra cost.
The problem of passing it as a function pointer to other functions is
common with C++, only lambdas which don't capture anything actually can be
convertible to function pointer, for anything else you need a template and
instantiate it for a particular lambda (which is something you can't do in
C).


I think from a language standpoint, the general idea that nested 
functions are just any functions inside functions (which is how the C 
nested functions essentially behave) is too broad and they should be 
restricted to minimal implementations that, e.g. don't have side-effects 
or if they do, there's explicit syntactic sugar to make it clearer.


If (like Martin stated earlier), nested functions are in fact going to 
enter the C standard in future then maybe this discussion is moot and we 
probably are better off implementing descriptor support for C to replace 
the on-stack trampolines instead of adding -Werror=trampolines in a hurry.


Thanks,
Sid

Re: [gcc15] nested functions in C


On 2023-12-04 13:48, Martin Uecker wrote:

I empathize with Jakub's stated use case though of keeping the C
frontend support for testing purposes, but that could easily be done
behind a flag, or by putting nested C func deprecation behind a flag.


I am relatively sure C will get some form of nested functions.
Maybe as anonymous nested functions, i.e. lambdas, but I do
not see a fundamental difference here (I personally like naming
things for clarity, so i prefer named nested functions)


If (assuming from them being called lambdas) they are primarily for 
small functions without side-effects then it's already a significantly 
stronger specification than what we have right now with C nested 
functions.  That would end up enforcing what you demonstrate as the good 
way to use nested functions.


I suppose minimal, contained side-effects (such as atomically updating a 
structure) may also constitute sound design, but that should be made 
explicit in the language.



I don't disagree for cases like -Warray-bounds,
but for warnings/errors that are more deterministic in nature (like
-Werror=trampolines), they're going to point at actual problems and
larger projects and distributions will usually prefer to at least track
them, if not actually fix them.  For Fedora we tend to provide macro
overrides for packages that need to explicitly disable a security
related flag.


In projects such as mine, this will lead to a lot of code
transformations as indicated above, i.e. much worse code.

One could get away with it, since nested functions are rarely
used, but I think this is bad, because a lot of code would
improve if it used them.


If nested functions are eventually going to make it into the C standard 
then effort is probably better spent in porting the C nested functions 
to use descriptors instead of executable stacks or heaps.


Thanks,
Sid

[gcc15] nested functions in C

[Branching this into a separate conversation to avoid derailing the 
patch, which isn't directly related]


On 2023-12-04 12:21, Martin Uecker wrote:

I do not really agree with that.  Nested functions can substantially
improve code quality and in C can avoid type unsafe use of
void* pointers in callbacks. The code is often much better with
nested functions than without.  Nested functions and lambdas
(i.e. anonymous nested functions) are used in many languages
because they make code better and GNU's nested function are no
exception.

So I disagree with the idea that discouraging nested functions leads
to better code - I think the exact opposite is true.


I would argue that GNU's nested functions *are* an exception because 
they're like feathers stuck on a pig to try and make it fly; I think a 
significant specification effort is required to actually make it a 
cleanly usable feature.  It *may* be possible to implement patterns that 
use C nested functions well enough *and* result in readable code, but 
IMO it is easier to write clunky and unmaintainable code with it.


I empathize with Jakub's stated use case though of keeping the C 
frontend support for testing purposes, but that could easily be done 
behind a flag, or by putting nested C func deprecation behind a flag.



I am generally wary of mitigations that may make exploitation of
buffer overflows a bit harder  while increasing the likelihood
of buffer overflows by reducing type safety and/or code quality.

But I would agree that trampolines are generally problematic. A
better strategy would be wide function pointer type (as in Apple'
Blocks extension). Alternatively, an explicit way to obtain the
static chain for a nested function which could be used with
__builtin_call_with_static_chain  could also work.

But in any case, I think it diminishes the value of -fhardening
it if requires source code changes, because then it is not as easy
to simply turn it on in larger projects / distributitions.


I suppose you mean source code changes even in correct code just to 
comply with the flag?  I don't disagree for cases like -Warray-bounds, 
but for warnings/errors that are more deterministic in nature (like 
-Werror=trampolines), they're going to point at actual problems and 
larger projects and distributions will usually prefer to at least track 
them, if not actually fix them.  For Fedora we tend to provide macro 
overrides for packages that need to explicitly disable a security 
related flag.


Thanks,
Sid

Re: [PATCH] gcc: Disallow trampolines when -fhardened


On 2023-12-04 11:39, Andreas Schwab wrote:

On Dez 04 2023, Siddhesh Poyarekar wrote:


For hardened code in C, I think we really should look to step away from
nested functions instead of adding ways to continue supporting it. There's
probably a larger conversation to be had about the utility of nested
functions in general for C (and whether this GCC extension should be
deprecated altogether in future), but I feel like the -fhardened subset
gives us the opportunity to enforce at least a safe subset for now,
possibly extending it in future.


Nested functions by itself don't need a trampoline, only if the address
of it is passed outside the containing function's scope (as a callback,
for example).


Yes, that's why I said that the conversation about deprecating the C 
nested functions extension is a broader one (and hence for gcc 15) that 
will likely involve the question of whether dropping the extension 
altogether gives any benefit or if dropping support for on-stack 
trampolines is sufficient.  On-heap trampolines are maybe slightly 
better in that they don't need an executable stack, but defaulting to 
on-heap trampolines for -fhardened seems like a lost opportunity to 
enforce better user code.


Thanks,
Sid

Re: [PATCH] gcc: Disallow trampolines when -fhardened


On 2023-12-02 04:42, Martin Uecker wrote:



Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
It came up that a good hardening strategy is to disable trampolines
which may require executable stack.  Therefore the following patch
adds -Werror=trampolines to -fhardened.


This would add a warning about specific code (where it is then
unclear whether rewriting it is feasible or even an improvement),
which seems different to all the other flags -fhardening has
now.


It's actually -Werror=trampolines, not just -Wtrampolines; the aim is to 
hard fail on producing trampolines and consequently, an executable 
stack.  In general the goal of -fhardened is to produce hardened code 
and the nested function trampolines do the exact reverse of that, so 
-Werror=trampolines seems to align perfectly with that goal, doesn't it?



GCC now has an option to allocate trampolines on the heap,
which would seem to be a better fit.  On the other hand,
it does not work with longjmp which may be a limitation.


For hardened code in C, I think we really should look to step away from 
nested functions instead of adding ways to continue supporting it. 
There's probably a larger conversation to be had about the utility of 
nested functions in general for C (and whether this GCC extension should 
be deprecated altogether in future), but I feel like the -fhardened 
subset gives us the opportunity to enforce at least a safe subset for 
now, possibly extending it in future.


Thanks,
Sid

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Siddhesh Poyarekar


On 2023-11-02 10:12, Martin Uecker wrote:

This shouldn't be necessary. The object-size pass
can track pointer arithmeti if it comes after
inserting the .ACCESS_WITH_SIZE.

https://godbolt.org/z/fvc3aoPfd


The problem is dependency tracking through the pointer arithmetic, which 
Jakub suggested to work around by passing a reference to the size in 
.ACCESS_WITH_SIZE to avoid DCE/reordering.


Thanks,
Sid

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-10-31 Thread Siddhesh Poyarekar


On 2023-10-31 12:26, Qing Zhao wrote:

Hi,

I wrote a summary based on our extensive discussion, hopefully this can be 
served as an informal proposal.

Please take a look at it and let me know any comment or suggestion.

There are some (???) in the section 3.2 and 3.6, those are my questions seeking 
for help.  -:)

Thanks again for all the help.

Qing.


Represent the missing dependence for the "counted_by" attribute and its 
consumers

Qing Zhao

10/30/2023
==

The whole discussion is at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html

1. The problem

There is a data dependency between the size assignment and the implicit use of 
the size information in the __builtin_dynamic_object_size that is missing in 
the IL (line 11 and line 13 in the below example). Such information missing 
will result incorrect code reordering and other code transformations.

   1 struct A
   2 {
   3  size_t size;
   4  char buf[] __attribute__((counted_by(size)));
   5 };
   6
   7 size_t
   8 foo (size_t sz)
   9 {
  10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
  11  obj->size = sz;
  12  obj->buf[0] = 2;
  13  return __builtin_dynamic_object_size (obj->buf, 1);
  14 }
   
Please see a more complicate example in the Appendex 1.


We need to represent such data dependency correctly in the IL.

2. The solution:

2.1 Summary

* Add a new internal function "ACCESS_WITH_SIZE" to carry the size information 
for every FAM field access;
* In C FE, Replace every FAM field access whose TYPE has the "counted_by" attribute with 
the new internal function "ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or array bound 
sanitizer, query the size information or ACCESS_MODE information from the new 
internal function;
* When the size information and the "ACCESS_MODE" information are not used 
anymore, possibly at the 2nd object size phase, replace the internal function with the 
actual FAM field access;
* Some adjustment to inlining heuristic and some SSA passes to mitigate the 
impact to the optimizer and code generation.

2.2 The new internal function

   .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

which returns the "PTR" same as the 1st argument;

1st argument "PTR": Pointer to the object;
2nd argument "SIZE": The size of the pointed object,
   if the pointee of the "PTR" has a
 * real type, it's the number of the elements of the type;
 * void type, it's the number of bytes;
3rd argument "ACCESS_MODE":
   -1: Unknown access semantics
0: none
1: read_only
2: write_only
3: read_write

NOTEs,
   A. This new internal function is intended for a more general use from all the 3 attributes, "access", 
"alloc_size", and the new "counted_by", to encode the "size" and "access_mode" 
information to the corresponding pointer. (in order to resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
   B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1.
   C. In this wrieup, we focus on the implementation details for the "counted_by" attribute. 
However, this function should be ready to be used by "access" and "alloc_size" without 
issue.

2.3 A new semantic requirement in the user documentation of "counted_by"

For the following structure including a FAM with a counted_by attribute:

   struct A
   {
size_t size;
char buf[] __attribute__((counted_by(size)));
   };

for any object with such type:

   struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));

The setting to the size field should be done before the first reference to the 
FAM field.


A more flexible specification could be stating that validation for a 
reference to the FAM field will use the latest value assigned to the 
size field before that reference.  That will allow for situations like:


  o->size = val1;
  deref (o->buf);
  o->size = val2;

making it clear that deref will see val1 and not val2.



Such requirement to the user will guarantee that the first reference to the FAM 
knows the size of the FAM.

We need to add this additional requirement to the user document.

2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE

In C FE:

for every reference to a FAM, for example, "obj->buf" in the small example,
   check whether the corresponding FIELD_DECL has a "counted_by" attribute?
   if YES, replace the reference to "obj->buf" with a call to
   .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);

2.5 Query the size info

There are multiple consumers of the size info (and ACCESS_MODE info):

   * __builtin_dynamic_object_size;
   * array bound sanitizer;

in these consumers, get the size info from the 2nd argument of the call to
ACCESS_WITH_SIZE (PTR, SIZE, -1)

2.6 Eliminate the internal function when not useful anymore

Re: [PATCH] tree-optimization/109334: Improve computation for access attribute

2023-10-29 Thread Siddhesh Poyarekar


On 2023-10-28 16:29, Martin Uecker wrote:

Isn't this testcase h() in builtin-dynamic-object-size-20.c?  If you're
referring to testcase i(), then maybe "where the size is given by a
non-trivial function of a function parameter, e.g.
fn (size_t n, char buf[dummy(n)])."


h() is supported.  For i() we would need something as
__builtin_access__with_size to record the result of dummy().

But the comment refers to the simpler case:

fn (size_t n, char (*buf)[n])
[[gnu::access(read_write, 2, 1)]]

This doesn't work because buf[n] does not have constant
size, but it could be made to work more easily because
the size is directly given by a function argument.



Ah, so it would have been nice to have this more detailed explanation in 
the comment for clarity :)


Thanks,
Sid

Re: [PATCH] tree-optimization/109334: Improve computation for access attribute

2023-10-26 Thread Siddhesh Poyarekar


On 2023-10-26 04:37, Martin Uecker wrote:



Hi Sid and Jakub,

here is the patch discussed in PR 109334.



I can't approve, but here's a review:


Martin



 tree-optimization/109334: Improve computation for access attribute
 
 The fix for PR104970 restricted size computations to the case

 where the access attribute was specified explicitly (no VLA).
 It also restricted it to void pointers or elements with constant
 sizes.  The second restriction is enough to fix the original bug.
 Revert the first change to again allow size computations for VLA
 parameters and for VLA parameters together with an explicit access
 attribute.
 
 gcc/ChangeLog:
 
 PR tree-optimization/109334

 * tree-object-size.cc (parm_object_size): Allow size
 computation for explicit access attributes.
 
 gcc/testsuite/ChangeLog:
 
 PR tree-optimization/109334

 * gcc.dg/builtin-dynamic-object-size-20.c
 (test_parmsz_simple3): Supported again.
 (test_parmsz_external4): New test.
 * gcc.dg/builtin-dynamic-object-size-20.c: New test.
 * gcc.dg/pr104970.c: New test.

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index 6da04202ffe..07e3da6f254 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -455,7 +455,6 @@ test_parmsz_simple2 (size_t sz, char obj[])
return __builtin_dynamic_object_size (obj, 0);
  }
  
-/* Implicitly constructed access attributes not supported yet.  */

  size_t
  __attribute__ ((noinline))
  test_parmsz_simple3 (size_t sz, char obj[sz])
@@ -527,6 +526,13 @@ test_parmsz_internal3 (size_t sz1, size_t sz2, double 
obj[sz1][sz2])
return __builtin_dynamic_object_size (obj, 0);
  }


This test case now works.  OK.

  
+size_t

+__attribute__ ((noinline))
+test_parmsz_internal4 (size_t sz1, size_t sz2, double obj[sz1 + 1][4])
+{
+  return __builtin_dynamic_object_size (obj, 0);
+}
+


New test case that isn't supported yet.  OK.


  /* Loops.  */
  
  size_t

@@ -721,8 +727,8 @@ main (int argc, char **argv)
if (test_parmsz_simple2 (__builtin_strlen (argv[0]) + 1, argv[0])
!= __builtin_strlen (argv[0]) + 1)
  FAIL ();
-  /* Only explicitly added access attributes are supported for now.  */
-  if (test_parmsz_simple3 (__builtin_strlen (argv[0]) + 1, argv[0]) != -1)
+  if (test_parmsz_simple3 (__builtin_strlen (argv[0]) + 1, argv[0])
+  != __builtin_strlen (argv[0]) + 1)
  FAIL ();
int arr[42];
if (test_parmsz_scaled (arr, 42) != sizeof (arr))
@@ -759,6 +765,8 @@ main (int argc, char **argv)
  FAIL ();
if (test_parmsz_internal3 (4, 4, obj) != -1)
  FAIL ();
+  if (test_parmsz_internal4 (3, 4, obj) != -1)
+FAIL ();
if (test_loop (arr, 42, 0, 32, 1) != 10 * sizeof (int))
  FAIL ();
if (test_loop (arr, 42, 32, -1, -1) != 0)
diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-20.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-20.c
new file mode 100644
index 000..2c8e07dd98d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-20.c
@@ -0,0 +1,49 @@
+/* PR 109334
+ * { dg-do run }
+ * { dg-options "-O1" } */
+
+
+[[gnu::noinline,gnu::noipa]]
+int f(int n, int buf[n])
+[[gnu::access(read_only, 2, 1)]]
+{
+return __builtin_dynamic_object_size(buf, 0);
+}
+
+[[gnu::noinline,gnu::noipa]]
+int g(int n, int buf[])
+[[gnu::access(read_only, 2, 1)]]
+{
+return __builtin_dynamic_object_size(buf, 0);
+}
+
+[[gnu::noinline,gnu::noipa]]
+int h(int n, int buf[n])
+{
+return __builtin_dynamic_object_size(buf, 0);
+}
+
+int dummy(int x) { return x + 1; }
+
+[[gnu::noinline,gnu::noipa]]
+int i(int n, int buf[dummy(n)])
+{
+return __builtin_dynamic_object_size(buf, 0);
+}
+
+int main()
+{
+int n = 10;
+int buf[n];
+if (n * sizeof(int) != f(n, buf))
+__builtin_abort();
+if (n * sizeof(int) != g(n, buf))
+__builtin_abort();
+if (n * sizeof(int) != h(n, buf))
+__builtin_abort();
+
+(void)i(n, buf);


f(), g(), h() supported, but i() isn't.  OK.


+
+return 0;
+}
+
diff --git a/gcc/testsuite/gcc.dg/pr104970.c b/gcc/testsuite/gcc.dg/pr104970.c
new file mode 100644
index 000..e24a7f22dfb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104970.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -D_FORTIFY_SOURCE=2" } */


The -D_FORTIFY_SOURCE=2 shouldn't be necessary since it doesn't really 
do anything in the context of this test.



+
+__inline void
+memset2(void *__dest, int __ch, long __len) {
+  long __trans_tmp_1 = __builtin_dynamic_object_size(__dest, 0);
+  __builtin___memset_chk(__dest, __ch, __len, __trans_tmp_1);
+}
+
+void
+mleye(int l, double E[][l]) { memset2(E, 0, sizeof(double)); }


New regression test for the ICE reported in

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Siddhesh Poyarekar

On 2023-10-25 09:27, Qing Zhao wrote:

On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar wrote:

On 2023-10-24 18:51, Qing Zhao wrote:

Thanks for the proposal!
So what you suggested is:
For every x.buf, change it as a __builtin_with_size(x.buf, x.L) in the FE,
then the call to the _bdos (x.buf, 1) will
Become:
_bdos(__builtin_with_size(x.buf, x.L), 1)?
Then the implicit use of x.L in _bdos(x.buf.1) will become explicit?

Oops, I think Martin and I fell off-list in a subthread. I clarified that my
comment was that any such annotation at object reference is probably too late
and hence not the right place for it; basically it has the same problems as the
option A in your comment. A better place to reinforce such a relationship
would be the allocation+initialization site instead.

I think Martin’s proposal might work, it’s different than the option A:

A. Add an additional argument, the size parameter, to __bdos,
A.1, during FE;
A.2, during gimplification phase;

Option A targets on the __bdos call, try to encode the implicit use to the
call, this will not work when the real object has not been instantiation at the
call site.

However, Martin’s proposal targets on the FMA array itself, it will enhance the
FAM access naturally with the size information. And such FAM access with size
info will propagated to the __bdos site later through inlining, etc. and then
tree-object-size can use the size information at that point. At the same time,
the implicit use of the size is recorded correctly.

So, I think that this proposal is natural and reasonable.

Ack, we discussed this later in the thread and I agree[1]. Richard
still has concerns[2] that I think may be addressed by putting
__builtin_with_size at the point where the reference to x.buf escapes,
but I'm not very sure about that.

Oh, and Martin suggested using __builtin_with_size more generally[3] in
bugzilla to address attribute inlining issues and we have high level
consensus for a __builtin_with_access instead, which associates access
type in addition to size with the target object. For the purposes of
counted_by, access type could simply be -1.

Thanks,
Sid

[1]
https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#m4f3cafa489493180e258fd62aca0196a5f244039

[2]
https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#mcf226f891621db8b640deaedd8942bb8519010f3

[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503#c6

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Siddhesh Poyarekar


On 2023-10-25 04:16, Martin Uecker wrote:

Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:



Am 24.10.2023 um 22:38 schrieb Martin Uecker :

Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:

Hi, Sid,

Really appreciate for your example and detailed explanation. Very helpful.
I think that this example is an excellent example to show (almost) all the 
issues we need to consider.

I slightly modified this example to make it to be compilable and run-able, as 
following:
(but I still cannot make the incorrect reordering or DSE happening, anyway, the 
potential reordering possibility is there…)

  1 #include 
  2 struct A
  3 {
  4  size_t size;
  5  char buf[] __attribute__((counted_by(size)));
  6 };
  7
  8 static size_t
  9 get_size_from (void *ptr)
10 {
11  return __builtin_dynamic_object_size (ptr, 1);
12 }
13
14 void
15 foo (size_t sz)
16 {
17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
18  obj->size = sz;
19  obj->buf[0] = 2;
20  __builtin_printf (“%d\n", get_size_from (obj->buf));
21  return;
22 }
23
24 int main ()
25 {
26  foo (20);
27  return 0;
28 }






When it’s set I suppose.  Turn

X.l = n;

Into

X.l = __builtin_with_size (x.buf, n);


It would turn

some_variable = (&) x.buf

into

some_variable = __builtin_with_size ( (&) x.buf. x.len)


So the later access to x.buf and not the initialization
of a member of the struct (which is too early).



Hmm, so with Qing's example above, are you suggesting the transformation 
be to foo like so:


14 void
15 foo (size_t sz)
16 {
16.5  void * _1;
17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
18  obj->size = sz;
19  obj->buf[0] = 2;
19.5  _1 = __builtin_with_size (obj->buf, obj->size);
20  __builtin_printf (“%d\n", get_size_from (_1));
21  return;
22 }

If yes then this could indeed work.  I think I got thrown off by the 
reference to __bdos.


Thanks,
Sid

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)


On 2023-10-24 18:51, Qing Zhao wrote:

Thanks for the proposal!

So what you suggested is:

For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the FE, 
then the call to the _bdos (x.buf, 1) will
Become:

_bdos(__builtin_with_size(x.buf, x.L), 1)?

Then the implicit use of x.L in _bdos(x.buf.1) will become explicit?


Oops, I think Martin and I fell off-list in a subthread.  I clarified 
that my comment was that any such annotation at object reference is 
probably too late and hence not the right place for it; basically it has 
the same problems as the option A in your comment.  A better place to 
reinforce such a relationship would be the allocation+initialization 
site instead.


Thanks,
Sid

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)


On 2023-10-24 18:41, Qing Zhao wrote:




On Oct 24, 2023, at 5:03 PM, Siddhesh Poyarekar  wrote:

On 2023-10-24 16:30, Qing Zhao wrote:

Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, 
therefore, the call to __bdos is Not in the same routine as the instantiation 
of the object, As a result, the TYPE info and the attached counted_by info of 
the object can NOT be USED by the __bdos call.


But __bos/__bdos are barely useful without optimization; you need a minimum of 
-O1.  You're right that if the call is never inlined then we don't care because 
the __bdos call does not get expanded to obj->size.

However, the point of situation 2 is that the TYPE info cannot be used by the 
__bdos call *only for a while* (i.e. until the call gets inlined) and that 
window is an opportunity for the reordering/DSE to break things.


The main point of situation 2 I tried made: there are situations where 
obj->size is not used at all by the __bdos, marking it as volatile is too 
conservative, unnecessarily prevent useful optimizations from happening.  -:)


Yes, that's the tradeoff.  However, maybe this is the point where Kees 
jumps in and say the kernel doesn't really care as much or something 
like that :)


Sid

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)


On 2023-10-24 16:38, Martin Uecker wrote:

Here is another proposal:  Add a new builtin function

__builtin_with_size(x, size)

that return x but behaves similar to an allocation
function in that BDOS can look at the size argument
to discover the size.

The FE insers this function when the field is accessed:

__builtin_with_size(x.buf, x.L);



In fact if we do this at the allocation site for x, it may also help 
with future warnings, where the compiler could flag a warning or error 
when it encounters this builtin but does not see an assignment to x.L.


Thanks,
Sid

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)


On 2023-10-24 16:30, Qing Zhao wrote:

Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, 
therefore, the call to __bdos is Not in the same routine as the instantiation 
of the object, As a result, the TYPE info and the attached counted_by info of 
the object can NOT be USED by the __bdos call.



But __bos/__bdos are barely useful without optimization; you need a 
minimum of -O1.  You're right that if the call is never inlined then we 
don't care because the __bdos call does not get expanded to obj->size.


However, the point of situation 2 is that the TYPE info cannot be used 
by the __bdos call *only for a while* (i.e. until the call gets inlined) 
and that window is an opportunity for the reordering/DSE to break things.


Thanks.
Sid

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)


On 2023-10-23 15:43, Qing Zhao wrote:




On Oct 23, 2023, at 2:43 PM, Siddhesh Poyarekar  wrote:

On 2023-10-23 14:06, Martin Uecker wrote:

We should aim for a good integration with the BDOS pass, so
that it can propagate the information further, e.g. the
following should work:
struct { int L; char buf[] __counted_by(L) } x;
x.L = N;
x.buf = ...;
char *p = >f;
__bdos(p) -> N
So we need to be smart on how we provide the size
information for x->f to the backend.
This would also be desirable for the language extension.


This is essentially why there need to be frontend rules constraining reordering 
and reachability semantics of x.L, thus restricting DSE and reordering for it.


My understanding is that Restricting DSE and reordering should be done by the 
proper data flow information, with a new argument added to the BDOS call, this 
correct data flow information could be maintained, and then the DSE and 
reordering will not happen.

I don’t quite understand what kind of frontend rules should be added to 
constrain reordering and reachability semantics? Can you explain this a little 
bit more? Do you mean to add some rules or requirment to the new attribute that 
the users of the attribute should follow in the source code?


Yes, but let me try and summarize the issues and the potential solutions 
at the end:





  This is not really a __bdos/__bos question, because that bit is trivial; if 
the structure is visible, the value is simply x.L.  This is also why adding a 
reference to x.L in __bos/__bdos is not sufficient or even possible in, e.g. 
the above case you note.


I am a little confused here, are we discussing how to resolve the potential 
reordering issue of the following:

"
struct annotated {
   size_t foo;
   char array[] __attribute__((counted_by (foo)));
};

   p->foo = 10;
   size = __builtin_dynamic_object_size (p->array,1);
“?

Or a bigger issue?


Right, so the problem we're trying to solve is the reordering of __bdos 
w.r.t. initialization of the size parameter but to also account for DSE 
of the assignment, we can abstract this problem to that of DFA being 
unable to see implicit use of the size parameter.  __bdos is the one 
such implicit user of the size parameter and you're proposing to solve 
this by encoding the relationship between buffer and size at the __bdos 
call site.  But what about the case when the instantiation of the object 
is not at the same place as the __bdos call site, i.e. the DFA is unable 
to make that relationship?


The example Martin showed where the subobject gets "hidden" behind a 
pointer was a trivial one where DFA *may* actually work in practice 
(because the object-size pass can thread through these assignments) but 
think about this one:


struct A
{
  size_t size;
  char buf[] __attribute__((counted_by(size)));
}

static size_t
get_size_of (void *ptr)
{
  return __bdos (ptr, 1);
}

void
foo (size_t sz)
{
  struct A *obj = __builtin_malloc (sz);
  obj->size = sz;

  ...
  __builtin_printf ("%zu\n", get_size_of (obj->array));
  ...
}

Until get_size_of is inlined, no DFA can see the __bdos call in the same 
place as the point where obj is allocated.  As a result, the assignment 
to obj->size could get reordered (or the store eliminated) w.r.t. the 
__bdos call until the inlining happens.


As a result, the relationship between buf and size established by the 
attribute needs to be encoded into the type somehow.  There are two options:


Option 1: Encode the relationship in the type of buf

This is kinda what you end up doing with component_ref_has_counted_by 
and it does show the relationship if one is looking (through that call), 
but nothing more that can be used to, e.g. prevent reordering or tell 
the optimizer that the reference to the buf member may imply a reference 
to the size member as well.  This could be remedied by somehow encoding 
the USES relationship for size into the type of buf that the 
optimization passes can see.  I feel like this may be a bit convoluted 
to specify in a future language extension in a way that will actually be 
well understood by developers, but it will likely generate faster 
runtime code.  This will also likely require a bigger change across passes.


Option 2: Encode the relationship in the type of size

The other option is to enhance the type of size somehow so that it 
discourages reordering and store elimination, basically pessimizing 
code.  I think volatile semantics might be the way to do this and may 
even be straightforward to specify in the future language extension 
given that it builds on a known language construct and is thematically 
related.  However it does pessimize output for code that implements 
__counted_by__.


Thanks,
Sid

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)


On 2023-10-23 14:06, Martin Uecker wrote:

We should aim for a good integration with the BDOS pass, so
that it can propagate the information further, e.g. the
following should work:

struct { int L; char buf[] __counted_by(L) } x;
x.L = N;
x.buf = ...;
char *p = >f;
__bdos(p) -> N

So we need to be smart on how we provide the size
information for x->f to the backend.

This would also be desirable for the language extension.


This is essentially why there need to be frontend rules constraining 
reordering and reachability semantics of x.L, thus restricting DSE and 
reordering for it.  This is not really a __bdos/__bos question, because 
that bit is trivial; if the structure is visible, the value is simply 
x.L.  This is also why adding a reference to x.L in __bos/__bdos is not 
sufficient or even possible in, e.g. the above case you note.


Thanks,
Sid

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)


On 2023-10-23 08:34, Richard Biener wrote:

A related issue is that assignment to the field and storage allocation
are not tied together - if there's no use of the size data we might
remove the store of it as dead.


Maybe the trick then is to treat the size data as volatile?  That ought
to discourage reordering and also prevent elimination of the "dead" store?


But we are an optimizing compiler, not a static analysis machine, so I
fail to see how this is a useful suggestion.


Sorry I didn't meant to suggest doing this in the middle-end.


I think Martins suggestion to approach this as a language extension
is more useful and would make it easier to handle this?


I think handling for this (e.g. treating any storage allocated for the 
size member in the struct as volatile to prevent reordering or 
elimination) would have to be implemented in the front-end, regardless 
of whether it is a language extension or as a gcc attribute.  How would 
making it a language extension vs a gcc attribute make it different?


Thanks,
Sid

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)


On 2023-10-23 03:57, Richard Biener wrote:

On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao  wrote:





On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar  wrote:

On 2023-10-20 14:38, Qing Zhao wrote:

How about the following:
   Add one more parameter to __builtin_dynamic_object_size(), i.e
__builtin_dynamic_object_size (_1,1,array_annotated->foo)?
When we see the structure field has counted_by attribute.


Or maybe add a barrier preventing any assignments to array_annotated->foo from 
being reordered below the __bdos call? Basically an __asm__ with 
array_annotated->foo in the clobber list ought to do it I think.


Maybe just adding the array_annotated->foo to the use list of the call to 
__builtin_dynamic_object_size should be enough?

But I am not sure how to implement this in the TREE level, is there a 
USE_LIST/CLOBBER_LIST for each call?  Then I can just simply add the counted_by 
field “array_annotated->foo” to the USE_LIST of the call to __bdos?

This might be the simplest solution?


If the dynamic object size is derived of a field then I think you need to
put the "load" of that memory location at the point (as argument)
of the __bos call right at parsing time.  I know that's awkward because
you try to play tricks "discovering" that field only late, but that's not
going to work.

A related issue is that assignment to the field and storage allocation
are not tied together - if there's no use of the size data we might
remove the store of it as dead.


Maybe the trick then is to treat the size data as volatile?  That ought 
to discourage reordering and also prevent elimination of the "dead" store?


Thanks,
Sid

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-20 Thread Siddhesh Poyarekar


On 2023-10-20 14:38, Qing Zhao wrote:

How about the following:

   Add one more parameter to __builtin_dynamic_object_size(), i.e

__builtin_dynamic_object_size (_1,1,array_annotated->foo)?

When we see the structure field has counted_by attribute.


Or maybe add a barrier preventing any assignments to 
array_annotated->foo from being reordered below the __bdos call? 
Basically an __asm__ with array_annotated->foo in the clobber list ought 
to do it I think.


It may not work for something like this though:

static size_t
get_size_of (void *ptr)
{
  return __bdos (ptr, 1);
}

void
foo (size_t sz)
{
  array_annotated = __builtin_malloc (sz);
  array_annotated = sz;

  ...
  __builtin_printf ("%zu\n", get_size_of (array_annotated->foo));
  ...
}

because the call to get_size_of () may not have been inlined that early.

The more fool-proof alternative may be to put a compile time barrier 
right below the assignment to array_annotated->foo; I reckon you could 
do that early in the front end by marking the size identifier and then 
tracking assignments to that identifier.  That may have a slight runtime 
performance overhead since it may prevent even legitimate reordering.  I 
can't think of another alternative at the moment...


Sid

Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-18 Thread Siddhesh Poyarekar


[Sorry, I forgot to respond to this]

On 2023-10-06 16:01, Martin Uecker wrote:

Am Freitag, dem 06.10.2023 um 06:50 -0400 schrieb Siddhesh Poyarekar:

On 2023-10-06 01:11, Martin Uecker wrote:

Am Donnerstag, dem 05.10.2023 um 15:35 -0700 schrieb Kees Cook:

On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote:

2. How would you handle signedness of the size field?  The size gets
converted to sizetype everywhere it is used and overflows/underflows may
produce interesting results.  Do you want to limit the types to unsigned or
do you want to add a disclaimer in the docs?  The former seems like the
*right* thing to do given that it is a new feature; best to enforce the
cleaner habit at the outset.


The Linux kernel has a lot of "int" counters, so the goal is to catch
negative offsets just like too-large offsets at runtime with the sanitizer
and report 0 for __bdos. Refactoring all these to be unsigned is going
to take time since at least some of them use the negative values as
special values unrelated to array indexing. :(

So, perhaps if unsigned counters are worth enforcing, can this be a
separate warning the kernel can turn off initially?



I think unsigned counters are much more problematic than signed ones
because wraparound errors are more difficult to find.

With unsigned you could potentially diagnose wraparound, but only if we
add -fsanitize=unsigned-overflow *and* add mechanism to mark intentional
wraparound *and* everybody adds this annotation after carefully screening
their code *and* rewriting all operations such as (counter - 3) + 5
where the wraparound in the intermediate expression is harmless.

For this reason, I do not think we should ever enforce some rule that
the counter has to be unsigned.

What we could do, is detect *storing* negative values into the
counter at run-time using UBSan. (but if negative values are
used for special cases, one also should be able to turn this
off).


All of the object size detection relies on object sizes being sizetype.
The closest we could do with that is detect (sz != SIZE_MAX && sz >
size_t / 2), since allocators typically cannot allocate more than
SIZE_MAX / 2.


I was talking about the counter in:

struct {
   int counter;
   char buf[] __counted_by__((counter))
};

which could be checked to be positive either when stored to or
when buf is used.

And yes, we could also check the size of buf.  Not sure what is
done for VLAs now, but I guess it could be similar.


Right now all object sizes are cast to sizetype and the generated 
dynamic expressions are such that overflows will result in the computed 
object size being zero.  Non-generated expressions (like we could get 
with __counted_by__) will simply be cast; there's probably scope for 
improvement here, where we wrap that with an expression that returns 0 
if the size exceeds SIZE_MAX / 2 since that's typically the limit for 
allocators.  We use that heuristic elsewhere in the __bos/__bdos logic too.


Thanks,
Sid

Re: [V3][PATCH 1/3] Provide counted_by attribute to flexible array member field (PR108896)

2023-10-18 Thread Siddhesh Poyarekar


On 2023-10-18 10:51, Qing Zhao wrote:



+   member FIELD_DECL is a valid field of the containing structure's fieldlist,
+   FIELDLIST, Report error and remove this attribute when it's not.  */
+static void
+verify_counted_by_attribute (tree fieldlist, tree field_decl)
+{
+  tree attr_counted_by = lookup_attribute ("counted_by",
+   DECL_ATTRIBUTES (field_decl));
+
+  if (!attr_counted_by)
+return;
+
+  /* If there is an counted_by attribute attached to the field,
+ verify it.  */
+
+  const char *fieldname
+= IDENTIFIER_POINTER (TREE_VALUE (TREE_VALUE (attr_counted_by)));
+
+  /* Verify the argument of the attrbute is a valid field of the

s/attrbute/attribute/

+ containing structure.  */
+
+  tree counted_by_field = get_named_field (fieldlist, fieldname);
+
+  /* Error when the field is not found in the containing structure.  */
+  if (!counted_by_field)
+{
+  error_at (DECL_SOURCE_LOCATION (field_decl),
+"%qE attribute argument not a field declaration"
+" in the same structure, ignore it",
+(get_attribute_name (attr_counted_by)));

Probably someone with English as a first language would make a better 
suggestion, but how about:
   Argument specified in %qE attribute is not a field declaration in the
   same structure, ignoring it.

+
+  DECL_ATTRIBUTES (field_decl)
+= remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl));
+}
+  else
+  /* Error when the field is not with an integer type.  */

Suggest: Flag an error when the field is not of an integer type.

+{
+  while (TREE_CHAIN (counted_by_field))
+counted_by_field = TREE_CHAIN (counted_by_field);
+  tree real_field = TREE_VALUE (counted_by_field);
+
+  if (TREE_CODE (TREE_TYPE (real_field)) != INTEGER_TYPE)
+{
+  error_at (DECL_SOURCE_LOCATION (field_decl),
+ "%qE attribute argument not a field declaration"
+ " with integer type, ignore it",
+ (get_attribute_name (attr_counted_by)));

Suggest:
   Argument specified in %qE attribute is not of an integer type,
   ignoring it.

+
+  DECL_ATTRIBUTES (field_decl)
+= remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl));
+}
+}
+
+  return;


I forgot to mention the redundant return here.


Could you please clarify a little bit here, why the return here is redundant?


It's the last line in the function, so even without that statement the 
function will return.


Thanks,
Sid

Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-06 Thread Siddhesh Poyarekar


On 2023-10-06 01:11, Martin Uecker wrote:

Am Donnerstag, dem 05.10.2023 um 15:35 -0700 schrieb Kees Cook:

On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote:

2. How would you handle signedness of the size field?  The size gets
converted to sizetype everywhere it is used and overflows/underflows may
produce interesting results.  Do you want to limit the types to unsigned or
do you want to add a disclaimer in the docs?  The former seems like the
*right* thing to do given that it is a new feature; best to enforce the
cleaner habit at the outset.


The Linux kernel has a lot of "int" counters, so the goal is to catch
negative offsets just like too-large offsets at runtime with the sanitizer
and report 0 for __bdos. Refactoring all these to be unsigned is going
to take time since at least some of them use the negative values as
special values unrelated to array indexing. :(

So, perhaps if unsigned counters are worth enforcing, can this be a
separate warning the kernel can turn off initially?



I think unsigned counters are much more problematic than signed ones
because wraparound errors are more difficult to find.

With unsigned you could potentially diagnose wraparound, but only if we
add -fsanitize=unsigned-overflow *and* add mechanism to mark intentional
wraparound *and* everybody adds this annotation after carefully screening
their code *and* rewriting all operations such as (counter - 3) + 5
where the wraparound in the intermediate expression is harmless.

For this reason, I do not think we should ever enforce some rule that
the counter has to be unsigned.

What we could do, is detect *storing* negative values into the
counter at run-time using UBSan. (but if negative values are
used for special cases, one also should be able to turn this
off).


All of the object size detection relies on object sizes being sizetype. 
The closest we could do with that is detect (sz != SIZE_MAX && sz > 
size_t / 2), since allocators typically cannot allocate more than 
SIZE_MAX / 2.


Sid

Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

On 05-Oct-2023 18:35, Kees Cook  wrote:On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote:

> 2. How would you handle signedness of the size field?  The size gets

> converted to sizetype everywhere it is used and overflows/underflows may

> produce interesting results.  Do you want to limit the types to unsigned or

> do you want to add a disclaimer in the docs?  The former seems like the

> *right* thing to do given that it is a new feature; best to enforce the

> cleaner habit at the outset.



The Linux kernel has a lot of "int" counters, so the goal is to catch

negative offsets just like too-large offsets at runtime with the sanitizer

and report 0 for __bdos. Refactoring all these to be unsigned is going

to take time since at least some of them use the negative values as

special values unrelated to array indexing. :(



So, perhaps if unsigned counters are worth enforcing, can this be a

separate warning the kernel can turn off initially?That should be fine, I just want to be sure we're thinking about this during the design.  In that case we should probably add negative offset tests to ensure that we're actually catching these issues with __bdos.Thanks,Sid

Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)


On 2023-08-25 11:24, Qing Zhao wrote:

This is the 3rd version of the patch, per our discussion based on the
review comments for the 1st and 2nd version, the major changes in this
version are:


Hi Qing,

I hope the review was helpful.  Overall, a couple of things to consider:

1. How would you handle potential reordering between assignment of the 
size to the counted_by field with the __bdos call that may consume it? 
You'll probably need to express some kind of dependency there or in the 
worst case, insert a barrier to disallow reordering.


2. How would you handle signedness of the size field?  The size gets 
converted to sizetype everywhere it is used and overflows/underflows may 
produce interesting results.  Do you want to limit the types to unsigned 
or do you want to add a disclaimer in the docs?  The former seems like 
the *right* thing to do given that it is a new feature; best to enforce 
the cleaner habit at the outset.


Thanks,
Sid



***Against 1st version:
1. change the name "element_count" to "counted_by";
2. change the parameter for the attribute from a STRING to an
Identifier;
3. Add logic and testing cases to handle anonymous structure/unions;
4. Clarify documentation to permit the situation when the allocation
size is larger than what's specified by "counted_by", at the same time,
it's user's error if allocation size is smaller than what's specified by
"counted_by";
5. Add a complete testing case for using counted_by attribute in
__builtin_dynamic_object_size when there is mismatch between the
allocation size and the value of "counted_by", the expecting behavior
for each case and the explanation on why in the comments.

***Against 2rd version:
1. Identify a tree node sharing issue and fixed it in the routine
"component_ref_get_counted_ty" of tree.cc;
2. Update the documentation and testing cases with the clear usage
of the fomula to compute the allocation size:
MAX (sizeof (struct A), offsetof (struct A, array[0]) + counted_by * 
sizeof(element))
(the algorithm used in tree-object-size.cc is correct).

In this set of patches, the major functionality provided is:

1. a new attribute "counted_by";
2. use this new attribute in bound sanitizer;
3. use this new attribute in dynamic object size for subobject size;

As discussed, I plan to add two more separate patches sets after this initial
patch set is approved and committed.

set 1. A new warning option and a new sanitizer option for the user error
   when the allocation size is smaller than the value of "counted_by".
set 2. An improvement to __builtin_dynamic_object_size  for whole-object
   size of the structure with FAM annaoted with counted_by.

there are also some existing bugs in tree-object-size.cc identified
during the study, and PRs were filed to record them. these bugs will
be fixed seperately with individual patches:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111030
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111040

Bootstrapped and regression tested on both aarch64 and X86, no issue.

Please see more details on the description of this work on:

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619708.html

and more discussions on
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626376.html

Okay for committing?

thanks.

Qing

Qing Zhao (3):
   Provide counted_by attribute to flexible array member field (PR108896)
   Use the counted_by atribute info in builtin object size [PR108896]
   Use the counted_by attribute information in bound sanitizer[PR108896]

  gcc/c-family/c-attribs.cc |  54 -
  gcc/c-family/c-common.cc  |  13 ++
  gcc/c-family/c-common.h   |   1 +
  gcc/c-family/c-ubsan.cc   |  16 ++
  gcc/c/c-decl.cc   |  79 +--
  gcc/doc/extend.texi   |  77 +++
  .../gcc.dg/flex-array-counted-by-2.c  |  74 ++
  .../gcc.dg/flex-array-counted-by-3.c  | 210 ++
  gcc/testsuite/gcc.dg/flex-array-counted-by.c  |  40 
  .../ubsan/flex-array-counted-by-bounds-2.c|  27 +++
  .../ubsan/flex-array-counted-by-bounds.c  |  46 
  gcc/tree-object-size.cc   |  37 ++-
  gcc/tree.cc   | 133 +++
  gcc/tree.h|  15 ++
  14 files changed, 797 insertions(+), 25 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c
  create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
  create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

Re: [V3][PATCH 2/3] Use the counted_by atribute info in builtin object size [PR108896]





On 2023-08-25 11:24, Qing Zhao wrote:

Use the counted_by atribute info in builtin object size to compute the
subobject size for flexible array members.

gcc/ChangeLog:

PR C/108896
* tree-object-size.cc (addr_object_size): Use the counted_by
attribute info.
* tree.cc (component_ref_has_counted_by_p): New function.
(component_ref_get_counted_by): New function.
* tree.h (component_ref_has_counted_by_p): New prototype.
(component_ref_get_counted_by): New prototype.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by-2.c: New test.
* gcc.dg/flex-array-counted-by-3.c: New test.
---
  .../gcc.dg/flex-array-counted-by-2.c  |  74 ++
  .../gcc.dg/flex-array-counted-by-3.c  | 210 ++
  gcc/tree-object-size.cc   |  37 ++-
  gcc/tree.cc   |  95 +++-
  gcc/tree.h|  10 +
  5 files changed, 418 insertions(+), 8 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c

diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
new file mode 100644
index ..ec580c1f1f01
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
@@ -0,0 +1,74 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+#define expect(p, _v) do { \
+size_t v = _v; \
+if (p == v) \
+   __builtin_printf ("ok:  %s == %zd\n", #p, p); \
+else \
+   {  \
+ __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
+ FAIL (); \
+   } \
+} while (0);


You're using this in a bunch of tests already; does it make sense to 
consolidate it into builtin-object-size-common.h?



+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f; 
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
++ attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+expect(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+expect(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+expect(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+  array_nested_annotated->b * sizeof (int));
+}


Maybe another test where the allocation, size assignment and __bdos call 
happen in the same function, where the allocator is not recognized by gcc:


void *
__attribute__ ((noinline))
alloc (size_t sz)
{
  return __builtin_malloc (sz);
}

void test (size_t sz)
{
  array_annotated = alloc (sz);
  array_annotated->b = sz;
  return __builtin_dynamic_object_size (array_annotated->c, 1);
}

The interesting thing to test (and ensure in the codegen) is that the 
assignment to array_annotated->b does not get reordered to below the 
__builtin_dynamic_object_size call since technically there is no data 
dependency between the two.



+
+int main(int argc, char *argv[])
+{
+  setup (10,10);
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..a0c3cb88ec71
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,210 @@
+/* test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size mismatched with the value of counted_by attribute?  */


If the behaviour is undefined, does it make sense to add tests for this? 
 Maybe once you have a -Wmismatched-counted-by or similar, we could 
have tests for that.  I guess the counter-argument is that we keep track 
of this behaviour but not necessarily guarantee it.



+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[]

Re: [V3][PATCH 1/3] Provide counted_by attribute to flexible array member field (PR108896)


On 2023-10-05 14:51, Siddhesh Poyarekar wrote:

On 2023-08-25 11:24, Qing Zhao wrote:

Provide a new counted_by attribute to flexible array member field.


The obligatory "I can't ack the patch but here's a review" disclaimer :)



'counted_by (COUNT)'
  The 'counted_by' attribute may be attached to the flexible array
  member of a structure.  It indicates that the number of the
  elements of the array is given by the field named "COUNT" in the
  same structure as the flexible array member.  GCC uses this
  information to improve the results of the array bound sanitizer and
  the '__builtin_dynamic_object_size'.

  For instance, the following code:

   struct P {
 size_t count;
 char other;
 char array[] __attribute__ ((counted_by (count)));
   } *p;

  specifies that the 'array' is a flexible array member whose number
  of elements is given by the field 'count' in the same structure.

  The field that represents the number of the elements should have an
  integer type.  An explicit 'counted_by' annotation defines a
  relationship between two objects, 'p->array' and 'p->count', that
  'p->array' has _at least_ 'p->count' number of elements available.
  This relationship must hold even after any of these related objects
  are updated.  It's the user's responsibility to make sure this
  relationship to be kept all the time.  Otherwise the results of the
  array bound sanitizer and the '__builtin_dynamic_object_size' might
  be incorrect.

  For instance, in the following example, the allocated array has
  less elements than what's specified by the 'sbuf->count', this is
  an user error.  As a result, out-of-bounds access to the array
  might not be detected.

   #define SIZE_BUMP 10
   struct P *sbuf;
   void alloc_buf (size_t nelems)
   {
 sbuf = (struct P *) malloc (MAX (sizeof (struct P),
    (offsetof (struct P, 
array[0])

 + nelems * sizeof (char;
 sbuf->count = nelems + SIZE_BUMP;
 /* This is invalid when the sbuf->array has less than 
sbuf->count

    elements.  */
   }

  In the following example, the 2nd update to the field 'sbuf->count'
  of the above structure will permit out-of-bounds access to the
  array 'sbuf>array' as well.

   #define SIZE_BUMP 10
   struct P *sbuf;
   void alloc_buf (size_t nelems)
   {
 sbuf = (struct P *) malloc (MAX (sizeof (struct P),
    (offsetof (struct P, 
array[0])
 + (nelems + SIZE_BUMP) * 
sizeof (char;

 sbuf->count = nelems;
 /* This is valid when the sbuf->array has at least 
sbuf->count

    elements.  */
   }
   void use_buf (int index)
   {
 sbuf->count = sbuf->count + SIZE_BUMP + 1;
 /* Now the value of sbuf->count is larger than the number
    of elements of sbuf->array.  */
 sbuf->array[index] = 0;
 /* then the out-of-bound access to this array
    might not be detected.  */
   }

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.
* tree.cc (get_named_field): New function.
* tree.h (get_named_field): New prototype.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
---
  gcc/c-family/c-attribs.cc    | 54 -
  gcc/c-family/c-common.cc | 13 
  gcc/c-family/c-common.h  |  1 +
  gcc/c/c-decl.cc  | 79 +++-
  gcc/doc/extend.texi  | 77 +++
  gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 ++
  gcc/tree.cc  | 40 ++
  gcc/tree.h   |  5 ++
  8 files changed, 291 insertions(+), 18 deletions(-)
  create mode 100644 gcc/testsuite/gcc.

Re: [V3][PATCH 1/3] Provide counted_by attribute to flexible array member field (PR108896)


On 2023-08-25 11:24, Qing Zhao wrote:

Provide a new counted_by attribute to flexible array member field.


The obligatory "I can't ack the patch but here's a review" disclaimer :)



'counted_by (COUNT)'
  The 'counted_by' attribute may be attached to the flexible array
  member of a structure.  It indicates that the number of the
  elements of the array is given by the field named "COUNT" in the
  same structure as the flexible array member.  GCC uses this
  information to improve the results of the array bound sanitizer and
  the '__builtin_dynamic_object_size'.

  For instance, the following code:

   struct P {
 size_t count;
 char other;
 char array[] __attribute__ ((counted_by (count)));
   } *p;

  specifies that the 'array' is a flexible array member whose number
  of elements is given by the field 'count' in the same structure.

  The field that represents the number of the elements should have an
  integer type.  An explicit 'counted_by' annotation defines a
  relationship between two objects, 'p->array' and 'p->count', that
  'p->array' has _at least_ 'p->count' number of elements available.
  This relationship must hold even after any of these related objects
  are updated.  It's the user's responsibility to make sure this
  relationship to be kept all the time.  Otherwise the results of the
  array bound sanitizer and the '__builtin_dynamic_object_size' might
  be incorrect.

  For instance, in the following example, the allocated array has
  less elements than what's specified by the 'sbuf->count', this is
  an user error.  As a result, out-of-bounds access to the array
  might not be detected.

   #define SIZE_BUMP 10
   struct P *sbuf;
   void alloc_buf (size_t nelems)
   {
 sbuf = (struct P *) malloc (MAX (sizeof (struct P),
(offsetof (struct P, array[0])
 + nelems * sizeof (char;
 sbuf->count = nelems + SIZE_BUMP;
 /* This is invalid when the sbuf->array has less than sbuf->count
elements.  */
   }

  In the following example, the 2nd update to the field 'sbuf->count'
  of the above structure will permit out-of-bounds access to the
  array 'sbuf>array' as well.

   #define SIZE_BUMP 10
   struct P *sbuf;
   void alloc_buf (size_t nelems)
   {
 sbuf = (struct P *) malloc (MAX (sizeof (struct P),
(offsetof (struct P, array[0])
 + (nelems + SIZE_BUMP) * sizeof 
(char;
 sbuf->count = nelems;
 /* This is valid when the sbuf->array has at least sbuf->count
elements.  */
   }
   void use_buf (int index)
   {
 sbuf->count = sbuf->count + SIZE_BUMP + 1;
 /* Now the value of sbuf->count is larger than the number
of elements of sbuf->array.  */
 sbuf->array[index] = 0;
 /* then the out-of-bound access to this array
might not be detected.  */
   }

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.
* tree.cc (get_named_field): New function.
* tree.h (get_named_field): New prototype.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
---
  gcc/c-family/c-attribs.cc| 54 -
  gcc/c-family/c-common.cc | 13 
  gcc/c-family/c-common.h  |  1 +
  gcc/c/c-decl.cc  | 79 +++-
  gcc/doc/extend.texi  | 77 +++
  gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 ++
  gcc/tree.cc  | 40 ++
  gcc/tree.h   |  5 ++
  8 files changed, 291 insertions(+), 18 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git

[committed 0/2] SECURITY.txt: Trivial fixups

Committed some trivial comma and indentation fixups that Jan shared with
me off-list.

Jan Engelhardt (2):
  secpol: add grammatically missing commas / remove one excess instance
  secpol: consistent indentation

 SECURITY.txt | 48 
 1 file changed, 24 insertions(+), 24 deletions(-)

-- 
2.41.0

[committed 2/2] secpol: consistent indentation

From: Jan Engelhardt 

86% of the document have 4 spaces; adjust the remaining 14%.

Signed-off-by: Jan Engelhardt 

ChangeLog:

* SECURITY.txt: Fix up indentation.
---
 SECURITY.txt | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/SECURITY.txt b/SECURITY.txt
index 93792923583..b3e2bbfda90 100644
--- a/SECURITY.txt
+++ b/SECURITY.txt
@@ -173,33 +173,33 @@ Security features implemented in GCC
 Reporting private security bugs
 ===
 
-   *All bugs reported in the GCC Bugzilla are public.*
+*All bugs reported in the GCC Bugzilla are public.*
 
-   In order to report a private security bug that is not immediately
-   public, please contact one of the downstream distributions with
-   security teams.  The following teams have volunteered to handle
-   such bugs:
+In order to report a private security bug that is not immediately
+public, please contact one of the downstream distributions with
+security teams.  The following teams have volunteered to handle
+such bugs:
 
   Debian:  secur...@debian.org
   Red Hat: secal...@redhat.com
   SUSE:secur...@suse.de
   AdaCore: product-secur...@adacore.com
 
-   Please report the bug to just one of these teams.  It will be shared
-   with other teams as necessary.
+Please report the bug to just one of these teams.  It will be shared
+with other teams as necessary.
 
-   The team contacted will take care of details such as vulnerability
-   rating and CVE assignment (http://cve.mitre.org/about/).  It is likely
-   that the team will ask to file a public bug because the issue is
-   sufficiently minor and does not warrant an embargo.  An embargo is not
-   a requirement for being credited with the discovery of a security
-   vulnerability.
+The team contacted will take care of details such as vulnerability
+rating and CVE assignment (http://cve.mitre.org/about/).  It is likely
+that the team will ask to file a public bug because the issue is
+sufficiently minor and does not warrant an embargo.  An embargo is not
+a requirement for being credited with the discovery of a security
+vulnerability.
 
 Reporting public security bugs
 ==
 
-   It is expected that critical security bugs will be rare, and that most
-   security bugs can be reported in GCC, thus making
-   them public immediately.  The system can be found here:
+It is expected that critical security bugs will be rare, and that most
+security bugs can be reported in GCC, thus making
+them public immediately.  The system can be found here:
 
   https://gcc.gnu.org/bugzilla/
-- 
2.41.0

[committed 1/2] secpol: add grammatically missing commas / remove one excess instance

From: Jan Engelhardt 

Signed-off-by: Jan Engelhardt 

ChangeLog:

* SECURITY.txt: Fix up commas.
---
 SECURITY.txt | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/SECURITY.txt b/SECURITY.txt
index b65f24cfc2a..93792923583 100644
--- a/SECURITY.txt
+++ b/SECURITY.txt
@@ -3,12 +3,12 @@ What is a GCC security bug?
 
 A security bug is one that threatens the security of a system or
 network, or might compromise the security of data stored on it.
-In the context of GCC there are multiple ways in which this might
+In the context of GCC, there are multiple ways in which this might
 happen and some common scenarios are detailed below.
 
 If you're reporting a security issue and feel like it does not fit
 into any of the descriptions below, you're encouraged to reach out
-through the GCC bugzilla or if needed, privately, by following the
+through the GCC bugzilla or, if needed, privately, by following the
 instructions in the last two sections of this document.
 
 Compiler drivers, programs, libgccjit and support libraries
@@ -24,11 +24,11 @@ Compiler drivers, programs, libgccjit and support libraries
 
 The libgccjit library can, despite the name, be used both for
 ahead-of-time compilation and for just-in-compilation.  In both
-cases it can be used to translate input representations (such as
-source code) in the application context; in the latter case the
+cases, it can be used to translate input representations (such as
+source code) in the application context; in the latter case, the
 generated code is also run in the application context.
 
-Limitations that apply to the compiler driver, apply here too in
+Limitations that apply to the compiler driver apply here too in
 terms of trusting inputs and it is recommended that both the
 compilation *and* execution context of the code are appropriately
 sandboxed to contain the effects of any bugs in libgccjit, the
@@ -43,7 +43,7 @@ Compiler drivers, programs, libgccjit and support libraries
 
 Libraries such as zlib that are bundled with GCC to build it will be
 treated the same as the compiler drivers and programs as far as
-security coverage is concerned.  However if you find an issue in
+security coverage is concerned.  However, if you find an issue in
 these libraries independent of their use in GCC, you should reach
 out to their upstream projects to report them.
 
@@ -97,7 +97,7 @@ Language runtime libraries
 * libssp
 * libstdc++
 
-These libraries are intended to be used in arbitrary contexts and as
+These libraries are intended to be used in arbitrary contexts and, as
 a result, bugs in these libraries may be evaluated for security
 impact.  However, some of these libraries, e.g. libgo, libphobos,
 etc.  are not maintained in the GCC project, due to which the GCC
@@ -145,7 +145,7 @@ GCC plugins
 
 It should be noted that GCC may execute arbitrary code loaded by a
 user through the GCC plugin mechanism or through system preloading
-mechanism.  Such custom code should be vetted by the user for safety
+mechanism.  Such custom code should be vetted by the user for safety,
 as bugs exposed through such code will not be considered security
 issues.
 
-- 
2.41.0

Re: [PATCH v2] Add a GCC Security policy

2023-10-04 Thread Siddhesh Poyarekar





On 2023-10-04 11:49, Alexander Monakov wrote:


On Thu, 28 Sep 2023, Siddhesh Poyarekar wrote:


Define a security process and exclusions to security issues for GCC and
all components it ships.


Some typos and wording suggestions below.


I've incorporated all your and David's suggestions and pushed it.  Thank 
you for iterating with me on this!


Sid




--- /dev/null
+++ b/SECURITY.txt
@@ -0,0 +1,205 @@
+What is a GCC security bug?
+===
+
+A security bug is one that threatens the security of a system or
+network, or might compromise the security of data stored on it.
+In the context of GCC there are multiple ways in which this might
+happen and some common scenarios are detailed below.
+
+If you're reporting a security issue and feel like it does not fit
+into any of the descriptions below, you're encouraged to reach out
+through the GCC bugzilla or if needed, privately by following the
+instructions in the last two sections of this document.
+
+Compiler drivers, programs, libgccjit and support libraries
+---
+
+The compiler driver processes source code, invokes other programs
+such as the assembler and linker and generates the output result,
+which may be assembly code or machine code.  Compiling untrusted
+sources can result in arbitrary code execution and unconstrained
+resource consumption in the compiler. As a result, compilation of
+such code should be done inside a sandboxed environment to ensure
+that it does not compromise the development environment.


"... the host environment" seems more appropriate.


+
+The libgccjit library can, despite the name, be used both for
+ahead-of-time compilation and for just-in-compilation.  In both
+cases it can be used to translate input representations (such as
+source code) in the application context; in the latter case the
+generated code is also run in the application context.
+
+Limitations that apply to the compiler driver, apply here too in
+terms of sanitizing inputs and it is recommended that both the


s/sanitizing inputs/trusting inputs/ (I suggested it earlier, just unsure
if you don't agree or it simply fell through the cracks)


+compilation *and* execution context of the code are appropriately
+sandboxed to contain the effects of any bugs in libgccjit, the
+application code using it, or its generated code to the sandboxed
+environment.
+
+Libraries such as libiberty, libcc1 and libcpp are not distributed
+for runtime support and have similar challenges to compiler drivers.
+While they are expected to be robust against arbitrary input, they
+should only be used with trusted inputs when linked into the
+compiler.
+
+Libraries such as zlib that bundled into GCC to build it will be


'are bundled with' (missing 'are', s/into/with/)


+treated the same as the compiler drivers and programs as far as
+security coverage is concerned.  However if you find an issue in
+these libraries independent of their use in GCC, you should reach
+out to their upstream projects to report them.
+
+As a result, the only case for a potential security issue in the
+compiler is when it generates vulnerable application code for
+trusted input source code that is conforming to the relevant
+programming standard or extensions documented as supported by GCC
+and the algorithm expressed in the source code does not have the
+vulnerability.  The output application code could be considered
+vulnerable if it produces an actual vulnerability in the target
+application, specifically in the following cases:


It seems ambiguous if the list that follows is meant to be an exhaustive
enumeration. I think it is meant to give examples without covering all
possibilities; if that's the case, I would suggest

s/specifically in the following cases/for example/

If I misunderstood and the list is really meant to be exhaustive,
it would be nice to make that clear and perhaps refer the reader
to the second paragraph when their scenario does not fit.


+
+- The application dereferences an invalid memory location despite
+  the application sources being valid.
+- The application reads from or writes to a valid but incorrect
+  memory location, resulting in an information integrity issue or an
+  information leak.
+- The application ends up running in an infinite loop or with
+  severe degradation in performance despite the input sources having
+  no such issue, resulting in a Denial of Service.  Note that
+  correct but non-performant code is not a security issue candidate,
+  this only applies to incorrect code that may result in performance
+  degradation severe enough to amount to a denial of service.
+- The application crashes due to the generated incorrect code,
+

[ping][PATCH v2] Add a GCC Security policy

2023-10-04 Thread Siddhesh Poyarekar


Ping!

On 2023-09-28 07:55, Siddhesh Poyarekar wrote:

Define a security process and exclusions to security issues for GCC and
all components it ships.

Signed-off-by: Siddhesh Poyarekar 
---
  SECURITY.txt | 205 +++
  1 file changed, 205 insertions(+)
  create mode 100644 SECURITY.txt

diff --git a/SECURITY.txt b/SECURITY.txt
new file mode 100644
index 000..14cb31570d3
--- /dev/null
+++ b/SECURITY.txt
@@ -0,0 +1,205 @@
+What is a GCC security bug?
+===
+
+A security bug is one that threatens the security of a system or
+network, or might compromise the security of data stored on it.
+In the context of GCC there are multiple ways in which this might
+happen and some common scenarios are detailed below.
+
+If you're reporting a security issue and feel like it does not fit
+into any of the descriptions below, you're encouraged to reach out
+through the GCC bugzilla or if needed, privately by following the
+instructions in the last two sections of this document.
+
+Compiler drivers, programs, libgccjit and support libraries
+---
+
+The compiler driver processes source code, invokes other programs
+such as the assembler and linker and generates the output result,
+which may be assembly code or machine code.  Compiling untrusted
+sources can result in arbitrary code execution and unconstrained
+resource consumption in the compiler. As a result, compilation of
+such code should be done inside a sandboxed environment to ensure
+that it does not compromise the development environment.
+
+The libgccjit library can, despite the name, be used both for
+ahead-of-time compilation and for just-in-compilation.  In both
+cases it can be used to translate input representations (such as
+source code) in the application context; in the latter case the
+generated code is also run in the application context.
+
+Limitations that apply to the compiler driver, apply here too in
+terms of sanitizing inputs and it is recommended that both the
+compilation *and* execution context of the code are appropriately
+sandboxed to contain the effects of any bugs in libgccjit, the
+application code using it, or its generated code to the sandboxed
+environment.
+
+Libraries such as libiberty, libcc1 and libcpp are not distributed
+for runtime support and have similar challenges to compiler drivers.
+While they are expected to be robust against arbitrary input, they
+should only be used with trusted inputs when linked into the
+compiler.
+
+Libraries such as zlib that bundled into GCC to build it will be
+treated the same as the compiler drivers and programs as far as
+security coverage is concerned.  However if you find an issue in
+these libraries independent of their use in GCC, you should reach
+out to their upstream projects to report them.
+
+As a result, the only case for a potential security issue in the
+compiler is when it generates vulnerable application code for
+trusted input source code that is conforming to the relevant
+programming standard or extensions documented as supported by GCC
+and the algorithm expressed in the source code does not have the
+vulnerability.  The output application code could be considered
+vulnerable if it produces an actual vulnerability in the target
+application, specifically in the following cases:
+
+- The application dereferences an invalid memory location despite
+  the application sources being valid.
+- The application reads from or writes to a valid but incorrect
+  memory location, resulting in an information integrity issue or an
+  information leak.
+- The application ends up running in an infinite loop or with
+  severe degradation in performance despite the input sources having
+  no such issue, resulting in a Denial of Service.  Note that
+  correct but non-performant code is not a security issue candidate,
+  this only applies to incorrect code that may result in performance
+  degradation severe enough to amount to a denial of service.
+- The application crashes due to the generated incorrect code,
+  resulting in a Denial of Service.
+
+Language runtime libraries
+--
+
+GCC also builds and distributes libraries that are intended to be
+used widely to implement runtime support for various programming
+languages.  These include the following:
+
+* libada
+* libatomic
+* libbacktrace
+* libcc1
+* libcody
+* libcpp
+* libdecnumber
+* libffi
+* libgcc
+* libgfortran
+* libgm2
+* libgo
+* libgomp
+* libitm
+* libobjc
+* libphobos
+* libquadmath
+* libssp
+* libstdc++
+
+These libraries are intended to be used

Re: [PATCH v2] Add a GCC Security policy

2023-09-28 Thread Siddhesh Poyarekar


On 2023-09-28 12:55, Siddhesh Poyarekar wrote:

Define a security process and exclusions to security issues for GCC and
all components it ships.

Signed-off-by: Siddhesh Poyarekar 
---


Sorry I forgot to summarize changes since the previous version:

- Reworded the introduction so that it doesn't sound like we know *all* 
scenarios and also encourage reporters to reach out.


- Fixed up support and diagnostic libraries sections based on Jakub's 
feedback.



  SECURITY.txt | 205 +++
  1 file changed, 205 insertions(+)
  create mode 100644 SECURITY.txt

diff --git a/SECURITY.txt b/SECURITY.txt
new file mode 100644
index 000..14cb31570d3
--- /dev/null
+++ b/SECURITY.txt
@@ -0,0 +1,205 @@
+What is a GCC security bug?
+===
+
+A security bug is one that threatens the security of a system or
+network, or might compromise the security of data stored on it.
+In the context of GCC there are multiple ways in which this might
+happen and some common scenarios are detailed below.
+
+If you're reporting a security issue and feel like it does not fit
+into any of the descriptions below, you're encouraged to reach out
+through the GCC bugzilla or if needed, privately by following the
+instructions in the last two sections of this document.
+
+Compiler drivers, programs, libgccjit and support libraries
+---
+
+The compiler driver processes source code, invokes other programs
+such as the assembler and linker and generates the output result,
+which may be assembly code or machine code.  Compiling untrusted
+sources can result in arbitrary code execution and unconstrained
+resource consumption in the compiler. As a result, compilation of
+such code should be done inside a sandboxed environment to ensure
+that it does not compromise the development environment.
+
+The libgccjit library can, despite the name, be used both for
+ahead-of-time compilation and for just-in-compilation.  In both
+cases it can be used to translate input representations (such as
+source code) in the application context; in the latter case the
+generated code is also run in the application context.
+
+Limitations that apply to the compiler driver, apply here too in
+terms of sanitizing inputs and it is recommended that both the
+compilation *and* execution context of the code are appropriately
+sandboxed to contain the effects of any bugs in libgccjit, the
+application code using it, or its generated code to the sandboxed
+environment.
+
+Libraries such as libiberty, libcc1 and libcpp are not distributed
+for runtime support and have similar challenges to compiler drivers.
+While they are expected to be robust against arbitrary input, they
+should only be used with trusted inputs when linked into the
+compiler.
+
+Libraries such as zlib that bundled into GCC to build it will be
+treated the same as the compiler drivers and programs as far as
+security coverage is concerned.  However if you find an issue in
+these libraries independent of their use in GCC, you should reach
+out to their upstream projects to report them.
+
+As a result, the only case for a potential security issue in the
+compiler is when it generates vulnerable application code for
+trusted input source code that is conforming to the relevant
+programming standard or extensions documented as supported by GCC
+and the algorithm expressed in the source code does not have the
+vulnerability.  The output application code could be considered
+vulnerable if it produces an actual vulnerability in the target
+application, specifically in the following cases:
+
+- The application dereferences an invalid memory location despite
+  the application sources being valid.
+- The application reads from or writes to a valid but incorrect
+  memory location, resulting in an information integrity issue or an
+  information leak.
+- The application ends up running in an infinite loop or with
+  severe degradation in performance despite the input sources having
+  no such issue, resulting in a Denial of Service.  Note that
+  correct but non-performant code is not a security issue candidate,
+  this only applies to incorrect code that may result in performance
+  degradation severe enough to amount to a denial of service.
+- The application crashes due to the generated incorrect code,
+  resulting in a Denial of Service.
+
+Language runtime libraries
+--
+
+GCC also builds and distributes libraries that are intended to be
+used widely to implement runtime support for various programming
+languages.  These include the following:
+
+* libada
+* libatomic
+* libbacktrace
+* libcc1
+* libcody

[PATCH v2] Add a GCC Security policy

2023-09-28 Thread Siddhesh Poyarekar

Define a security process and exclusions to security issues for GCC and
all components it ships.

Signed-off-by: Siddhesh Poyarekar 
---
 SECURITY.txt | 205 +++
 1 file changed, 205 insertions(+)
 create mode 100644 SECURITY.txt

diff --git a/SECURITY.txt b/SECURITY.txt
new file mode 100644
index 000..14cb31570d3
--- /dev/null
+++ b/SECURITY.txt
@@ -0,0 +1,205 @@
+What is a GCC security bug?
+===
+
+A security bug is one that threatens the security of a system or
+network, or might compromise the security of data stored on it.
+In the context of GCC there are multiple ways in which this might
+happen and some common scenarios are detailed below.
+
+If you're reporting a security issue and feel like it does not fit
+into any of the descriptions below, you're encouraged to reach out
+through the GCC bugzilla or if needed, privately by following the
+instructions in the last two sections of this document.
+
+Compiler drivers, programs, libgccjit and support libraries
+---
+
+The compiler driver processes source code, invokes other programs
+such as the assembler and linker and generates the output result,
+which may be assembly code or machine code.  Compiling untrusted
+sources can result in arbitrary code execution and unconstrained
+resource consumption in the compiler. As a result, compilation of
+such code should be done inside a sandboxed environment to ensure
+that it does not compromise the development environment.
+
+The libgccjit library can, despite the name, be used both for
+ahead-of-time compilation and for just-in-compilation.  In both
+cases it can be used to translate input representations (such as
+source code) in the application context; in the latter case the
+generated code is also run in the application context.
+
+Limitations that apply to the compiler driver, apply here too in
+terms of sanitizing inputs and it is recommended that both the
+compilation *and* execution context of the code are appropriately
+sandboxed to contain the effects of any bugs in libgccjit, the
+application code using it, or its generated code to the sandboxed
+environment.
+
+Libraries such as libiberty, libcc1 and libcpp are not distributed
+for runtime support and have similar challenges to compiler drivers.
+While they are expected to be robust against arbitrary input, they
+should only be used with trusted inputs when linked into the
+compiler.
+
+Libraries such as zlib that bundled into GCC to build it will be
+treated the same as the compiler drivers and programs as far as
+security coverage is concerned.  However if you find an issue in
+these libraries independent of their use in GCC, you should reach
+out to their upstream projects to report them.
+
+As a result, the only case for a potential security issue in the
+compiler is when it generates vulnerable application code for
+trusted input source code that is conforming to the relevant
+programming standard or extensions documented as supported by GCC
+and the algorithm expressed in the source code does not have the
+vulnerability.  The output application code could be considered
+vulnerable if it produces an actual vulnerability in the target
+application, specifically in the following cases:
+
+- The application dereferences an invalid memory location despite
+  the application sources being valid.
+- The application reads from or writes to a valid but incorrect
+  memory location, resulting in an information integrity issue or an
+  information leak.
+- The application ends up running in an infinite loop or with
+  severe degradation in performance despite the input sources having
+  no such issue, resulting in a Denial of Service.  Note that
+  correct but non-performant code is not a security issue candidate,
+  this only applies to incorrect code that may result in performance
+  degradation severe enough to amount to a denial of service.
+- The application crashes due to the generated incorrect code,
+  resulting in a Denial of Service.
+
+Language runtime libraries
+--
+
+GCC also builds and distributes libraries that are intended to be
+used widely to implement runtime support for various programming
+languages.  These include the following:
+
+* libada
+* libatomic
+* libbacktrace
+* libcc1
+* libcody
+* libcpp
+* libdecnumber
+* libffi
+* libgcc
+* libgfortran
+* libgm2
+* libgo
+* libgomp
+* libitm
+* libobjc
+* libphobos
+* libquadmath
+* libssp
+* libstdc++
+
+These libraries are intended to be used in arbitrary contexts and as
+a result, bugs in these libraries

Re: [PATCH] Add a GCC Security policy


On 2023-09-20 08:29, Jakub Jelinek wrote:

I just noticed (ENOCOFFEE) that the line (after removing libvtv) is:

 Support libraries such as libiberty, libcc1 and libcpp have been
 developed separately to share code with other tools such as binutils
 and gdb.

Does that address your concern Jakub?


I believe that is the case just for libiberty.
libcpp is I think solely used by gcc itself (several frontends use it
though, plus some build utilities in gcc).
libcc1 is code for gdb with gcc implementation details.


How about:

Libraries that are not distributed for runtime language support such as 
libiberty, libcc1 and libcpp have similar challenges to compiler 
drivers.  While they are expected to be robust against arbitrary input, 
they should only be used with trusted inputs.


Thanks,
Sid

Re: [PATCH] Add a GCC Security policy


On 2023-09-20 07:58, Siddhesh Poyarekar wrote:

On 2023-09-20 07:55, Jakub Jelinek wrote:

On Wed, Sep 20, 2023 at 07:50:43AM -0400, Siddhesh Poyarekar wrote:

+    Support libraries such as libiberty, libcc1 libvtv and libcpp have


Missing comma before libvtv.  But more importantly, libvtv is not
support library like libiberty, libcpp, it is more like the sanitizer
libraries runtime library for -fvtable-verify= .


Ack, I'll move libvtv out.


And, libcc1 also isn't a compiler support library, but support library
for a GDB plugin.



Isn't that like libiberty then, which also gets used by other toolchain 
projects?  Maybe calling it "Toolchain support libraries" would make it 
more explicit?


I just noticed (ENOCOFFEE) that the line (after removing libvtv) is:

Support libraries such as libiberty, libcc1 and libcpp have been
developed separately to share code with other tools such as binutils
and gdb.

Does that address your concern Jakub?

Thanks,
Sid

Re: [PATCH] Add a GCC Security policy


On 2023-09-20 07:55, Jakub Jelinek wrote:

On Wed, Sep 20, 2023 at 07:50:43AM -0400, Siddhesh Poyarekar wrote:

+Support libraries such as libiberty, libcc1 libvtv and libcpp have


Missing comma before libvtv.  But more importantly, libvtv is not
support library like libiberty, libcpp, it is more like the sanitizer
libraries runtime library for -fvtable-verify= .


Ack, I'll move libvtv out.


And, libcc1 also isn't a compiler support library, but support library
for a GDB plugin.



Isn't that like libiberty then, which also gets used by other toolchain 
projects?  Maybe calling it "Toolchain support libraries" would make it 
more explicit?


Thanks,
Sid

[PATCH] Add a GCC Security policy

Define a security process and exclusions to security issues for GCC and
all components it ships.

Signed-off-by: Siddhesh Poyarekar 
---

Sending as a proper patch since there have been no further comments on
the RFC.  I toyed with the idea of making the distinction of
"exploitable vulnerability" vs "missed hardening" more explicit near the
top of the document but decided against further tinkering in the end
since we already have a proper section dealing with it.  Instead I made
the language in the hardening section a bit more explicit, clarifying
that missed hardening is not an *exploitable vulnerability*, which
hopefully resolves the contradication of a bug in a security feature not
being a security bug.

I also added the AdaCore security contact at Arnaud's request.

Thanks,
Sid

 SECURITY.txt | 202 +++
 1 file changed, 202 insertions(+)
 create mode 100644 SECURITY.txt

diff --git a/SECURITY.txt b/SECURITY.txt
new file mode 100644
index 000..d2161f03bf5
--- /dev/null
+++ b/SECURITY.txt
@@ -0,0 +1,202 @@
+What is a GCC security bug?
+===
+
+A security bug is one that threatens the security of a system or
+network, or might compromise the security of data stored on it.
+In the context of GCC there are multiple ways in which this might
+happen and they're detailed below.
+
+Compiler drivers, programs, libgccjit and support libraries
+---
+
+The compiler driver processes source code, invokes other programs
+such as the assembler and linker and generates the output result,
+which may be assembly code or machine code.  Compiling untrusted
+sources can result in arbitrary code execution and unconstrained
+resource consumption in the compiler. As a result, compilation of
+such code should be done inside a sandboxed environment to ensure
+that it does not compromise the development environment.
+
+The libgccjit library can, despite the name, be used both for
+ahead-of-time compilation and for just-in-compilation.  In both
+cases it can be used to translate input representations (such as
+source code) in the application context; in the latter case the
+generated code is also run in the application context.
+
+Limitations that apply to the compiler driver, apply here too in
+terms of sanitizing inputs and it is recommended that both the
+compilation *and* execution context of the code are appropriately
+sandboxed to contain the effects of any bugs in libgccjit, the
+application code using it, or its generated code to the sandboxed
+environment.
+
+Support libraries such as libiberty, libcc1 libvtv and libcpp have
+been developed separately to share code with other tools such as
+binutils and gdb.  These libraries again have similar challenges to
+compiler drivers.  While they are expected to be robust against
+arbitrary input, they should only be used with trusted inputs.
+
+Libraries such as zlib that bundled into GCC to build it will be
+treated the same as the compiler drivers and programs as far as
+security coverage is concerned.  However if you find an issue in
+these libraries independent of their use in GCC, you should reach
+out to their upstream projects to report them.
+
+As a result, the only case for a potential security issue in the
+compiler is when it generates vulnerable application code for
+trusted input source code that is conforming to the relevant
+programming standard or extensions documented as supported by GCC
+and the algorithm expressed in the source code does not have the
+vulnerability.  The output application code could be considered
+vulnerable if it produces an actual vulnerability in the target
+application, specifically in the following cases:
+
+- The application dereferences an invalid memory location despite
+  the application sources being valid.
+- The application reads from or writes to a valid but incorrect
+  memory location, resulting in an information integrity issue or an
+  information leak.
+- The application ends up running in an infinite loop or with
+  severe degradation in performance despite the input sources having
+  no such issue, resulting in a Denial of Service.  Note that
+  correct but non-performant code is not a security issue candidate,
+  this only applies to incorrect code that may result in performance
+  degradation severe enough to amount to a denial of service.
+- The application crashes due to the generated incorrect code,
+  resulting in a Denial of Service.
+
+Language runtime libraries
+--
+
+GCC also builds and distributes libraries that are intended to be
+used widely to implement runtime support for various programming
+languages.  These i

Re: [PATCH 00/19] aarch64: Fix -fstack-protector issue

2023-09-12 Thread Siddhesh Poyarekar


On 2023-09-12 11:25, Richard Sandiford via Gcc-patches wrote:

This series of patches fixes deficiencies in GCC's -fstack-protector
implementation for AArch64 when using dynamically allocated stack space.
This is CVE-2023-4039.  See:



While this is a legitimate missed hardening, I'm not sure if this 
qualifies as a CVE-worthy vulnerability since correct programs won't 
actually be exploitable due to this.  This is essentially the kind of 
thing that the "Security features implemented in GCC" section in the 
proposed security policy[1] describes.


Thanks,
Sid

[1] 
https://inbox.sourceware.org/gcc-patches/ba133293-a7e8-8fe4-e1ba-7129b9e10...@gotplt.org/

Re: [RFC] GCC Security policy

2023-09-06 Thread Siddhesh Poyarekar


Hello folks,

Here's v3 of the top part of the security policy.  Hopefully this 
addresses all concerns raised so far.


Thanks,
Sid


What is a GCC security bug?
===

A security bug is one that threatens the security of a system or
network, or might compromise the security of data stored on it.
In the context of GCC there are multiple ways in which this might
happen and they're detailed below.

Compiler drivers, programs, libgccjit and support libraries
---

The compiler driver processes source code, invokes other programs
such as the assembler and linker and generates the output result,
which may be assembly code or machine code.  Compiling untrusted
sources can result in arbitrary code execution and unconstrained
resource consumption in the compiler. As a result, compilation of
such code should be done inside a sandboxed environment to ensure
that it does not compromise the development environment.

The libgccjit library can, despite the name, be used both for
ahead-of-time compilation and for just-in-compilation.  In both
cases it can be used to translate input representations (such as
source code) in the application context; in the latter case the
generated code is also run in the application context.

Limitations that apply to the compiler driver, apply here too in
terms of sanitizing inputs and it is recommended that both the
compilation *and* execution context of the code are appropriately
sandboxed to contain the effects of any bugs in libgccjit, the
application code using it, or its generated code to the sandboxed
environment.

Support libraries such as libiberty, libcc1 libvtv and libcpp have
been developed separately to share code with other tools such as
binutils and gdb.  These libraries again have similar challenges to
compiler drivers.  While they are expected to be robust against
arbitrary input, they should only be used with trusted inputs.

Libraries such as zlib that bundled into GCC to build it will be
treated the same as the compiler drivers and programs as far as
security coverage is concerned.  However if you find an issue in
these libraries independent of their use in GCC, you should reach
out to their upstream projects to report them.

As a result, the only case for a potential security issue in the
compiler is when it generates vulnerable application code for
trusted input source code that is conforming to the relevant
programming standard or extensions documented as supported by GCC
and the algorithm expressed in the source code does not have the
vulnerability.  The output application code could be considered
vulnerable if it produces an actual vulnerability in the target
application, specifically in the following cases:

- The application dereferences an invalid memory location despite
  the application sources being valid.
- The application reads from or writes to a valid but incorrect
  memory location, resulting in an information integrity issue or an
  information leak.
- The application ends up running in an infinite loop or with
  severe degradation in performance despite the input sources having
  no such issue, resulting in a Denial of Service.  Note that
  correct but non-performant code is not a security issue candidate,
  this only applies to incorrect code that may result in performance
  degradation severe enough to amount to a denial of service.
- The application crashes due to the generated incorrect code,
  resulting in a Denial of Service.

Language runtime libraries
--

GCC also builds and distributes libraries that are intended to be
used widely to implement runtime support for various programming
languages.  These include the following:

* libada
* libatomic
* libbacktrace
* libcc1
* libcody
* libcpp
* libdecnumber
* libffi
* libgcc
* libgfortran
* libgm2
* libgo
* libgomp
* libiberty
* libitm
* libobjc
* libphobos
* libquadmath
* libsanitizer
* libssp
* libstdc++

These libraries are intended to be used in arbitrary contexts and as
a result, bugs in these libraries may be evaluated for security
impact.  However, some of these libraries, e.g. libgo, libphobos,
etc.  are not maintained in the GCC project, due to which the GCC
project may not be the correct point of contact for them.  You are
encouraged to look at README files within those library directories
to locate the canonical security contact point for those projects
and include them in the report.  Once the issue is fixed in the
upstream project, the fix will be synced into GCC in a future
release.

Most security vulnerabilities in these runtime

Re: Another bug for __builtin_object_size? (Or expected behavior)


On 2023-08-17 17:25, Qing Zhao wrote:

It's not exactly the same issue, the earlier discussion was about choosing sizes in the 
same pass while the current one is about choosing between passes, but I agree it 
"rhymes".  This is what I was alluding to originally (for OST_MINIMUM use 
MIN_EXPR if both passes returned a pass) but I haven't thought about it hard enough to be 
100% confident that it's the better solution, especially for OST_MAXIMUM.


We have two different sources to get SIZE information for the subobject:

1. From the TYPESIZE information embedded in the IR;
2. From the initialization information propagated from data flow, this includes 
both malloc call and the DECL_INIT.

We need to choose between these two when both available, (these two information 
could be
  in the same pass as we discussed before, or in different passes which is 
shown in this discussion).

I think that the MIN_EXPR might be the right choice (especially for 
OST_MAXIMUM) -:)


It's worth a shot I guess.  We could emit something like the following 
in early_object_sizes_execute_one:


  sz = (__bos(o->sub, ost) == unknown
? early_size
: MIN_EXPR (__bos(o->sub, ost), early_size));

and see if it sticks.

Thanks,
Sid

Re: Another bug for __builtin_object_size? (Or expected behavior)


On 2023-08-17 16:23, Qing Zhao wrote:

Then, I think whatever MIN or MAX, the early phase has more precise information 
than the later phase, we should use its result if it’s NOT UNKNOWN?


We can't be sure about that though, can we?  For example for something like 
this:

struct S
{
  int a;
  char b[10];
  int c;
};

size_t foo (struct S *s)
{
  return __builtin_object_size (s->b, 1);
}

size_t bar ()
{
  struct S *in = malloc (8);

  return foo (in);
}

returns 10 for __builtin_object_size in early_objsz but when it sees the malloc 
in the later objsz pass, it returns 4:

$ gcc/cc1 -fdump-tree-objsz-details -quiet -o - -O bug.c
...
foo:
.LFB0:
.cfi_startproc
movl$10, %eax
ret
.cfi_endproc
...
bar:
.LFB1:
.cfi_startproc
movl$4, %eax
ret
.cfi_endproc
...

In fact, this ends up returning the wrong result for OST_MINIMUM:

$ gcc/cc1 -fdump-tree-objsz-details -quiet -o - -O bug.c
...
foo:
.LFB0:
.cfi_startproc
movl$10, %eax
ret
.cfi_endproc
...
bar:
.LFB1:
.cfi_startproc
movl$10, %eax
ret
.cfi_endproc
...

bar ought to have returned 4 too (and I'm betting the later objsz must have 
seen that) but it got overridden by the earlier estimate of 10.


Okay, I see.

Then is this the similar issue we discussed previously?  (As following:)

"

Hi, Sid and Jakub,
I have a question in the following source portion of the routine 
“addr_object_size” of gcc/tree-object-size.cc:
  743   bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var);
  744   if (bytes != error_mark_node)
  745 {
  746   bytes = size_for_offset (var_size, bytes);
  747   if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == 
MEM_REF)
  748 {
  749   tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0),
  750pt_var);
  751   if (bytes2 != error_mark_node)
  752 {
  753   bytes2 = size_for_offset (pt_var_size, bytes2);
  754   bytes = size_binop (MIN_EXPR, bytes, bytes2);
  755 }
  756 }
  757 }
At line 754, why we always use “MIN_EXPR” whenever it’s for OST_MINIMUM or not?
Shall we use
(object_size_type & OST_MINIMUM
 ? MIN_EXPR : MAX_EXPR)


That MIN_EXPR is not for OST_MINIMUM.  It is to cater for allocations like this:

typedef struct
{
  int a;
} A;

size_t f()
{
  A *p = malloc (1);

  return __builtin_object_size (p, 0);
}

where the returned size should be 1 and not sizeof (int).  The mode doesn't 
really matter in this case.
“

If this is the same issue, I think we can use the same solution: always use 
MIN_EXPR,
What do you think?


It's not exactly the same issue, the earlier discussion was about 
choosing sizes in the same pass while the current one is about choosing 
between passes, but I agree it "rhymes".  This is what I was alluding to 
originally (for OST_MINIMUM use MIN_EXPR if both passes returned a pass) 
but I haven't thought about it hard enough to be 100% confident that 
it's the better solution, especially for OST_MAXIMUM.


Thanks,
Sid

Re: Another bug for __builtin_object_size? (Or expected behavior)


On 2023-08-17 15:27, Qing Zhao wrote:

Yes, that's it.  Maybe it's more correct if instead of MAX_EXPR if for 
OST_MINIMUM we stick with the early_objsz answer if it's non-zero.  I'm not 
sure if that's the case for maximum size though, my gut says it isn't.


So, the major purpose for adding the early object size phase is for computing 
SUBobjects size more precisely before the subobject information lost?


I suppose it's more about being able to do it at all, rather than precision.


Then, I think whatever MIN or MAX, the early phase has more precise information 
than the later phase, we should use its result if it’s NOT UNKNOWN?


We can't be sure about that though, can we?  For example for something 
like this:


struct S
{
  int a;
  char b[10];
  int c;
};

size_t foo (struct S *s)
{
  return __builtin_object_size (s->b, 1);
}

size_t bar ()
{
  struct S *in = malloc (8);

  return foo (in);
}

returns 10 for __builtin_object_size in early_objsz but when it sees the 
malloc in the later objsz pass, it returns 4:


$ gcc/cc1 -fdump-tree-objsz-details -quiet -o - -O bug.c
...
foo:
.LFB0:
.cfi_startproc
movl$10, %eax
ret
.cfi_endproc
...
bar:
.LFB1:
.cfi_startproc
movl$4, %eax
ret
.cfi_endproc
...

In fact, this ends up returning the wrong result for OST_MINIMUM:

$ gcc/cc1 -fdump-tree-objsz-details -quiet -o - -O bug.c
...
foo:
.LFB0:
.cfi_startproc
movl$10, %eax
ret
.cfi_endproc
...
bar:
.LFB1:
.cfi_startproc
movl$10, %eax
ret
.cfi_endproc
...

bar ought to have returned 4 too (and I'm betting the later objsz must 
have seen that) but it got overridden by the earlier estimate of 10.


We probably need smarter heuristics on choosing between the estimate of 
the early_objsz and late objsz passes each by itself isn't good enough 
for subobjects.


Thanks,
Sid

Re: Another bug for __builtin_object_size? (Or expected behavior)


On 2023-08-17 09:58, Qing Zhao wrote:

So this is a (sort of) known issue, which necessitated the early_objsz pass to 
get an estimate before a subobject reference was optimized to a MEM_REF.


Do you mean that after a subobject reference was optimized to a MEM_REF, there 
is no way to compute the size of the subobject anymore?


Yes, in cases where the TYPE_SIZE is lost and there's no other 
allocation information to fall back on.



  However it looks like the MIN/MAX hack doesn't work in this case for 
OST_MINIMUM; it should probably get the minimum of the two passes if both 
passes were successful, or only the result of the pass that was successful.


You mean that the following line:
2053   enum tree_code code = object_size_type & OST_MINIMUM ? MAX_EXPR : 
MIN_EXPR;
Might need to be changed to:
2053   enum tree_code code =  MIN_EXPR;


Yes, that's it.  Maybe it's more correct if instead of MAX_EXPR if for 
OST_MINIMUM we stick with the early_objsz answer if it's non-zero.  I'm 
not sure if that's the case for maximum size though, my gut says it isn't.


Thanks,
Sid

Re: Another bug for __builtin_object_size? (Or expected behavior)


On 2023-08-16 11:59, Qing Zhao wrote:

Jakub and Sid,

During my study, I found an interesting behavior for the following small 
testing case:

#include 
#include 

struct fixed {
   size_t foo;
   char b;
   char array[10];
} q = {};

#define noinline __attribute__((__noinline__))

static void noinline bar ()
{
   struct fixed *p = 

   printf("the__bos of MAX p->array sub is %d \n", 
__builtin_object_size(p->array, 1));
   printf("the__bos of MIN p->array sub is %d \n", 
__builtin_object_size(p->array, 3));

   return;
}

int main ()
{
   bar ();
   return 0;
}
[opc@qinzhao-aarch64-ol8 108896]$ sh t
/home/opc/Install/latest-d/bin/gcc -O -fstrict-flex-arrays=3 t2.c
the__bos of MAX p->array sub is 10
the__bos of MIN p->array sub is 15

I assume that the Minimum size in the sub-object should be 10 too (i.e 
__builtin_object_size(p->array, 3) should be 10 too).

So, first question: Is this correct or wrong behavior for 
__builtin_object_size(p->array, 3)?

The second question is, when I debugged into why 
__builtin_object_size(p->array, 3) returns 15 instead of 10, I observed the 
following:

1. In “early_objz” phase, The IR for p->array is:
(gdb) call debug_generic_expr(ptr)
_5->array

And the pt_var is:
(gdb) call debug_generic_expr(pt_var)
*p_5

As a result, the following condition in tree-object-size.cc:

  585   if (pt_var != TREE_OPERAND (ptr, 0))

Was satisfied, and then the algorithm for computing the SUBOBJECT was invoked 
and the size of the subobject 10 was used.

and then an MAX_EXPR was inserted after the __builtin_object_size call as:
   _3 = _5->array;
   _10 = __builtin_object_size (_3, 3);
   _4 = MAX_EXPR <_10, 10>;

Till now, everything looks fine.

2. within “ccp1” phase, when folding the call  to __builtin_object_size, the IR 
for the p-:>array is:
(gdb) call debug_generic_expr(ptr)
  [(void *) + 9B]

And the pt_var is:
(gdb) call debug_generic_expr(pt_var)
MEM  [(void *) + 9B]

As a result, the following condition in tree-object-size.cc:

  585   if (pt_var != TREE_OPERAND (ptr, 0))

Was NOT satisfied, therefore the algorithm for computing the SUBOBJECT was NOT 
invoked at all, as a result, the size in the whole object, 15, was used.

And then finally, MAX_EXPR (_10, 10) becomes MAX_EXPR (15, 10), 15 is the final 
result.

Based on the above, is there any issue with the current algorithm?


So this is a (sort of) known issue, which necessitated the early_objsz 
pass to get an estimate before a subobject reference was optimized to a 
MEM_REF.  However it looks like the MIN/MAX hack doesn't work in this 
case for OST_MINIMUM; it should probably get the minimum of the two 
passes if both passes were successful, or only the result of the pass 
that was successful.


Thanks,
Sid

Re: [RFC] GCC Security policy

2023-08-16 Thread Siddhesh Poyarekar


On 2023-08-16 11:06, Alexander Monakov wrote:

No I understood the distinction you're trying to make, I just wanted to point
out that the effect isn't all that different.  The intent of the wording is
not to prescribe a solution, but to describe what the compiler cannot do and
hence, users must find a way to do this.  I think we have a consensus on this
part of the wording though because we're not really responsible for the
prescription here and I'm happy with just asking users to sandbox.


Nice!


I suppose it's kinda like saying "don't try this at home".  You know many will
and some will break their leg while others will come out of it feeling
invincible.  Our job is to let them know that they will likely break their leg
:)


Continuing this analogy, I was protesting against doing our job by telling
users "when trying this at home, make sure to wear vibranium shielding"
while knowing for sure that nobody can, in fact, obtain said shielding,
making our statement not helpful and rather tautological.


:)


How about this in the last section titled "Security features implemented in
GCC", since that's where we also deal with security hardening.

 Similarly, GCC may transform code in a way that the correctness of
 the expressed algorithm is preserved but supplementary properties
 that are observable only outside the program or through a
 vulnerability in the program, may not be preserved.  This is not a
 security issue in GCC and in such cases, the vulnerability that
 caused exposure of the supplementary properties must be fixed.


Yeah, indicating scenarios that fall outside of intended guarantees should
be helpful. I feel the exact text quoted above will be hard to decipher
without knowing the discussion that led to it. Some sort of supplementary
section with examples might help there.


Ah, so I had started out by listing examples but dropped them before 
emailing.  How about:


Similarly, GCC may transform code in a way that the correctness of
the expressed algorithm is preserved but supplementary properties
that are observable only outside the program or through a
vulnerability in the program, may not be preserved.  Examples
of such supplementary properties could be the state of memory after
it is no longer in use, performance and timing characteristics of a
program, state of the CPU cache, etc. Such issues are not security
vulnerabilities in GCC and in such cases, the vulnerability that
caused exposure of the supplementary properties must be fixed.


In any case, I hope further discussion, clarification and wordsmithing
goes productively for you both here on the list and during the Cauldron.


Thanks!

Sid

Re: [RFC] GCC Security policy

2023-08-16 Thread Siddhesh Poyarekar


On 2023-08-15 19:07, Alexander Monakov wrote:


On Tue, 15 Aug 2023, Siddhesh Poyarekar wrote:


Thanks, this is nicer (see notes below). My main concern is that we
shouldn't pretend there's some method of verifying that arbitrary source
code is "safe" to pass to an unsandboxed compiler, nor should we push
the responsibility of doing that on users.


But responsibility would be pushed to users, wouldn't it?


Making users responsible for verifying that sources are "safe" is not okay
(we cannot teach them how to do that since there's no general method).
Making users responsible for sandboxing the compiler is fine (there's
a range of sandboxing solutions, from which they can choose according
to their requirements and threat model). Sorry about the ambiguity.


No I understood the distinction you're trying to make, I just wanted to 
point out that the effect isn't all that different.  The intent of the 
wording is not to prescribe a solution, but to describe what the 
compiler cannot do and hence, users must find a way to do this.  I think 
we have a consensus on this part of the wording though because we're not 
really responsible for the prescription here and I'm happy with just 
asking users to sandbox.


I suppose it's kinda like saying "don't try this at home".  You know 
many will and some will break their leg while others will come out of it 
feeling invincible.  Our job is to let them know that they will likely 
break their leg :)



inside a sandboxed environment to ensure that it does not compromise the
development environment.  Note that this still does not guarantee safety of
the produced output programs and that such programs should still either be
analyzed thoroughly for safety or run only inside a sandbox or an isolated
system to avoid compromising the execution environment.


The last statement seems to be a new addition. It is too broad and again
makes a reference to analysis that appears quite theoretical. It might be
better to drop this (and instead talk in more specific terms about any
guarantees that produced binary code matches security properties intended
by the sources; I believe Richard Sandiford raised this previously).


OK, so I actually cover this at the end of the section; Richard's point AFAICT
was about hardening, which I added another note for to make it explicit that
missed hardening does not constitute a CVE-worthy threat:


Thanks for the reminder. To illustrate what I was talking about, let me give
two examples:

1) safety w.r.t timing attacks: even if the source code is written in
a manner that looks timing-safe, it might be transformed in a way that
mounting a timing attack on the resulting machine code is possible;

2) safety w.r.t information leaks: even if the source code attempts
to discard sensitive data (such as passwords and keys) immediately
after use, (partial) copies of that data may be left on stack and
in registers, to be leaked later via a different vulnerability.

For both 1) and 2), GCC is not engineered to respect such properties
during optimization and code generation, so it's not appropriate for such
tasks (a possible solution is to isolate such sensitive functions to
separate files, compile to assembly, inspect the assembly to check that it
still has the required properties, and use the inspected asm in subsequent
builds instead of the original high-level source).


How about this in the last section titled "Security features implemented 
in GCC", since that's where we also deal with security hardening.


Similarly, GCC may transform code in a way that the correctness of
the expressed algorithm is preserved but supplementary properties
that are observable only outside the program or through a
vulnerability in the program, may not be preserved.  This is not a
security issue in GCC and in such cases, the vulnerability that
caused exposure of the supplementary properties must be fixed.

Thanks,
Sid

Re: [RFC] GCC Security policy

2023-08-16 Thread Siddhesh Poyarekar


On 2023-08-16 04:25, Alexander Monakov wrote:


On Tue, 15 Aug 2023, David Malcolm via Gcc-patches wrote:


I'd prefer to reword this, as libgccjit was a poor choice of name for
the library (sorry!), to make it clearer it can be used for both ahead-
of-time and just-in-time compilation, and that as used for compilation,
the host considerations apply, not just those of the generated target
code.

How about:

  The libgccjit library can, despite the name, be used both for
  ahead-of-time compilation and for just-in-compilation.  In both
  cases it can be used to translate input representations (such as
  source code) in the application context; in the latter case the
  generated code is also run in the application context.
  Limitations that apply to the compiler driver, apply here too in
  terms of sanitizing inputs, so it is recommended that inputs are


Thanks David!



Unfortunately the lines that follow:


  either sanitized by an external program to allow only trusted,
  safe compilation and execution in the context of the application,


again make a reference to a purely theoretical "external program" that
is not going to exist in reality, and I made a fuss about that in another
subthread (sorry Siddhesh). We shouldn't speak as if this solution is
actually available to users.

I know this is not the main point of your email, but we came up with
a better wording for the compiler driver, and it would be good to align
this text with that.


How about:

The libgccjit library can, despite the name, be used both for
ahead-of-time compilation and for just-in-compilation.  In both
cases it can be used to translate input representations (such as
source code) in the application context; in the latter case the
generated code is also run in the application context.

Limitations that apply to the compiler driver, apply here too in
terms of sanitizing inputs and it is recommended that both the
compilation *and* execution context of the code are appropriately
sandboxed to contain the effects of any bugs in libgccjit, the
application code using it, or its generated code to the sandboxed
environment.

Re: [RFC] GCC Security policy

2023-08-15 Thread Siddhesh Poyarekar


On 2023-08-15 10:07, Alexander Monakov wrote:


On Tue, 15 Aug 2023, Siddhesh Poyarekar wrote:


Does this as the first paragraph address your concerns:


Thanks, this is nicer (see notes below). My main concern is that we shouldn't
pretend there's some method of verifying that arbitrary source code is "safe"
to pass to an unsandboxed compiler, nor should we push the responsibility of
doing that on users.


But responsibility would be pushed to users, wouldn't it?


The compiler driver processes source code, invokes other programs such as the
assembler and linker and generates the output result, which may be assembly
code or machine code.  It is necessary that all source code inputs to the
compiler are trusted, since it is impossible for the driver to validate input
source code for safety.


The statement begins with "It is necessary", but the next statement offers
an alternative in case the code is untrusted. This is a contradiction.
Is it necessary or not in the end?

I'd suggest to drop this statement and instead make a brief note that
compiling crafted/untrusted sources can result in arbitrary code execution
and unconstrained resource consumption in the compiler.


So:

The compiler driver processes source code, invokes other programs such 
as the assembler and linker and generates the output result, which may 
be assembly code or machine code.  Compiling untrusted sources can 
result in arbitrary code execution and unconstrained resource 
consumption in the compiler. As a result, compilation of such code 
should be done inside a sandboxed environment to ensure that it does not 
compromise the development environment.



For untrusted code should compilation should be done

  ^^
 typo (spurious 'should')


Ack, thanks.




inside a sandboxed environment to ensure that it does not compromise the
development environment.  Note that this still does not guarantee safety of
the produced output programs and that such programs should still either be
analyzed thoroughly for safety or run only inside a sandbox or an isolated
system to avoid compromising the execution environment.


The last statement seems to be a new addition. It is too broad and again
makes a reference to analysis that appears quite theoretical. It might be
better to drop this (and instead talk in more specific terms about any
guarantees that produced binary code matches security properties intended
by the sources; I believe Richard Sandiford raised this previously).


OK, so I actually cover this at the end of the section; Richard's point 
AFAICT was about hardening, which I added another note for to make it 
explicit that missed hardening does not constitute a CVE-worthy threat:


As a result, the only case for a potential security issue in the
compiler is when it generates vulnerable application code for
trusted input source code that is conforming to the relevant
programming standard or extensions documented as supported by GCC
and the algorithm expressed in the source code does not have the
vulnerability.  The output application code could be considered
vulnerable if it produces an actual vulnerability in the target
application, specifically in the following cases:

- The application dereferences an invalid memory location despite
  the application sources being valid.
- The application reads from or writes to a valid but incorrect
  memory location, resulting in an information integrity issue or an
  information leak.
- The application ends up running in an infinite loop or with
  severe degradation in performance despite the input sources having
  no such issue, resulting in a Denial of Service.  Note that
  correct but non-performant code is not a security issue candidate,
  this only applies to incorrect code that may result in performance
  degradation severe enough to amount to a denial of service.
- The application crashes due to the generated incorrect code,
  resulting in a Denial of Service.

Re: Is this a bug for __builtin_dynamic_object_size?

2023-08-15 Thread Siddhesh Poyarekar


On 2023-08-14 19:12, Qing Zhao wrote:

Hi, Sid,

For the following testing case:

#include 

#define noinline __attribute__((__noinline__))

static void noinline alloc_buf_more (int index)
{
   struct annotated {
 long foo;
 char b;
 char array[index];
 long c;
   } q, *p;

   p = 

   printf("the__bdos of p->array whole max is %d \n", 
__builtin_dynamic_object_size(p->array, 0));
   printf("the__bdos of p->array sub max is %d \n", 
__builtin_dynamic_object_size(p->array, 1));
   printf("the__bdos of p->array whole min is %d \n", 
__builtin_dynamic_object_size(p->array, 2));
   printf("the__bdos of p->array sub min is %d \n", 
__builtin_dynamic_object_size(p->array, 3));

   return;
}

int main ()
{
   alloc_buf_more (10);
   return 0;
}

If I compile it with the latest upstream gcc and run it:

/home/opc/Install/latest-d/bin/gcc -O t.c
the__bdos of p->array whole max is 23
the__bdos of p->array sub max is 23
the__bdos of p->array whole min is 23
the__bdos of p->array sub min is 23

In which__builtin_dynamic_object_size(p->array, 0) and 
__builtin_dynamic_object_size(p->array, 1) return the same size, this seems wrong 
to me.

There is one line in tree-object-size.cc might relate to this bug: (in the 
routine “addr_object_size”)

  603   if (! TYPE_SIZE_UNIT (TREE_TYPE (var))
  604   || ! tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (var)))
  605   || (pt_var_size && TREE_CODE (pt_var_size) == INTEGER_CST
  606   && tree_int_cst_lt (pt_var_size,
  607   TYPE_SIZE_UNIT (TREE_TYPE (var)
  608 var = pt_var;

I suspect that the above line 604 “ ! tree_fits_uhwi_p (TYPE_SIZE_UNIT 
(TREE_TYPE (var)))” relates to this bug, since the TYPESIZE of the VLA “array” 
is not a unsigned HOST_WIDE_INT, but we still can use its TYPESIZE for 
dynamic_object_size?

What do you think?


Thanks, yes that doesn't work.  I'm trying to revive the patch I had 
submitted earlier[1] in the year and fix this issue too in that process. 
 In general the subobject size computation doesn't handle variable 
sizes at all; it depends on whole object+offset to get size information, 
which ends up working only for flex arrays at the end of objects.


Sid

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608914.html

Re: [RFC] GCC Security policy

2023-08-15 Thread Siddhesh Poyarekar


On 2023-08-15 01:59, Alexander Monakov wrote:


On Mon, 14 Aug 2023, Siddhesh Poyarekar wrote:


There's no practical (programmatic) way to do such validation; it has to be a
manual audit, which is why source code passed to the compiler has to be
*trusted*.


No, I do not think that is a logical conclusion. What is the problem with
passing untrusted code to a sandboxed compiler?


Right, that's what we're essentially trying to convey in the security policy
text.  It doesn't go into mechanisms for securing execution (because that's
really beyond the scope of the *project's* policy IMO) but it states
unambiguously that input to the compiler must be trusted:

"""
   ... It is necessary that
 all source code inputs to the compiler are trusted, since it is
 impossible for the driver to validate input source code beyond
 conformance to a programming language standard...
"""


I see two issues with this. First, it reads as if people wishing to build
not-entirely-trusted sources need to seek some other compiler, as somehow
we seem to imply that sandboxing GCC is out of the question.

Second, I take issue with the last part of the quoted text (language
conformance): verifying standards conformance is also impossible
(consider UB that manifests only during linking or dynamic loading)
so GCC is only doing that on a best-effort basis with no guarantees.


Does this as the first paragraph address your concerns:

The compiler driver processes source code, invokes other programs such 
as the assembler and linker and generates the output result, which may 
be assembly code or machine code.  It is necessary that all source code 
inputs to the compiler are trusted, since it is impossible for the 
driver to validate input source code for safety.  For untrusted code 
should compilation should be done inside a sandboxed environment to 
ensure that it does not compromise the development environment.  Note 
that this still does not guarantee safety of the produced output 
programs and that such programs should still either be analyzed 
thoroughly for safety or run only inside a sandbox or an isolated system 
to avoid compromising the execution environment.


Thanks,
Sid

Re: [RFC] GCC Security policy

2023-08-14 Thread Siddhesh Poyarekar


On 2023-08-14 17:16, Alexander Monakov wrote:


On Mon, 14 Aug 2023, Siddhesh Poyarekar wrote:


1. It makes it clear to users of the project the scope in which the project
could be used and what safety it could reasonably expect from the project.  In
the context of GCC for example, it cannot expect the compiler to do a safety
check of untrusted sources; the compiler will consider #include "/etc/passwd"
just as valid code as #include  and as a result, the onus is on the
user environment to validate the input sources for safety.


Whoa, no. We shouldn't make such statements unless we are prepared to explain
to users how such validation can be practically implemented, which I'm sure
we cannot in this case, due to future extensions such as the #embed directive,
and ability to obfuscate filenames using the preprocessor.


There's no practical (programmatic) way to do such validation; it has to 
be a manual audit, which is why source code passed to the compiler has 
to be *trusted*.



I think it would be more honest to say that crafted sources can result in
arbitrary code execution with the privileges of the user invoking the compiler,
and hence the operator may want to ensure that no sensitive data is available
to that user (via measures ranging from plain UNIX permissions, to chroots,
to virtual machines, to air-gapped computers, depending on threat model).


Right, that's what we're essentially trying to convey in the security 
policy text.  It doesn't go into mechanisms for securing execution 
(because that's really beyond the scope of the *project's* policy IMO) 
but it states unambiguously that input to the compiler must be trusted:


"""
  ... It is necessary that
all source code inputs to the compiler are trusted, since it is
impossible for the driver to validate input source code beyond
conformance to a programming language standard...
"""


Resource consumption is another good reason to sandbox compilers.


Agreed, we make that specific recommendation in the context of libgccjit.

Thanks,
Sid

Re: [RFC] GCC Security policy

2023-08-14 Thread Siddhesh Poyarekar

On 2023-08-14 14:51, Richard Sandiford wrote:

I think it would help to clarify what the aim of the security policy is.
Specifically:

(1) What service do we want to provide to users by classifying one thing
as a security bug and another thing as not a security bug?

(2) What service do we want to provide to the GNU community by the same
classification?

I think it will be easier to agree on the classification if we first
agree on that.

I actually wanted to do a talk on this at the Cauldron this year and
*then* propose this for the gcc community, but I guess we could do this
early :)

So the core intent of a security policy for a project is to make clear
the security stance of the project, specifying to the extent possible
what kind of uses are considered safe and what kinds of bugs would be
considered security issues in the context of those uses.

There are a few advantages of doing this:

1. It makes it clear to users of the project the scope in which the
project could be used and what safety it could reasonably expect from
the project. In the context of GCC for example, it cannot expect the
compiler to do a safety check of untrusted sources; the compiler will
consider #include "/etc/passwd" just as valid code as #include
and as a result, the onus is on the user environment to validate the
input sources for safety.

2. It helps the security community (Mitre and other CNAs and security
researchers) set correct expectations of the project so that they don't
cry wolf for every segfault or ICE under the pretext that code could
presumably be run as a service somehow and hence result in a "DoS".

3. This in turn helps stave off spurious CVE submissions that cause
needless churn in downstream distributions. LLVM is already starting to
see this[1] and it's only a matter of time before people start doing
this for GCC.

4. It helps make a distinction between important bugs and security bugs;
they're often conflated as one and the same thing. Security bugs are
special because they require different handling from those that do not
have a security impact, regardless of their actual importance.
Unfortunately one of the reasons they're special is because there's a
bunch of (pretty dumb) automation out there that rings alarm bells on
every single CVE. Without a clear understanding of the context under
which a project can be used, these alarm bells can be made unreasonably
loud (due to incorrect scoring, see the LLVM CVE for instance; just one
element in that vector changes the score from 0.0 to 5.5), causing
needless churn in not just the code base but in downstream releases and
end user environments.

5. This exercise is also a great start in developing an understanding of
which parts in GCC are security sensitive and in what sense. Runtime
libraries for example have a direct impact on application security.
Compiler impact is a little less direct. Hardening features have
another effect, but it's more mitigation-oriented than direct safety.
This also informs us about the impact of various project actions such as
bundling third-party libraries and development and maintenance of
tooling within GCC and will hopefully guide policies around those practices.

I hope this is a sufficient start. We don't necessarily want to get
into the business of acknowledging or rejecting security issues as
upstream at the moment (but see also the CNA discussion[2] of what we
intend to do in that space for glibc) but having uniform upstream
guidelines would be helpful to researchers as well as downstream
consumers to help decide what constitutes a security issue.

Thanks,
Sid

[1] https://nvd.nist.gov/vuln/detail/CVE-2023-29932
[2]
https://inbox.sourceware.org/libc-alpha/1a44f25a-5aa3-28b7-1ecb-b3991d44c...@gotplt.org/T/

Re: [RFC] GCC Security policy

2023-08-14 Thread Siddhesh Poyarekar


Hi,

Here's the updated draft of the top part of the security policy with all 
of the recommendations incorporated.


Thanks,
Sid


What is a GCC security bug?
===

A security bug is one that threatens the security of a system or
network, or might compromise the security of data stored on it.
In the context of GCC there are multiple ways in which this might
happen and they're detailed below.

Compiler drivers, programs, libgccjit and support libraries
---

The compiler driver processes source code, invokes other programs
such as the assembler and linker and generates the output result,
which may be assembly code or machine code.  It is necessary that
all source code inputs to the compiler are trusted, since it is
impossible for the driver to validate input source code beyond
conformance to a programming language standard.

The GCC JIT implementation, libgccjit, is intended to be plugged
into applications to translate input source code in the application
context.  Limitations that apply to the compiler
driver, apply here too in terms of sanitizing inputs, so it is
recommended that inputs are either sanitized by an external program
to allow only trusted, safe execution in the context of the
application or the JIT execution context is appropriately sandboxed
to contain the effects of any bugs in the JIT or its generated code
to the sandboxed environment.

Support libraries such as libiberty, libcc1 libvtv and libcpp have
been developed separately to share code with other tools such as
binutils and gdb.  These libraries again have similar challenges to
compiler drivers.  While they are expected to be robust against
arbitrary input, they should only be used with trusted inputs.

Libraries such as zlib that bundled into GCC to build it will be
treated the same as the compiler drivers and programs as far as
security coverage is concerned.  However if you find an issue in
these libraries independent of their use in GCC, you should reach
out to their upstream projects to report them.

As a result, the only case for a potential security issue in all
these cases is when it ends up generating vulnerable output for
valid input source code.

As a result, the only case for a potential security issue in the
compiler is when it generates vulnerable application code for
trusted input source code that is conforming to the relevant
programming standard or extensions documented as supported by GCC
and the algorithm expressed in the source code does not have the
vulnerability.  The output application code could be considered
vulnerable if it produces an actual vulnerability in the target
application, specifically in the following cases:

- The application dereferences an invalid memory location despite
  the application sources being valid.
- The application reads from or writes to a valid but incorrect
  memory location, resulting in an information integrity issue or an
  information leak.
- The application ends up running in an infinite loop or with
  severe degradation in performance despite the input sources having
  no such issue, resulting in a Denial of Service.  Note that
  correct but non-performant code is not a security issue candidate,
  this only applies to incorrect code that may result in performance
  degradation severe enough to amount to a denial of service.
- The application crashes due to the generated incorrect code,
  resulting in a Denial of Service.

Language runtime libraries
--

GCC also builds and distributes libraries that are intended to be
used widely to implement runtime support for various programming
languages.  These include the following:

* libada
* libatomic
* libbacktrace
* libcc1
* libcody
* libcpp
* libdecnumber
* libffi
* libgcc
* libgfortran
* libgm2
* libgo
* libgomp
* libiberty
* libitm
* libobjc
* libphobos
* libquadmath
* libsanitizer
* libssp
* libstdc++

These libraries are intended to be used in arbitrary contexts and as
a result, bugs in these libraries may be evaluated for security
impact.  However, some of these libraries, e.g. libgo, libphobos,
etc.  are not maintained in the GCC project, due to which the GCC
project may not be the correct point of contact for them.  You are
encouraged to look at README files within those library directories
to locate the canonical security contact point for those projects
and include them in the report.  Once the issue is fixed in the
upstream project, the fix will be synced into GCC in a future
release.

Most security vulnerabilities in these runtime libraries arise when
an application

Re: [RFC] GCC Security policy

2023-08-11 Thread Siddhesh Poyarekar


On 2023-08-11 11:12, David Edelsohn wrote:
The text above states "bugs in these libraries may be evaluated for 
security impact", but there is no comment about the criteria for a 
security impact, unlike the GLIBC SECURITY.md document.  The text seems 
to imply the "What is a security bug?" definitions from GLIBC, but the 
definitions are not explicitly stated in the GCC Security policy.


Should this "Language runtime libraries" section include some of the 
GLIBC "What is a security bug?" text or should the GCC "What is a 
security bug?" section earlier in this document include the text with a 
qualification that issues like buffer overflow, memory leaks, 
information disclosure, etc. specifically apply to "Language runtime 
libraries" and not all components of GCC?


Yes, that makes sense.  This part will likely evolve though, much like 
the glibc one did, based on reports we get over time.  I'll work it in 
and post an updated draft.


Thanks,
Sid

Re: [RFC] GCC Security policy

2023-08-11 Thread Siddhesh Poyarekar


On 2023-08-11 11:09, Paul Koning wrote:




On Aug 11, 2023, at 10:36 AM, Siddhesh Poyarekar  wrote:

On 2023-08-10 14:50, Siddhesh Poyarekar wrote:

   As a result, the only case for a potential security issue in all
   these cases is when it ends up generating vulnerable output for
   valid input source code.


I think this leaves open the interpretation "every wrong code bug
is potentially a security bug".  I suppose that's true in a trite sense,
but not in a useful sense.  As others said earlier in the thread,
whether a wrong code bug in GCC leads to a security bug in the object
code is too application-dependent to be a useful classification for GCC.

I think we should explicitly say that we don't generally consider wrong
code bugs to be security bugs.  Leaving it implicit is bound to lead
to misunderstanding.

I see what you mean, but the context-dependence of a bug is something GCC will 
have to deal with, similar to how libraries have to deal with bugs.  But I 
agree this probably needs some more expansion.  Let me try and come up with 
something more detailed for that last paragraph.


How's this:

As a result, the only case for a potential security issue in the compiler is 
when it generates vulnerable application code for valid, trusted input source 
code.  The output application code could be considered vulnerable if it 
produces an actual vulnerability in the target application, specifically in the 
following cases:


You might make it explicit that we're talking about wrong code errors here -- 
in other words, the source code is correct (conforms to the standard) and the 
algorithm expressed in the source code does not have a vulnerability, but the 
generated code has semantics that differ from those of the source code such 
that it does have a vulnerability.


Ack, thanks for the suggestion.




- The application dereferences an invalid memory location despite the 
application sources being valid.

- The application reads from or writes to a valid but incorrect memory 
location, resulting in an information integrity issue or an information leak.

- The application ends up running in an infinite loop or with severe 
degradation in performance despite the input sources having no such issue, 
resulting in a Denial of Service.  Note that correct but non-performant code is 
not a security issue candidate, this only applies to incorrect code that may 
result in performance degradation.


The last sentence somewhat contradicts the preceding one.  Perhaps "...may result in 
performance degradation severe enough to amount to a denial of service".


Ack, will fix that up, thanks.

Sid

Re: [RFC] GCC Security policy

2023-08-11 Thread Siddhesh Poyarekar


On 2023-08-10 14:50, Siddhesh Poyarekar wrote:

  As a result, the only case for a potential security issue in all
  these cases is when it ends up generating vulnerable output for
  valid input source code.


I think this leaves open the interpretation "every wrong code bug
is potentially a security bug".  I suppose that's true in a trite sense,
but not in a useful sense.  As others said earlier in the thread,
whether a wrong code bug in GCC leads to a security bug in the object
code is too application-dependent to be a useful classification for GCC.

I think we should explicitly say that we don't generally consider wrong
code bugs to be security bugs.  Leaving it implicit is bound to lead
to misunderstanding.


I see what you mean, but the context-dependence of a bug is something 
GCC will have to deal with, similar to how libraries have to deal with 
bugs.  But I agree this probably needs some more expansion.  Let me try 
and come up with something more detailed for that last paragraph.


How's this:

As a result, the only case for a potential security issue in the 
compiler is when it generates vulnerable application code for valid, 
trusted input source code.  The output application code could be 
considered vulnerable if it produces an actual vulnerability in the 
target application, specifically in the following cases:


- The application dereferences an invalid memory location despite the 
application sources being valid.


- The application reads from or writes to a valid but incorrect memory 
location, resulting in an information integrity issue or an information 
leak.


- The application ends up running in an infinite loop or with severe 
degradation in performance despite the input sources having no such 
issue, resulting in a Denial of Service.  Note that correct but 
non-performant code is not a security issue candidate, this only applies 
to incorrect code that may result in performance degradation.


- The application crashes due to the generated incorrect code, resulting 
in a Denial of Service.

Re: [RFC] GCC Security policy


On 2023-08-10 14:28, Richard Sandiford wrote:

Siddhesh Poyarekar  writes:

On 2023-08-08 10:30, Siddhesh Poyarekar wrote:

Do you have a suggestion for the language to address libgcc,
libstdc++, etc. and libiberty, libbacktrace, etc.?


I'll work on this a bit and share a draft.


Hi David,

Here's what I came up with for different parts of GCC, including the
runtime libraries.  Over time we may find that specific parts of runtime
libraries simply cannot be used safely in some contexts and flag that.

Sid

"""
What is a GCC security bug?
===

  A security bug is one that threatens the security of a system or
  network, or might compromise the security of data stored on it.
  In the context of GCC there are multiple ways in which this might
  happen and they're detailed below.

Compiler drivers, programs, libgccjit and support libraries
---

  The compiler driver processes source code, invokes other programs
  such as the assembler and linker and generates the output result,
  which may be assembly code or machine code.  It is necessary that
  all source code inputs to the compiler are trusted, since it is
  impossible for the driver to validate input source code beyond
  conformance to a programming language standard.

  The GCC JIT implementation, libgccjit, is intended to be plugged
  into applications to translate input source code in the application
  context.  Limitations that apply to the compiler
  driver, apply here too in terms of sanitizing inputs, so it is
  recommended that inputs are either sanitized by an external program
  to allow only trusted, safe execution in the context of the
  application or the JIT execution context is appropriately sandboxed
  to contain the effects of any bugs in the JIT or its generated code
  to the sandboxed environment.

  Support libraries such as libiberty, libcc1 libvtv and libcpp have
  been developed separately to share code with other tools such as
  binutils and gdb.  These libraries again have similar challenges to
  compiler drivers.  While they are expected to be robust against
  arbitrary input, they should only be used with trusted inputs.

  Libraries such as zlib and libffi that bundled into GCC to build it
  will be treated the same as the compiler drivers and programs as far
  as security coverage is concerned.

  As a result, the only case for a potential security issue in all
  these cases is when it ends up generating vulnerable output for
  valid input source code.


I think this leaves open the interpretation "every wrong code bug
is potentially a security bug".  I suppose that's true in a trite sense,
but not in a useful sense.  As others said earlier in the thread,
whether a wrong code bug in GCC leads to a security bug in the object
code is too application-dependent to be a useful classification for GCC.

I think we should explicitly say that we don't generally consider wrong
code bugs to be security bugs.  Leaving it implicit is bound to lead
to misunderstanding.


I see what you mean, but the context-dependence of a bug is something 
GCC will have to deal with, similar to how libraries have to deal with 
bugs.  But I agree this probably needs some more expansion.  Let me try 
and come up with something more detailed for that last paragraph.



There's another case that I think should be highlighted explicitly:
GCC provides various security-hardening features.  I think any failure
of those feature to act as documented is poentially a security bug.
Failure to follow reasonable expectations (even if not documented)
might sometimes be a security bug too.


Missed hardening in general does not put systems at immediate risk, so 
they're not considered CVE-worthy.  In fact when bugs are evaluated for 
security risk at a source level (e.g. when NIST does it), hardening does 
not come into the picture at all.  It's only at product levels that 
hardening features are accounted for, e.g. where -fstack-protector would 
reduce the seriousness of a stack buffer overflow and even there one 
must do an analysis to see if the generated code actually mitigated the 
overflow using the stack protector canary.


Thanks,
Sid

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)


On 2023-08-10 12:39, Jakub Jelinek wrote:

On Thu, Aug 10, 2023 at 12:30:06PM -0400, Siddhesh Poyarekar wrote:

The definition of __bos/__bdos allows us the freedom to *estimate* rather
than be precise, so I'd go for sizeof(x) + N * sizeof(*x.a) since it's bound
to give the more conservative answer of the two.


To be precise, we have the 0/1 modes vs. 2/3.  So, when not determining
__bos/__bdos from actual allocation size or size of an stack object or
size of data section object but something else (say counted_by), perhaps
0/1 modes should give the upper estimate of sizeof (x) + N * sizeof(elt)
and 2/3 modes should give a lower estimate, so offsetof + N * sizeof(elt),
then user code can continue testing if both modes are equal to have
exact number.


Ack, that's fair.

Thanks,
Sid

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)


On 2023-08-10 11:18, Martin Uecker wrote:

Am Donnerstag, dem 10.08.2023 um 10:58 -0400 schrieb Siddhesh Poyarekar:

On 2023-08-10 10:47, Martin Uecker wrote:

Am Donnerstag, dem 10.08.2023 um 16:42 +0200 schrieb Jakub Jelinek:

On Thu, Aug 10, 2023 at 04:38:21PM +0200, Martin Uecker wrote:

Am Donnerstag, dem 10.08.2023 um 13:59 + schrieb Qing Zhao:



On Aug 10, 2023, at 2:58 AM, Martin Uecker  wrote:

Am Mittwoch, dem 09.08.2023 um 20:10 + schrieb Qing Zhao:



On Aug 9, 2023, at 12:21 PM, Michael Matz  wrote:





I am not sure for the reason given above. The following
code would not work:

struct foo_flex { int a; short b; char t[]; } x;
x.a = 1;
struct foo_flex *p = malloc(sizeof(x) + x.a);
if (!p) abort();
memcpy(p, , sizeof(x)); // initialize struct


Okay.
Then, the user still should use the sizeof(struct foo_flex) + N * 
sizeof(foo->t) for the allocation, even though this might allocate more bytes 
than necessary. (But this is safe)

Let me know if I still miss anything.


The question is not only what the user should use to
allocate, but also what BDOS should return.  In my
example the user uses the sizeof() + N * sizeof
formula and the memcpy is safe, but it would be flagged
as a buffer overrun if BDOS uses the offsetof formula.


BDOS/BOS (at least the 0 level) should return what is actually
allocated for the var, what size was passed to malloc and if it
is a var with flex array member with initialization what is actually the
size on the stack or in .data/.rodata etc.


Agreed.

But what about a struct with FAM with the new "counted_by" attribute
if the original allocation is not visible?


There's precedent for this through the __access__ attribute; __bos
trusts what the attribute says about the allocation.


The access attribute gives the size directly. The counted_by gives
a length for the array which needs to be translated into a size
via a formula. There are different formulas in use. The question
is which formula should bdos trust?

Whatever you pick, if this is not consistent with the actual
allocation or use, then it will cause problems either by
breaking code or not detecting buffer overruns.

So it needs to be consistent with what GCC allocates for a
var with FAM and initialization and also the user needs to
be told what the right choice is so that he can use the right
size for allocation and argument to memcpy / memset etc.


We'd rather miss overflow to the extent of padding than to try and be 
overly aggressive; I doubt if we're missing much protection in practice 
by trying to account for the padding.


The definition of __bos/__bdos allows us the freedom to *estimate* 
rather than be precise, so I'd go for sizeof(x) + N * sizeof(*x.a) since 
it's bound to give the more conservative answer of the two.


Thanks,
Sid

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)


On 2023-08-10 10:47, Martin Uecker wrote:

Am Donnerstag, dem 10.08.2023 um 16:42 +0200 schrieb Jakub Jelinek:

On Thu, Aug 10, 2023 at 04:38:21PM +0200, Martin Uecker wrote:

Am Donnerstag, dem 10.08.2023 um 13:59 + schrieb Qing Zhao:



On Aug 10, 2023, at 2:58 AM, Martin Uecker  wrote:

Am Mittwoch, dem 09.08.2023 um 20:10 + schrieb Qing Zhao:



On Aug 9, 2023, at 12:21 PM, Michael Matz  wrote:





I am not sure for the reason given above. The following
code would not work:

struct foo_flex { int a; short b; char t[]; } x;
x.a = 1;
struct foo_flex *p = malloc(sizeof(x) + x.a);
if (!p) abort();
memcpy(p, , sizeof(x)); // initialize struct


Okay.
Then, the user still should use the sizeof(struct foo_flex) + N * 
sizeof(foo->t) for the allocation, even though this might allocate more bytes 
than necessary. (But this is safe)

Let me know if I still miss anything.


The question is not only what the user should use to
allocate, but also what BDOS should return.  In my
example the user uses the sizeof() + N * sizeof
formula and the memcpy is safe, but it would be flagged
as a buffer overrun if BDOS uses the offsetof formula.


BDOS/BOS (at least the 0 level) should return what is actually
allocated for the var, what size was passed to malloc and if it
is a var with flex array member with initialization what is actually the
size on the stack or in .data/.rodata etc.


Agreed.

But what about a struct with FAM with the new "counted_by" attribute
if the original allocation is not visible?


There's precedent for this through the __access__ attribute; __bos 
trusts what the attribute says about the allocation.


Sid

Re: [RFC] GCC Security policy

2023-08-09 Thread Siddhesh Poyarekar

On 2023-08-09 14:17, David Edelsohn wrote:
On Wed, Aug 9, 2023 at 1:33 PM Siddhesh Poyarekar <mailto:siddh...@gotplt.org>> wrote:

On 2023-08-08 10:30, Siddhesh Poyarekar wrote:
 >> Do you have a suggestion for the language to address libgcc,
 >> libstdc++, etc. and libiberty, libbacktrace, etc.?
 >
 > I'll work on this a bit and share a draft.

Hi David,

Here's what I came up with for different parts of GCC, including the
runtime libraries.  Over time we may find that specific parts of
runtime
libraries simply cannot be used safely in some contexts and flag that.

Sid

Hi, Sid

Thanks for iterating on this.

"""
What is a GCC security bug?
===

      A security bug is one that threatens the security of a system or
      network, or might compromise the security of data stored on it.
      In the context of GCC there are multiple ways in which this might
      happen and they're detailed below.

Compiler drivers, programs, libgccjit and support libraries
---

      The compiler driver processes source code, invokes other programs
      such as the assembler and linker and generates the output result,
      which may be assembly code or machine code.  It is necessary that
      all source code inputs to the compiler are trusted, since it is
      impossible for the driver to validate input source code beyond
      conformance to a programming language standard.

      The GCC JIT implementation, libgccjit, is intended to be plugged
      into applications to translate input source code in the
application
      context.  Limitations that apply to the compiler
      driver, apply here too in terms of sanitizing inputs, so it is
      recommended that inputs are either sanitized by an external
program
      to allow only trusted, safe execution in the context of the
      application or the JIT execution context is appropriately
sandboxed
      to contain the effects of any bugs in the JIT or its generated
code
      to the sandboxed environment.

      Support libraries such as libiberty, libcc1 libvtv and libcpp have
      been developed separately to share code with other tools such as
      binutils and gdb.  These libraries again have similar
challenges to
      compiler drivers.  While they are expected to be robust against
      arbitrary input, they should only be used with trusted inputs.

      Libraries such as zlib and libffi that bundled into GCC to
build it
      will be treated the same as the compiler drivers and programs
as far
      as security coverage is concerned.

Should we direct people to the upstream projects for their security 
policies?

We bundle zlib and libffi so regardless of whether it's a security issue 
in those libraries (because security impact of memory safety bugs in 
general use libraries will be context dependent and hence get assigned 
CVEs more often than not), the context in gcc is well defined as a local 
unprivileged executable and hence not security-relevant.

That said, we could add something like:

However if you find a issue in these libraries independent of their
use in GCC you should reach out to their upstream projects to report
them.

      As a result, the only case for a potential security issue in all
      these cases is when it ends up generating vulnerable output for
      valid input source code.

Language runtime libraries
--

      GCC also builds and distributes libraries that are intended to be
      used widely to implement runtime support for various programming
      languages.  These include the following:

      * libada
      * libatomic
      * libbacktrace
      * libcc1
      * libcody
      * libcpp
      * libdecnumber
      * libgcc
      * libgfortran
      * libgm2
      * libgo
      * libgomp
      * libiberty
      * libitm
      * libobjc
      * libphobos
      * libquadmath
      * libssp
      * libstdc++

      These libraries are intended to be used in arbitrary contexts
and as
      a result, bugs in these libraries may be evaluated for security
      impact.  However, some of these libraries, e.g. libgo, libphobos,
      etc.  are not maintained in the GCC project, due to which the GCC
      project may not be the correct point of contact for them.  You are
      encouraged to look at README files within those library
directories
      to locate the canonical security contact point for those projects.

As Richard mentioned, should GCC make a specific statement about the 
security policy / response for issues that

Re: [RFC] GCC Security policy

2023-08-09 Thread Siddhesh Poyarekar


On 2023-08-08 10:30, Siddhesh Poyarekar wrote:
Do you have a suggestion for the language to address libgcc, 
libstdc++, etc. and libiberty, libbacktrace, etc.?


I'll work on this a bit and share a draft.


Hi David,

Here's what I came up with for different parts of GCC, including the 
runtime libraries.  Over time we may find that specific parts of runtime 
libraries simply cannot be used safely in some contexts and flag that.


Sid

"""
What is a GCC security bug?
===

A security bug is one that threatens the security of a system or
network, or might compromise the security of data stored on it.
In the context of GCC there are multiple ways in which this might
happen and they're detailed below.

Compiler drivers, programs, libgccjit and support libraries
---

The compiler driver processes source code, invokes other programs
such as the assembler and linker and generates the output result,
which may be assembly code or machine code.  It is necessary that
all source code inputs to the compiler are trusted, since it is
impossible for the driver to validate input source code beyond
conformance to a programming language standard.

The GCC JIT implementation, libgccjit, is intended to be plugged
into applications to translate input source code in the application
context.  Limitations that apply to the compiler
driver, apply here too in terms of sanitizing inputs, so it is
recommended that inputs are either sanitized by an external program
to allow only trusted, safe execution in the context of the
application or the JIT execution context is appropriately sandboxed
to contain the effects of any bugs in the JIT or its generated code
to the sandboxed environment.

Support libraries such as libiberty, libcc1 libvtv and libcpp have
been developed separately to share code with other tools such as
binutils and gdb.  These libraries again have similar challenges to
compiler drivers.  While they are expected to be robust against
arbitrary input, they should only be used with trusted inputs.

Libraries such as zlib and libffi that bundled into GCC to build it
will be treated the same as the compiler drivers and programs as far
as security coverage is concerned.

As a result, the only case for a potential security issue in all
these cases is when it ends up generating vulnerable output for
valid input source code.

Language runtime libraries
--

GCC also builds and distributes libraries that are intended to be
used widely to implement runtime support for various programming
languages.  These include the following:

* libada
* libatomic
* libbacktrace
* libcc1
* libcody
* libcpp
* libdecnumber
* libgcc
* libgfortran
* libgm2
* libgo
* libgomp
* libiberty
* libitm
* libobjc
* libphobos
* libquadmath
* libssp
* libstdc++

These libraries are intended to be used in arbitrary contexts and as
a result, bugs in these libraries may be evaluated for security
impact.  However, some of these libraries, e.g. libgo, libphobos,
etc.  are not maintained in the GCC project, due to which the GCC
project may not be the correct point of contact for them.  You are
encouraged to look at README files within those library directories
to locate the canonical security contact point for those projects.

Diagnostic libraries


The sanitizer library bundled in GCC is intended to be used in
diagnostic cases and not intended for use in sensitive environments.
As a result, bugs in the sanitizer will not be considered security
sensitive.

GCC plugins
---

It should be noted that GCC may execute arbitrary code loaded by a
user through the GCC plugin mechanism or through system preloading
mechanism.  Such custom code should be vetted by the user for safety
as bugs exposed through such code will not be considered security
issues.

Re: [RFC] GCC Security policy


On 2023-08-08 11:48, David Malcolm wrote:

On Tue, 2023-08-08 at 09:33 -0400, Paul Koning via Gcc-patches wrote:




On Aug 8, 2023, at 9:01 AM, Jakub Jelinek via Gcc-patches
 wrote:

On Tue, Aug 08, 2023 at 02:52:57PM +0200, Richard Biener via Gcc-
patches wrote:

There's probably external tools to do this, not sure if we should
replicate
things in the driver for this.

But sure, I think the driver is the proper point to address any
of such
issues - iff we want to address them at all.  Maybe a nice little
google summer-of-code project ;)


What I'd really like to avoid is having all compiler bugs
(primarily ICEs)
considered to be security bugs (e.g. DoS category), it would be
terrible to
release every week a new compiler because of the "security" issues.


Indeed.  But my answer would be that such things are not DoS issues.
DoS means that an external input, over which you have little control,
is impairing service.  In the case of a compiler, if feeding it bad
source code X.c causes it to crash, the answer is "well, then don't
do that".


Agreed.

I'm not sure how to "wordsmith" this, but it seems like the sources and
options on the *host* are assumed to be trusted, and that the act of
*compiling* source on the host requires trusting them, just like the
act of executing the compiled code on the target does.  Though users
may be more familiar with sandboxing the target than the host.

We should spell this out further for libgccjit: libgccjit allows for
ahead-of-time and JIT compilation of sources - but it assumes that
those sources (and the compilation options) are trusted.

[Adding Andrea Corallo to the addressees]

For example, Emacs is using libgccjit to do ahead-of-time compilation
of Emacs bytecode.  I'm assuming that Emacs is assuming that its
bytecode is trusted, and that there isn't any attempt by Emacs to
sandbox the Emacs Lisp being processed.

However, consider a situation in which someone attempted to, say, embed
libgccjit inside a web browser to generate machine code from
JavaScript, where the JavaScript is potentially controlled by an
attacker.  I think we want to explicitly say that that if you're going
to do that, you need to put some other layer of defense in, so that
you're not blithely accepting the inputs to the compilation (sources
and options) from a potentially hostile source, where a crafted input
sources could potentially hit an ICE in the compiler and thus crash the
web browser.


+1, this is precisely the kind of thing the security policy should warn 
against and suggest using sandboxing for.  The compiler (or libgccjit) 
isn't really in a position to defend such uses, ICE or otherwise.


Thanks,
Sid

Re: [RFC] GCC Security policy


On 2023-08-08 10:37, Jakub Jelinek wrote:

On Tue, Aug 08, 2023 at 10:30:10AM -0400, Siddhesh Poyarekar wrote:

Do you have a suggestion for the language to address libgcc, libstdc++,
etc. and libiberty, libbacktrace, etc.?


I'll work on this a bit and share a draft.


BTW, I think we should perhaps differentiate between production ready
libraries (e.g. libgcc, libstdc++, libgomp, libatomic, libgfortran, libquadmath,
libssp) vs. e.g. the sanitizer libraries which are meant for debugging and


Agreed, that's why I need some time to sort all of the libraries gcc 
builds to categorize them into various levels of support in terms of 
safety re. untrusted input.


Thanks,
Sid

Re: [RFC] GCC Security policy

On 2023-08-08 10:14, David Edelsohn wrote:
On Tue, Aug 8, 2023 at 10:07 AM Siddhesh Poyarekar <mailto:siddh...@gotplt.org>> wrote:

On 2023-08-08 10:04, Richard Biener wrote:
 > On Tue, Aug 8, 2023 at 3:35 PM Ian Lance Taylor mailto:i...@google.com>> wrote:
 >>
 >> On Tue, Aug 8, 2023 at 6:02 AM Jakub Jelinek via Gcc-patches
 >> mailto:gcc-patches@gcc.gnu.org>> wrote:
 >>>
 >>> On Tue, Aug 08, 2023 at 02:52:57PM +0200, Richard Biener via
Gcc-patches wrote:
 >>>> There's probably external tools to do this, not sure if we
should replicate
 >>>> things in the driver for this.
 >>>>
 >>>> But sure, I think the driver is the proper point to address
any of such
 >>>> issues - iff we want to address them at all.  Maybe a nice little
 >>>> google summer-of-code project ;)
 >>>
 >>> What I'd really like to avoid is having all compiler bugs
(primarily ICEs)
 >>> considered to be security bugs (e.g. DoS category), it would be
terrible to
 >>> release every week a new compiler because of the "security" issues.
 >>> Running compiler on untrusted sources can trigger ICEs (which
we want to fix
 >>> but there will always be some), or run into some compile time
and/or compile
 >>> memory issue (we have various quadratic or worse spots),
compiler stack
 >>> limits (deeply nested stuff e.g. during parsing but other areas
as well).
 >>> So, people running fuzzers and reporting issues is great, but
if they'd get
 >>> a CVE assigned for each ice-on-invalid-code, ice-on-valid-code,
 >>> each compile-time-hog and each memory-hog, that wouldn't be useful.
 >>> Runtime libraries or security issues in the code we generate
for valid
 >>> sources are of course a different thing.
 >>
 >>
 >> I wonder if a security policy should say something about the
-fplugin
 >> option.  I agree that an ICE is not a security issue, but I
wonder how
 >> many people are aware that a poorly chosen command line option can
 >> direct the compiler to run arbitrary code.  For that matter the same
 >> is true of setting the GCC_EXEC_PREFIX environment variable, and no
 >> doubt several other environment variables.  My point is not that we
 >> should change these, but that a security policy should draw
attention
 >> to the fact that there are cases in which the compiler will
 >> unexpectedly run other programs.
 >
 > Well, if you run an arbitrary commandline from the internet you get
 > what you deserve, running "echo "Hello World" | gcc -xc - -o
/dev/sda"
 > as root doesn't need plugins to shoot yourself in the foot.  You
need to
 > know what you're doing, otherwise you are basically executing an
 > arbitrary shell script with whatever privileges you have.

I think it would be useful to mention caveats with plugins though, just
like it would be useful to mention exceptions for libiberty and similar
libraries that gcc builds.  It only helps makes things clearer in terms
of what security coverage the project provides.

I have added a line to the Note section in the proposed text:

     GCC and its tools provide features and options that can run 
arbitrary user code (e.g., -fplugin).

How about the following to make it clearer that arbitrary code in 
plugins is not considered secure:

GCC and its tools provide features and options that can run arbitrary 
user code, e.g. using the -fplugin options.  Such custom code should be 
vetted by the user for safety as bugs exposed through such code will not 
be considered security issues.

I believe that the security implication already is addressed because the 
program is not tricked into a direct compromise of security.

Do you have a suggestion for the language to address libgcc, libstdc++, 
etc. and libiberty, libbacktrace, etc.?

I'll work on this a bit and share a draft.

Thanks,
Sid

Re: [RFC] GCC Security policy


On 2023-08-08 10:04, Richard Biener wrote:

On Tue, Aug 8, 2023 at 3:35 PM Ian Lance Taylor  wrote:


On Tue, Aug 8, 2023 at 6:02 AM Jakub Jelinek via Gcc-patches
 wrote:


On Tue, Aug 08, 2023 at 02:52:57PM +0200, Richard Biener via Gcc-patches wrote:

There's probably external tools to do this, not sure if we should replicate
things in the driver for this.

But sure, I think the driver is the proper point to address any of such
issues - iff we want to address them at all.  Maybe a nice little
google summer-of-code project ;)


What I'd really like to avoid is having all compiler bugs (primarily ICEs)
considered to be security bugs (e.g. DoS category), it would be terrible to
release every week a new compiler because of the "security" issues.
Running compiler on untrusted sources can trigger ICEs (which we want to fix
but there will always be some), or run into some compile time and/or compile
memory issue (we have various quadratic or worse spots), compiler stack
limits (deeply nested stuff e.g. during parsing but other areas as well).
So, people running fuzzers and reporting issues is great, but if they'd get
a CVE assigned for each ice-on-invalid-code, ice-on-valid-code,
each compile-time-hog and each memory-hog, that wouldn't be useful.
Runtime libraries or security issues in the code we generate for valid
sources are of course a different thing.



I wonder if a security policy should say something about the -fplugin
option.  I agree that an ICE is not a security issue, but I wonder how
many people are aware that a poorly chosen command line option can
direct the compiler to run arbitrary code.  For that matter the same
is true of setting the GCC_EXEC_PREFIX environment variable, and no
doubt several other environment variables.  My point is not that we
should change these, but that a security policy should draw attention
to the fact that there are cases in which the compiler will
unexpectedly run other programs.


Well, if you run an arbitrary commandline from the internet you get
what you deserve, running "echo "Hello World" | gcc -xc - -o /dev/sda"
as root doesn't need plugins to shoot yourself in the foot.  You need to
know what you're doing, otherwise you are basically executing an
arbitrary shell script with whatever privileges you have.


I think it would be useful to mention caveats with plugins though, just 
like it would be useful to mention exceptions for libiberty and similar 
libraries that gcc builds.  It only helps makes things clearer in terms 
of what security coverage the project provides.


Thanks,
Sid

Re: [RFC] GCC Security policy


On 2023-08-08 04:16, Richard Biener wrote:

On Mon, Aug 7, 2023 at 7:30 PM David Edelsohn via Gcc-patches
 wrote:


FOSS Best Practices recommends that projects have an official Security
policy stated in a SECURITY.md or SECURITY.txt file at the root of the
repository.  GLIBC and Binutils have added such documents.

Appended is a prototype for a Security policy file for GCC based on the
Binutils document because GCC seems to have more affinity with Binutils as
a tool. Do the runtime libraries distributed with GCC, especially libgcc,
require additional security policies?

[ ] Is it appropriate to use the Binutils SECURITY.txt as the starting
point or should GCC use GLIBC SECURITY.md as the starting point for the GCC
Security policy?

[ ] Does GCC, or some components of GCC, require additional care because of
runtime libraries like libgcc and libstdc++, and because of gcov and
profile-directed feedback?


I do think that the runtime libraries should at least be explicitly mentioned
because they fall into the "generated output" category and bugs in the
runtime are usually more severe as affecting a wider class of inputs.


Ack, I'd expect libstdc++ and libgcc to be aligned with glibc's 
policies.  libiberty and others on the other hand, would probably be 
more suitably aligned with binutils libbfd, where we assume trusted input.



Thoughts?

Thanks, David

GCC Security Process


What is a GCC security bug?
===

 A security bug is one that threatens the security of a system or
 network, or might compromise the security of data stored on it.
 In the context of GCC there are two ways in which such
 bugs might occur.  In the first, the programs themselves might be
 tricked into a direct compromise of security.  In the second, the
 tools might introduce a vulnerability in the generated output that
 was not already present in the files used as input.

 Other than that, all other bugs will be treated as non-security
 issues.  This does not mean that they will be ignored, just that
 they will not be given the priority that is given to security bugs.

 This stance applies to the creation tools in the GCC (e.g.,
 gcc, g++, gfortran, gccgo, gccrs, gnat, cpp, gcov, etc.) and the
 libraries that they use.

Notes:
==

 None of the programs in GCC need elevated privileges to operate and
 it is recommended that users do not use them from accounts where such
 privileges are automatically available.


I'll note that we could ourselves mitigate some of that by handling privileged
invocation of the driver specially, dropping privs on exec of the sibling tools
and possibly using temporary files or pipes to do the parts of the I/O that
need to be privileged.


It's not a bad idea, but it ends up giving legitimizing running the 
compiler as root, pushing the responsibility of privilege management to 
the driver.  How about rejecting invocation as root altogether by 
default, bypassed with a --run-as-root flag instead?


I've also been thinking about a --sandbox flag that isolates the build 
process (for gcc as well as binutils) into a separate namespace so that 
it's usable in a restricted mode on untrusted sources without exposing 
the rest of the system to it.


Thanks,
Sid

Re: One question on the source code of tree-object-size.cc


On 2023-08-04 15:06, Qing Zhao wrote:

Yes, that's what I'm thinking.


so `q` must be pointing to a single element.  So you could deduce:

1. the minimum size of the whole object that q points to.

You mean that the TYPE will determine the minimum size of the whole object?  
(Does this include the size of the flexible array member, or only the other 
part of the structure except the flexible array member?)


Only the constant sized part of the structure.

Okay. I see.
But if the “counted_by” info is available, then from p->array, we can deduce the 
minimum size too, as sizeof(struct A) + q->foo * sizeof(int), right?


Yes.




Actually for minimum size we'd also need a guarantee that `alloc_buf_more` 
returns a valid allocated object.

Why? Please explain a little bit here.


So `alloc_buf_more` could return NULL, a valid pointer or an invalid pointer.  
So, we could end up returning a non-zero minimum size for an invalid or NULL 
pointer, which is incorrect, we don't know that.


I see what’ s you mean now.

However, if we already see p->array, then the p is guaranteed a valid pointer and not 
a NULL, right?  (We are discussing on __builtin_dynamic_object_size (q->array, 2), we 
see q->array already)


Yes, you could argue that for p->array, I agree, but not for p.



We won't need the object validity guarantee for (2) beyond, e.g. guarding 
against a new NULL pointer dereference because it's a *maximum* estimate; an 
invalid or NULL pointer would have 0 size.  So for such cases, __bos(q, 0) 
could return

sizeof(*q) + (q ? q->foo:0)

and __bos(q->array, 0) could be

sizeof(*q) + q->foo - offsetof(q, array)

There's no need to guard against a dereference in the second case because the 
q->array dereference already assumes that q is valid.


q->array should also guarantee that q is a valid pointer for minimum size, 
right? Or do I miss anything here?


Yes.

Thanks,
Sid

Re: One question on the source code of tree-object-size.cc


On 2023-08-04 11:27, Qing Zhao wrote:




On Aug 4, 2023, at 10:40 AM, Siddhesh Poyarekar  wrote:

On 2023-08-03 13:34, Qing Zhao wrote:

One thing I need to point out first is, currently, even for regular fixed size 
array in the structure,
We have this same issue, for example:
#define LENGTH 10
struct fix {
   size_t foo;
   int array[LENGTH];
};
…
int main ()
{
   struct fix *p;
   p = alloc_buf_more ();
   expect(__builtin_object_size(p->array, 1), LENGTH * sizeof(int));
   expect(__builtin_object_size(p->array, 0), -1);
}
Currently, for __builtin_object_size(p->array, 0),  GCC return UNKNOWN for it.
This is not a special issue for flexible array member.


That's fine for fixed arrays at the end of a struct because the "whole object" 
size could be anything; `p` could be pointing to the beginning of an array for all we 
know.  If however `array` is strictly a flex array, i.e.:

```
struct A
{
  size_t foo;
  int array[];
};
```

then there's no way in valid C to have an array of `struct fix`,


Yes!!   this is exactly the place that makes difference between structures with 
fixed arrays and the ones with flexible arrays.

With such difference, I guess that using the type of the structure with flexible 
array member for p->array to get the size of the whole object p point to might 
be reasonable?


Yes, that's what I'm thinking.


so `q` must be pointing to a single element.  So you could deduce:

1. the minimum size of the whole object that q points to.


You mean that the TYPE will determine the minimum size of the whole object?  
(Does this include the size of the flexible array member, or only the other 
part of the structure except the flexible array member?)


Only the constant sized part of the structure.


Actually for minimum size we'd also need a guarantee that `alloc_buf_more` 
returns a valid allocated object.


Why? Please explain a little bit here.


So `alloc_buf_more` could return NULL, a valid pointer or an invalid 
pointer.  So, we could end up returning a non-zero minimum size for an 
invalid or NULL pointer, which is incorrect, we don't know that.


We won't need the object validity guarantee for (2) beyond, e.g. 
guarding against a new NULL pointer dereference because it's a *maximum* 
estimate; an invalid or NULL pointer would have 0 size.  So for such 
cases, __bos(q, 0) could return


sizeof(*q) + (q ? q->foo:0)

and __bos(q->array, 0) could be

sizeof(*q) + q->foo - offsetof(q, array)

There's no need to guard against a dereference in the second case 
because the q->array dereference already assumes that q is valid.




and

2. if you're able to determine the size of the flex array (through 
__element_count__(foo) for example), you could even determine the maximum size 
of the whole object.

For (2) though, you'd break applications that overallocate and then expect to 
be able to use that overallocation despite the space not being reflected in the 
__element_count__.  I think it's a bug in the application and I can't see a way 
for an application to be able to do this in a valid way so I'm inclined towards 
breaking it.


Currently, we allow the situation when the allocation size for the whole object 
is larger than the value reflected in the “counted_by” attribute (the old name 
is __element_count__). But don’t allow the other way around (i.e, when the 
allocation size for the whole object is smaller than the value reflected in the 
“counted_by” attribute.


Right, that's going to be the "break".  For underallocation __bos will 
only end up overestimating the space available, which is not ideal, but 
won't end up breaking compatibility.




Of course, the fact that gcc allows flex arrays to be in the middle of structs 
breaks the base assumption but that's something we need to get rid of anyway 
since there's no way for valid C programs to use that safely.


Since GCC14, we started to deprecate this extension (allow flex array to be in 
the middle of structs).
https://gcc.gnu.org/pipermail/gcc-cvs/2023-June/385730.html


Yes, that's what I'm banking on.

Thanks,
Sid

Re: One question on the source code of tree-object-size.cc


On 2023-08-04 10:40, Siddhesh Poyarekar wrote:

On 2023-08-03 13:34, Qing Zhao wrote:
One thing I need to point out first is, currently, even for regular 
fixed size array in the structure,

We have this same issue, for example:

#define LENGTH 10

struct fix {
   size_t foo;
   int array[LENGTH];
};

…
int main ()
{
   struct fix *p;
   p = alloc_buf_more ();

   expect(__builtin_object_size(p->array, 1), LENGTH * sizeof(int));
   expect(__builtin_object_size(p->array, 0), -1);
}

Currently, for __builtin_object_size(p->array, 0),  GCC return UNKNOWN 
for it.

This is not a special issue for flexible array member.


That's fine for fixed arrays at the end of a struct because the "whole 
object" size could be anything; `p` could be pointing to the beginning 
of an array for all we know.  If however `array` is strictly a flex 
array, i.e.:


```
struct A
{
   size_t foo;
   int array[];
};
```

then there's no way in valid C to have an array of `struct fix`, so `q` 
must be pointing to a single element.  So you could deduce:


1. the minimum size of the whole object that q points to.


Actually for minimum size we'd also need a guarantee that 
`alloc_buf_more` returns a valid allocated object.


Sid

Re: One question on the source code of tree-object-size.cc