Re: [PATCH v8 0/5] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-03-29 Thread Qing Zhao
Hi,  Tom,

Thanks a lot for the comments. 

It’s good to hear that this new attribute might be useful to gdb.

We might spend some time studying how to use this information in other consumers,
for example gdb, in the future, if necessary and possible.  If you have good
examples that show the importance of using such information in gdb, please let me
know. I’m glad to study this a little more.

At this time, I agree with Kees: it’s better for the initial patches of the
“counted_by” support to focus on the attribute itself and the immediate
security consumers, such as the array bound sanitizer and dynamic object size.

So, let’s defer possible gdb support to a later patch.

Does this sound reasonable to you?

Qing



> On Mar 29, 2024, at 15:16, Kees Cook  wrote:
> 
> On Fri, Mar 29, 2024 at 12:09:15PM -0600, Tom Tromey wrote:
>>>>>>> Qing Zhao  writes:
>> 
>>> This is the 8th version of the patch.
>> 
>>> Compared with the 7th version, the differences are:
>> 
>> [...]
>> 
>> Hi.  I was curious to know if the information supplied by this attribute
>> shows up in the DWARF.  It would be good if it did, because that would
>> let gdb correctly print these arrays without user intervention.
> 
> Does DWARF have such an annotation? Regardless, I think this could be a
> future patch to not hold up landing the initial feature.
> 
> -- 
> Kees Cook



[PATCH v8 4/5] Use the .ACCESS_WITH_SIZE in bound sanitizer.

2024-03-29 Thread Qing Zhao
gcc/c-family/ChangeLog:

* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-4.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
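
The check this patch wires up can be modeled in plain C. The following is a minimal sketch of the intended semantics only, not of the tree arithmetic in c-ubsan.cc: a negative count is treated as zero, and an index is in bounds when it is smaller than the count. The names effective_count and index_out_of_bounds are hypothetical and do not exist in GCC.

```c
#include <stddef.h>

/* Hypothetical model: a negative counted_by value is treated as zero.  */
static size_t
effective_count (long long count)
{
  return count < 0 ? 0 : (size_t) count;
}

/* Hypothetical model: nonzero when INDEX falls outside an array whose
   counted_by field currently holds COUNT.  */
static int
index_out_of_bounds (long long count, size_t index)
{
  return index >= effective_count (count);
}
```

With a count of 10, index 9 is accepted and index 10 is flagged. This is the same shape as the f->p[m][n-1] access in the new tests, where f->n has been set to m.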
---
 gcc/c-family/c-ubsan.cc   | 42 +
 .../ubsan/flex-array-counted-by-bounds-2.c| 45 ++
 .../ubsan/flex-array-counted-by-bounds-3.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds-4.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds.c  | 46 +++
 5 files changed, 201 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-4.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 940982819ddf..7cd3c6aa5b88 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -376,6 +376,40 @@ ubsan_instrument_return (location_t loc)
   return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data));
 }
 
+/* Get the tree that represented the number of counted_by, i.e, the maximum
+   number of the elements of the object that the call to .ACCESS_WITH_SIZE
+   points to, this number will be the bound of the corresponding array.  */
+static tree
+get_bound_from_access_with_size (tree call)
+{
+  if (!is_access_with_size_p (call))
+return NULL_TREE;
+
+  tree ref_to_size = CALL_EXPR_ARG (call, 1);
+  unsigned int class_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));
+  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
+  build_int_cst (ptr_type_node, 0));
+  /* If size is negative value, treat it as zero.  */
+  if (!TYPE_UNSIGNED (type))
+  {
+tree cond = fold_build2 (LT_EXPR, boolean_type_node,
+unshare_expr (size), build_zero_cst (type));
+size = fold_build3 (COND_EXPR, type, cond,
+   build_zero_cst (type), size);
+  }
+
+  /* Only when class_of_size is 1, i.e, the number of the elements of
+ the object type, return the size.  */
+  if (class_of_size != 1)
+return NULL_TREE;
+  else
+size = fold_convert (sizetype, size);
+
+  return size;
+}
+
+
 /* Instrument array bounds for ARRAY_REFs.  We create special builtin,
that gets expanded in the sanopt pass, and make an array dimension
of it.  ARRAY is the array, *INDEX is an index to the array.
@@ -401,6 +435,14 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
  && COMPLETE_TYPE_P (type)
  && integer_zerop (TYPE_SIZE (type)))
bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1);
+  else if (INDIRECT_REF_P (array)
+  && is_access_with_size_p ((TREE_OPERAND (array, 0))))
+   {
+ bound = get_bound_from_access_with_size ((TREE_OPERAND (array, 0)));
+ bound = fold_build2 (MINUS_EXPR, TREE_TYPE (bound),
+  bound,
+  build_int_cst (TREE_TYPE (bound), 1));
+   }
   else
return NULL_TREE;
 }
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c 
b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
new file mode 100644
index ..b503320628d2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
@@ -0,0 +1,45 @@
+/* Test the attribute counted_by and its usage in
+   bounds sanitizer combined with VLA.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-output "index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 20 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'int 
\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+
+
+#include <stdlib.h>
+
+void __attribute__((__noinline__)) setup_and_test_vla (int n, int m)
+{
+   struct foo {
+   int n;
+   int p[][n] __attribute__((counted_by(n)));
+   } *f;
+
+   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n]));
+   f->n = m;
+   f->p[m][n-1]=1;
+   return;
+}
+
+void __attribute__((__noinline__)) setup_and_test_vla_1 (int n1, int n2, int m)
+{
+  struct foo {
+int n;
+int p[][n2][n1] __attribute__((counted_by(n)));
+  } *f;
+
+  f = (struct foo *) malloc 

[PATCH v8 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-03-29 Thread Qing Zhao
gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
* gcc.dg/flex-array-counted-by-5.c: New test.
---
 .../gcc.dg/builtin-object-size-common.h   |  11 ++
 .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
 .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
 .../gcc.dg/flex-array-counted-by-5.c  |  48 +
 gcc/tree-object-size.cc   |  60 ++
 5 files changed, 360 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h 
b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
   __builtin_abort ();\
 return 0;\
   } while (0)
+
+#define EXPECT(p, _v) do {   \
+  size_t v = _v; \
+  if (p == v)\
+__builtin_printf ("ok:  %s == %zd\n", #p, p);\
+  else   \
+{\
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);\
+  FAIL ();   \
+}\
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..78f50230e891
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* Test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f; 
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
++ attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+  array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);   
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..20103d58ef51
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* Test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size mismatched with the value of counted_by attribute?
+We should always use the latest value that is hold by the counted_by
+field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10 
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, Due to type casting, the type for the pointee of a pointer
+   does not say 

[PATCH v8 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-03-29 Thread Qing Zhao
Including the following changes:
* The definition of the new internal function .ACCESS_WITH_SIZE
  in internal-fn.def.
* C FE converts every reference to a FAM with a "counted_by" attribute
  to a call to the internal function .ACCESS_WITH_SIZE.
  (build_component_ref in c/c-typeck.cc)

  This includes the case when the object is statically allocated and
  initialized.
  To make this work, the routines initializer_constant_valid_p_1
  and output_constant in varasm.cc are updated to handle calls to
  .ACCESS_WITH_SIZE.
  (initializer_constant_valid_p_1 and output_constant in varasm.cc)

  However, for the reference inside "offsetof", the "counted_by" attribute is
  ignored since it's not useful at all.
  (c_parser_postfix_expression in c/c-parser.cc)

  In addition to "offsetof", the counted_by attribute is also ignored for
  references inside the "typeof" and "alignof" operators.

  When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
  replace the call with its first argument.

* Convert every call to .ACCESS_WITH_SIZE to its first argument.
  (expand_ACCESS_WITH_SIZE in internal-fn.cc)
* Adjust alias analysis to exclude the new internal function from clobbering
  anything.
  (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
* Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE when
  its LHS is eliminated as dead code.
  (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
* Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
  get the reference from the call to .ACCESS_WITH_SIZE.
  (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)

gcc/c/ChangeLog:

* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
attribute when build_component_ref inside offsetof operator.
* c-tree.h (build_component_ref): Add one more parameter.
* c-typeck.cc (build_counted_by_ref): New function.
(build_access_with_size_for_counted_by): New function.
(build_component_ref): Check the counted-by attribute and build
call to .ACCESS_WITH_SIZE.
(build_unary_op): When building ADDR_EXPR for
.ACCESS_WITH_SIZE, use its first argument.
(lvalue_p): Accept call to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
IFN_ACCESS_WITH_SIZE.
(call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
to .ACCESS_WITH_SIZE when its LHS is dead.
* tree.cc (process_call_operands): Adjust side effect for function
.ACCESS_WITH_SIZE.
(is_access_with_size_p): New function.
(get_ref_from_access_with_size): New function.
* tree.h (is_access_with_size_p): New prototype.
(get_ref_from_access_with_size): New prototype.
* varasm.cc (initializer_constant_valid_p_1): Handle call to
.ACCESS_WITH_SIZE.
(output_constant): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-2.c: New test.
---
 gcc/c/c-parser.cc |  10 +-
 gcc/c/c-tree.h|   2 +-
 gcc/c/c-typeck.cc | 128 +-
 gcc/internal-fn.cc|  35 +
 gcc/internal-fn.def   |   4 +
 .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
 gcc/tree-ssa-alias.cc |   2 +
 gcc/tree-ssa-dce.cc   |   5 +-
 gcc/tree.cc   |  25 +++-
 gcc/tree.h|   8 ++
 gcc/varasm.cc |  10 ++
 11 files changed, 331 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c31349dae2ff..a6ed5ac43bb1 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
if (c_parser_next_token_is (parser, CPP_NAME))
  {
c_token *comp_tok = c_parser_peek_token (parser);
+   /* Ignore the counted_by attribute for reference inside
+  offsetof since the information is not useful at all.  */
offsetof_ref
  = build_component_ref (loc, offsetof_ref, comp_tok->value,
-comp_tok->location, UNKNOWN_LOCATION);
+comp_tok->location, UNKNOWN_LOCATION,
+false);
c_parser_consume_token (parser);
while (c_parser_next_token_is (parser, CPP_DOT)
   || 

[PATCH v8 5/5] Add the 6th argument to .ACCESS_WITH_SIZE

2024-03-29 Thread Qing Zhao
to carry the TYPE of the flexible array.

Such information is needed in tree-object-size.cc.

We cannot use the result type or the type of the 1st argument
of the routine .ACCESS_WITH_SIZE to decide the element type
of the original array due to possible type casting in the
source code.
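
The arithmetic behind this can be sketched with a small model. It is only an illustration under stated assumptions: remaining_bytes is a hypothetical helper, not GCC code, and the numbers come from the new test, where data_len is 24, the array elements are char, and the pointer is cast to an 8-byte struct (assuming a 4-byte int).

```c
#include <stddef.h>

/* Hypothetical model: bytes left in a counted array of COUNT elements
   of ELEM_SIZE bytes each, starting BYTE_OFFSET bytes into the array.  */
static size_t
remaining_bytes (size_t count, size_t elem_size, size_t byte_offset)
{
  size_t total = count * elem_size;
  return byte_offset >= total ? 0 : total - byte_offset;
}
```

Using the element size of the original array (1 byte), the model gives 16 and 8 bytes at offsets 8 and 16, matching the values the test expects. Taking the element size from the cast type (8 bytes) instead would give 24 * 8 - 8 = 184, which is why the type of the original flexible array has to be carried separately.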

gcc/c/ChangeLog:

* c-typeck.cc (build_access_with_size_for_counted_by): Add the 6th
argument to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): Use the type
of the 6th argument for the type of the element.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-6.c: New test.
---
 gcc/c/c-typeck.cc | 11 +++--
 gcc/internal-fn.cc|  2 +
 .../gcc.dg/flex-array-counted-by-6.c  | 46 +++
 gcc/tree-object-size.cc   | 16 ---
 4 files changed, 66 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-6.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index f7b0e08459b0..05948f76039e 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2608,7 +2608,8 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
 
to:
 
-   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1))
+   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
+   (TYPE_OF_ARRAY *)0))
 
NOTE: The return type of this function is the POINTER type pointing
to the original flexible array type.
@@ -2620,6 +2621,9 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
The 4th argument of the call is a constant 0 with the TYPE of the
object pointed by COUNTED_BY_REF.
 
+   The 6th argument of the call is a constant 0 with the pointer TYPE
+   to the original flexible array type.
+
   */
 static tree
 build_access_with_size_for_counted_by (location_t loc, tree ref,
@@ -2632,12 +2636,13 @@ build_access_with_size_for_counted_by (location_t loc, 
tree ref,
 
   tree call
 = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
-   result_type, 5,
+   result_type, 6,
array_to_pointer_conversion (loc, ref),
counted_by_ref,
build_int_cst (integer_type_node, 1),
build_int_cst (counted_by_type, 0),
-   build_int_cst (integer_type_node, -1));
+   build_int_cst (integer_type_node, -1),
+   build_int_cst (result_type, 0));
   /* Wrap the call with an INDIRECT_REF with the flexible array type.  */
   call = build1 (INDIRECT_REF, TREE_TYPE (ref), call);
   SET_EXPR_LOCATION (call, loc);
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index e744080ee670..34e4a4aea534 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3411,6 +3411,8 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
  1: read_only
  2: write_only
  3: read_write
+   6th argument: A constant 0 with the pointer TYPE to the original flexible
+ array type.
 
Both the return type and the type of the first argument of this
function have been converted from the incomplete array type to
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
new file mode 100644
index ..65fa01443d95
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
@@ -0,0 +1,46 @@
+/* Test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size: when the type of the flexible array member
+ * is casting to another type.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+typedef unsigned short u16;
+
+struct info {
+   u16 data_len;
+   char data[] __attribute__((counted_by(data_len)));
+};
+
+struct foo {
+   int a;
+   int b;
+};
+
+static __attribute__((__noinline__))
+struct info *setup ()
+{
+ struct info *p;
+ size_t bytes = 3 * sizeof(struct foo);
+
+ p = (struct info *)malloc (sizeof (struct info) + bytes);
+ p->data_len = bytes;
+
+ return p;
+}
+
+static void
+__attribute__((__noinline__)) report (struct info *p)
+{
+ struct foo *bar = (struct foo *)p->data;
+ EXPECT(__builtin_dynamic_object_size((char *)(bar + 1), 1), 16);
+ EXPECT(__builtin_dynamic_object_size((char *)(bar + 2), 1), 8);
+}
+
+int main(int argc, char *argv[])
+{
+ struct info *p = setup();
+ report(p);
+ return 0;
+}
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 8de264d1dee2..4c1fa9b555fa 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -762,9 +762,11 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
  1: the number of the elements of the object type;
4th argument 

[PATCH v8 0/5] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-03-29 Thread Qing Zhao
Hi,

This is the 8th version of the patch.

Compared with the 7th version, the differences are:

updates per Joseph's comments:

1. Wording changes in diagnostics;
   "non flexible" to "non-flexible";
   Diagnostics start with a lowercase letter;
2. Documentation changes:
   "named ``@var{count}''" to "``@var{count}''";
   use present tense in the documentation;
3. Checking "INTEGRAL_TYPE_P" instead of just INTEGER_TYPE for integer types.
   Add testcases for _Bool/enum/_BitInt count fields. 
4. Add handling for multiple counted_by attributes on the same field:
   Allow duplicates if they name the same field;
   Error when they name different fields.
   Add testcase for this.
5. Updates to comment style.


The 7th version is at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648087.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648088.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648089.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648090.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648091.html

It is based on the following original proposal:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635884.html
Represent the missing dependence for the "counted_by" attribute and its 
consumers

**The summary of the proposal is:

* Add a new internal function ".ACCESS_WITH_SIZE" to carry the size information 
for every reference to a FAM field;
* In the C FE, replace every reference to a FAM field whose TYPE has the
"counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or array bound
sanitizer, query the size information or ACCESS_MODE information from the new
internal function;
* When expanding to RTL, replace the internal function with the actual
reference to the FAM field;
* Some adjustments to IPA alias analysis and other SSA passes to mitigate the
impact on the optimizer and code generation.


**The new internal function

  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, TYPE_OF_SIZE, 
ACCESS_MODE, TYPE_OF_REF)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)

which returns "REF_TO_OBJ", the same as the 1st argument;

Both the return type and the type of the first argument of this function have 
been converted from the incomplete array type to the corresponding pointer type.

The call to .ACCESS_WITH_SIZE is wrapped with an INDIRECT_REF, whose type is
the original incomplete array type.

Please see the following link for why:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html

1st argument "REF_TO_OBJ": The reference to the object;
2nd argument "REF_TO_SIZE": The reference to the size of the object;
3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE represents
   0: the number of bytes;
   1: the number of the elements of the object type;
4th argument "TYPE_OF_SIZE": A constant 0 with the TYPE of the object
  referenced by REF_TO_SIZE;
5th argument "ACCESS_MODE":
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write
6th argument "TYPE_OF_REF": A constant 0 with the pointer TYPE
  to the original flexible array type.
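
As a reading aid, the argument encodings above can be written down as plain C. This is a sketch under assumptions, not GCC source: the enum and function names are hypothetical, and the size computation only models how a consumer might turn the value behind REF_TO_SIZE into a byte count (negative values are treated as zero, as the attribute documentation states).

```c
#include <stddef.h>

/* Hypothetical names for the CLASS_OF_SIZE values listed above.  */
enum class_of_size
{
  SIZE_IN_BYTES = 0,
  SIZE_IN_ELEMENTS = 1
};

/* Hypothetical names for the ACCESS_MODE values listed above.  */
enum access_mode
{
  ACCESS_UNKNOWN = -1,
  ACCESS_NONE = 0,
  ACCESS_READ_ONLY = 1,
  ACCESS_WRITE_ONLY = 2,
  ACCESS_READ_WRITE = 3
};

/* Hypothetical model: size in bytes of the object, given the value
   stored behind REF_TO_SIZE, its class, and the element size.  */
static size_t
model_object_bytes (long long size_value, enum class_of_size cls,
                    size_t elem_size)
{
  if (size_value < 0)
    size_value = 0;  /* A negative size is treated as zero.  */
  if (cls == SIZE_IN_ELEMENTS)
    return (size_t) size_value * elem_size;
  return (size_t) size_value;
}
```

For a counted_by field the front end passes class 1, so a count of 10 on an int array models as 10 * sizeof (int) bytes, which is what the __builtin_dynamic_object_size tests in patch 3/5 expect.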

** The patch set includes:

1. Provide counted_by attribute to flexible array member field;
  which includes:
  * "counted_by" attribute documentation;
  * C FE handling of the new attribute;
syntax checking, error reporting;
  * test cases;

2. Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.
  which includes:
  * The definition of the new internal function .ACCESS_WITH_SIZE in 
internal-fn.def.
  * C FE converts every reference to a FAM with "counted_by" attribute to a
call to the internal function .ACCESS_WITH_SIZE.
(build_component_ref in c/c-typeck.cc)
This includes the case when the object is statically allocated and
initialized.
To make this work, we should update
initializer_constant_valid_p_1 and output_constant in varasm.cc to include
calls to .ACCESS_WITH_SIZE.

However, for the reference inside "offsetof", ignore the "counted_by" 
attribute since it's not useful at all. (c_parser_postfix_expression in 
c/c-parser.cc)
In addition to "offsetof", the counted_by attribute is also ignored for
  references inside the "typeof" and "alignof" operators.
When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
  replace the call with its first argument.

  * Convert every call to .ACCESS_WITH_SIZE to its first argument.
(expand_ACCESS_WITH_SIZE in internal-fn.cc)
  * Adjust alias analysis to exclude the new internal function from clobbering
anything.
(ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in
tree-ssa-alias.cc)
  * Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE
when its LHS is eliminated as dead code.

[PATCH v8 1/5] Provide counted_by attribute to flexible array member field (PR108896)

2024-03-29 Thread Qing Zhao
'counted_by (COUNT)'
 The 'counted_by' attribute may be attached to the C99 flexible
 array member of a structure.  It indicates that the number of the
 elements of the array is given by the field "COUNT" in the
 same structure as the flexible array member.
 GCC may use this information to improve detection of object size 
information
 for such structures and provide better results in compile-time diagnostics
 and runtime features like the array bound sanitizer and
 the '__builtin_dynamic_object_size'.

 For instance, the following code:

  struct P {
size_t count;
char other;
char array[] __attribute__ ((counted_by (count)));
  } *p;

 specifies that the 'array' is a flexible array member whose number
 of elements is given by the field 'count' in the same structure.

 The field that represents the number of the elements should have an
 integer type.  Otherwise, the compiler reports an error and
 ignores the attribute.

 When the field that represents the number of the elements is assigned a
 negative integer value, the compiler treats the value as zero.

 An explicit 'counted_by' annotation defines a relationship between
 two objects, 'p->array' and 'p->count', and there are the following
 requirements on the relationship between this pair:

* 'p->count' must be initialized before the first reference to
  'p->array';

* 'p->array' has _at least_ 'p->count' number of elements
  available all the time.  This relationship must hold even
  after any of these related objects are updated during the
  program.

 It's the user's responsibility to make sure the above requirements
 are kept all the time.  Otherwise the compiler reports
 warnings; at the same time, the results of the array bound
 sanitizer and '__builtin_dynamic_object_size' are undefined.

 One important feature of the attribute is that a reference to the
 flexible array member field uses the latest value assigned to
 the field that represents the number of the elements before that
 reference.  For example,

p->count = val1;
p->array[20] = 0;  // ref1 to p->array
p->count = val2;
p->array[30] = 0;  // ref2 to p->array

 in the above, 'ref1' uses 'val1' as the number of the elements
 in 'p->array', and 'ref2' uses 'val2' as the number of elements
 in 'p->array'.
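
The two requirements above can be illustrated with a short sketch. It assumes a compiler that may or may not know the attribute, so the COUNTED_BY macro (hypothetical, not part of the patch) expands to nothing when counted_by is unavailable. The helpers p_alloc and p_shrink are also hypothetical; they only show one way to set the count before the first array reference and to keep the count no larger than the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: expands to nothing on compilers that do not
   provide the counted_by attribute.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__ ((counted_by (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct P {
  size_t count;
  char other;
  char array[] COUNTED_BY (count);
};

/* Hypothetical helper: allocate room for N elements and initialize
   count before any reference to array.  */
static struct P *
p_alloc (size_t n)
{
  struct P *p = malloc (sizeof (struct P) + n * sizeof (char));
  if (p == NULL)
    return NULL;
  p->count = n;
  return p;
}

/* Hypothetical helper: lower the count without reallocating, so the
   array still has at least count elements.  Returns -1 when the new
   count would exceed the current one.  */
static int
p_shrink (struct P *p, size_t new_count)
{
  assert (p != NULL);
  if (new_count > p->count)
    return -1;
  p->count = new_count;
  return 0;
}
```

After p_shrink succeeds, later references to p->array are checked against the new, smaller count, which is the "latest value" behavior described above.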

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.
* c-tree.h (lookup_field): New prototype.
* c-typeck.cc (lookup_field): Expose as extern function.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
* gcc.dg/flex-array-counted-by-7.c: New test.
---
 gcc/c-family/c-attribs.cc | 68 +++-
 gcc/c-family/c-common.cc  | 13 
 gcc/c-family/c-common.h   |  1 +
 gcc/c/c-decl.cc   | 78 +++
 gcc/c/c-tree.h|  1 +
 gcc/c/c-typeck.cc |  3 +-
 gcc/doc/extend.texi   | 68 
 .../gcc.dg/flex-array-counted-by-7.c  |  8 ++
 gcc/testsuite/gcc.dg/flex-array-counted-by.c  | 62 +++
 9 files changed, 282 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-7.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 40a0cf90295d..39e5824ee7a5 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -105,6 +105,8 @@ static tree handle_warn_if_not_aligned_attribute (tree *, 
tree, tree,
  int, bool *);
 static tree handle_strict_flex_array_attribute (tree *, tree, tree,
 int, bool *);
+static tree handle_counted_by_attribute (tree *, tree, tree,
+  int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
 

Re: [PATCH v7 1/5] Provide counted_by attribute to flexible array member field (PR108896)

2024-03-27 Thread Qing Zhao


> On Mar 26, 2024, at 13:20, Joseph Myers  wrote:
> 
> On Tue, 26 Mar 2024, Qing Zhao wrote:
> 
>>> What happens when there are multiple counted_by attributes on the same 
>>> field?  As far as I can see, all but one end up being ignored (by the code 
>>> that actually uses the attribute).
>> 
>> In general, is there any rule for handling multiple instances of the same
>> attribute in GCC? i.e., from left to right, the last one wins? Or something
>> else? I’d like to follow a rule consistent with other places in GCC.
> 
> Sometimes, they are meaningful and all can be respected.  (An example is 
> the format_arg attribute, where ngettext legitimately has two such 
> attributes.)
> 
> When not meaningful, an error is appropriate.  For example, with section 
> attributes you can get
> 
>error ("section of %q+D conflicts with previous declaration",
>   *node);
> 
> if different sections are named.  I think that's a suitable model for the 
> new attribute here: allow duplicates if they name the same field, but give 
> errors if they name different fields, just as with the section attribute.
> 
> Once you give an error for multiple attributes naming different fields, 
> which one wins is just a question of error recovery; the specific choice 
> doesn't matter much, as long as you don't get an ICE in later processing.

Agreed and fixed as suggested.

Thanks.

Qing
> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
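
The rule agreed on in this exchange (duplicates are fine when they name the same field, an error otherwise) can be modeled in a few lines. This is only a sketch of the rule, not the code in c-attribs.cc; check_counted_by_names is a hypothetical function and the field names are passed as plain strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the duplicate-attribute rule: NAMES holds the
   field named by each counted_by attribute on one flexible array
   member.  Returns 0 when all of them name the same field (or there
   is at most one attribute) and -1 when two attributes conflict.  */
static int
check_counted_by_names (const char *const *names, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (strcmp (names[0], names[i]) != 0)
      return -1;
  return 0;
}
```

Which attribute "wins" after a conflict is left open here, in line with the remark above that it is only a matter of error recovery.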



Re: [PATCH v7 1/5] Provide counted_by attribute to flexible array member field (PR108896)

2024-03-26 Thread Qing Zhao



> On Mar 26, 2024, at 11:21, Joseph Myers  wrote:
> 
> On Tue, 26 Mar 2024, Qing Zhao wrote:
> 
>>>> +@cindex @code{counted_by} variable attribute
>>>> +@item counted_by (@var{count})
>>>> +The @code{counted_by} attribute may be attached to the C99 flexible array
>>>> +member of a structure.  It indicates that the number of the elements of 
>>>> the
>>>> +array is given by the field named "@var{count}" in the same structure as 
>>>> the
>>>> +flexible array member.
>>> 
>>> You shouldn't use ASCII quotes like that in Texinfo (outside @code etc. 
>>> where they represent literal quotes in programming language source code).  
>>> You can say ``@var{count}'' if you wish to quote the name.
>> A little confused with the above..
>> So, what should I change in the above statement?
> 
> I don't think you actually need quotes (or "named") at all; just
> 
>  the field @var{count}
> 
> in place of
> 
>  the field named "@var{count}"
> 
> would suffice.

Okay, I see. -:) 
>  But if you use quotes (for an English-language quotation, 
> as opposed to when the quotes themselves are part of programming-language 
> source code given in the manual), in Texinfo you should use ``'' rather 
> than "".

Thanks for the explanation.

Qing
> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
> 



Re: [PATCH v7 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-03-26 Thread Qing Zhao
Thanks, will update.

Qing

> On Mar 25, 2024, at 16:50, Joseph Myers  wrote:
> 
> On Wed, 20 Mar 2024, Qing Zhao wrote:
> 
>> +   the size of the element can be retrived from the result type of the call,
>> +   which is the pointer to the array type.  */
> 
> Again, start a sentence with an uppercase letter.
> 
>> +  /* if not for dynamic object size, return.  */
> 
>> +  /* result type is a pointer type to the original flexible array type.  */
> 
> Likewise.
> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
> 



Re: [PATCH v7 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-03-26 Thread Qing Zhao


> On Mar 25, 2024, at 16:48, Joseph Myers  wrote:
> 
> On Wed, 20 Mar 2024, Qing Zhao wrote:
> 
>> +  /* get the TYPE of the counted_by field.  */
> 
> Start comments with an uppercase letter.
Okay.
> 
>> +   The type of the first argument of this function is a POINTER type
>> +   to the orignal flexible array type.
> 
> s/orignal/original/
Okay.
> 
>> +   If HANDLE_COUNTED_BY is true, check the counted_by attribute and generate
>> +   call to .ACCESS_WITH_SIZE. otherwise, ignore the attribute.  */
> 
> A sentence should start with an uppercase letter, "Otherwise”.
Okay.

> 
>> -  /* Ordinary case; arg is a COMPONENT_REF or a decl.  */
>> +  /* Ordinary case; arg is a COMPONENT_REF or a decl,or a call to
>> + .ACCESS_WITH_SIZE.  */
> 
> There should be a space after a comma.
Okay. (I remember running contrib/check_GNU_style.sh on all the patches;
not sure why these errors were not caught.)
> 
>> +/* Get the corresponding reference from the call to a .ACCESS_WITH_SIZE.
>> + * i.e the first argument of this call. return NULL_TREE otherwise.  */
>> +extern tree get_ref_from_access_with_size (tree);
> 
> Again, start a sentence with an uppercase letter.
Okay.
> 
>> +case CALL_EXPR:
>> +  /* for a call to .ACCESS_WITH_SIZE, check the first argument.  */
> 
> Likewise.
Okay.
> 
>> +  /* for a call to .ACCESS_WITH_SIZE, check the first argument.  */
> 
> Likewise.
Okay.

Will update accordingly.

thanks.
Qing

> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
> 



Re: [PATCH v7 1/5] Provide counted_by attribute to flexible array member field (PR108896)

2024-03-26 Thread Qing Zhao
Hi, Joseph,

Thanks a lot for the reviews.

> On Mar 25, 2024, at 16:44, Joseph Myers  wrote:
> 
> On Wed, 20 Mar 2024, Qing Zhao wrote:
> 
>> +  /* This attribute only applies to a C99 flexible array member type.  */
>> +  else if (! c_flexible_array_member_type_p (TREE_TYPE (decl)))
>> +{
>> +  error_at (DECL_SOURCE_LOCATION (decl),
>> +"%qE attribute is not allowed for a non"
>> +" flexible array member field",
> 
> "non-flexible" not "non flexible" ("non" shouldn't appear as a word on its 
> own).

Okay.
> 
>> +  /* Error when the field is not found in the containing structure.  */
>> +  if (!counted_by_field)
>> +error_at (DECL_SOURCE_LOCATION (field_decl),
>> +  "Argument %qE to the %qE attribute is not a field declaration"
>> +  " in the same structure as %qD", fieldname,
> 
> Diagnostics should start with a lowercase letter, "argument" not 
> "Argument”.
Okay.

> 
>> +  if (TREE_CODE (TREE_TYPE (real_field)) != INTEGER_TYPE)
>> +error_at (DECL_SOURCE_LOCATION (field_decl),
>> +  "Argument %qE to the %qE attribute is not a field declaration"
>> +  " with an integer type", fieldname,
> 
> Likewise.
Okay.
> 
> Generally checks for integer types should allow any INTEGRAL_TYPE_P, 
> rather than just INTEGER_TYPE.  For example, it should be valid to use 
> this attribute with a field with _BitInt type.  (It would be fairly 
> useless with a _BitInt larger than size_t, but maybe e.g. someone knows 
> the size in their code must fit into 24 bits and so uses unsigned 
> _BitInt(24) for the field.)

Okay.  Will change this. 
> 
> Of course there should be corresponding testcases for _Bool / enum / 
> _BitInt count fields.

And I will add the corresponding test cases.
> 
> What happens when there are multiple counted_by attributes on the same 
> field?  As far as I can see, all but one end up being ignored (by the code 
> that actually uses the attribute).

In general, is there any rule for handling multiple instances of the same
attribute in GCC? i.e., from left to right, the last one wins? Or something
else? I’d like to follow the same rule as other places in GCC.


>  I think multiple such attributes using 
> different identifiers should be diagnosed, even if all the identifiers are 
> indeed integer fields in the same structure - it doesn't seem meaningful 
> to say that multiple fields give the count of elements.

Yes, this is reasonable. Shall we ignore all but the last one? And issue 
warnings at the same time? 


i.e. for the following:
struct trailing_array {
    int c1;
    int final;
    int array_4[] __attribute ((counted_by (c1)))
                  __attribute ((counted_by (final)));
};

For the above, issue a warning by default:

 multiple 'counted_by' attributes specified for the same flexible array member
 field “array_4”; only the last one, “final”, is valid.

??

>  (Multiple 
> attributes with the *same* identifier are probably OK to allow; maybe that 
> could arise in code using complicated macros that end up adding the 
> attribute more than once.)

Okay. 
Shall we issue warnings for this case? (Probably not??)

> 
>> +@cindex @code{counted_by} variable attribute
>> +@item counted_by (@var{count})
>> +The @code{counted_by} attribute may be attached to the C99 flexible array
>> +member of a structure.  It indicates that the number of the elements of the
>> +array is given by the field named "@var{count}" in the same structure as the
>> +flexible array member.
> 
> You shouldn't use ASCII quotes like that in Texinfo (outside @code etc. 
> where they represent literal quotes in programming language source code).  
> You can say ``@var{count}'' if you wish to quote the name.
A little confused with the above..
So, what should I change in the above statement?
> 
>> +The field that represents the number of the elements should have an
>> +integer type.  Otherwise, the compiler will report a warning and ignore
>> +the attribute.
>> +When the field that represents the number of the elements is assigned a
>> +negative integer value, the compiler will treat the value as zero.
> 
> In general it's best for documentation to be in the present tense (so the 
> compiler *reports* a warning rather than "will report", *treats* the value 
> as zero rather than "will treat").

 thanks, will update them. 
> 
>> +It's the user's responsibility to make sure the above requirements to
>> +be kept all the time.  Otherwise the compiler will report warnings,
>> +at the same time, the results of the array bound sanitizer and the
>> +@code{__builtin_dynamic_object_size} is undefined.
> 
> Likewise.

Okay. 
Thanks a lot.
Qing

> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
> 



[PATCH v7 4/5] Use the .ACCESS_WITH_SIZE in bound sanitizer.

2024-03-20 Thread Qing Zhao
gcc/c-family/ChangeLog:

* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-4.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
---
 gcc/c-family/c-ubsan.cc   | 42 +
 .../ubsan/flex-array-counted-by-bounds-2.c| 45 ++
 .../ubsan/flex-array-counted-by-bounds-3.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds-4.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds.c  | 46 +++
 5 files changed, 201 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-4.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 940982819ddf..7cd3c6aa5b88 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -376,6 +376,40 @@ ubsan_instrument_return (location_t loc)
   return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data));
 }
 
+/* Get the tree that represented the number of counted_by, i.e, the maximum
+   number of the elements of the object that the call to .ACCESS_WITH_SIZE
+   points to, this number will be the bound of the corresponding array.  */
+static tree
+get_bound_from_access_with_size (tree call)
+{
+  if (!is_access_with_size_p (call))
+return NULL_TREE;
+
+  tree ref_to_size = CALL_EXPR_ARG (call, 1);
+  unsigned int class_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));
+  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
+  build_int_cst (ptr_type_node, 0));
+  /* If size is negative value, treat it as zero.  */
+  if (!TYPE_UNSIGNED (type))
+  {
+tree cond = fold_build2 (LT_EXPR, boolean_type_node,
+unshare_expr (size), build_zero_cst (type));
+size = fold_build3 (COND_EXPR, type, cond,
+   build_zero_cst (type), size);
+  }
+
+  /* Only when class_of_size is 1, i.e, the number of the elements of
+ the object type, return the size.  */
+  if (class_of_size != 1)
+return NULL_TREE;
+  else
+size = fold_convert (sizetype, size);
+
+  return size;
+}
+
+
 /* Instrument array bounds for ARRAY_REFs.  We create special builtin,
that gets expanded in the sanopt pass, and make an array dimension
of it.  ARRAY is the array, *INDEX is an index to the array.
@@ -401,6 +435,14 @@ ubsan_instrument_bounds (location_t loc, tree array, tree *index,
  && COMPLETE_TYPE_P (type)
  && integer_zerop (TYPE_SIZE (type)))
bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1);
+  else if (INDIRECT_REF_P (array)
+  && is_access_with_size_p ((TREE_OPERAND (array, 0))))
+   {
+ bound = get_bound_from_access_with_size ((TREE_OPERAND (array, 0)));
+ bound = fold_build2 (MINUS_EXPR, TREE_TYPE (bound),
+  bound,
+  build_int_cst (TREE_TYPE (bound), 1));
+   }
   else
return NULL_TREE;
 }
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
new file mode 100644
index ..148934975ee5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
@@ -0,0 +1,45 @@
+/* test the attribute counted_by and its usage in
+   bounds sanitizer combined with VLA.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-output "index 11 out of bounds for type 'int \\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 20 out of bounds for type 'int \\\[\\\*\\\]\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 11 out of bounds for type 'int \\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'int \\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+
+
+#include 
+
+void __attribute__((__noinline__)) setup_and_test_vla (int n, int m)
+{
+   struct foo {
+   int n;
+   int p[][n] __attribute__((counted_by(n)));
+   } *f;
+
+   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n]));
+   f->n = m;
+   f->p[m][n-1]=1;
+   return;
+}
+
+void __attribute__((__noinline__)) setup_and_test_vla_1 (int n1, int n2, int m)
+{
+  struct foo {
+int n;
+int p[][n2][n1] __attribute__((counted_by(n)));
+  } *f;
+
+  f = (struct foo *) malloc 

[PATCH v7 5/5] Add the 6th argument to .ACCESS_WITH_SIZE

2024-03-20 Thread Qing Zhao
to carry the TYPE of the flexible array.

Such information is needed during tree-object-size.cc.

We cannot use the result type or the type of the 1st argument
of the routine .ACCESS_WITH_SIZE to decide the element type
of the original array due to possible type casting in the
source code.
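
To illustrate the arithmetic behind this (a sketch only, not code from the
patch; the helper names counted_bytes and remaining_bytes are made up for
the example): the counted_by value counts elements of the array's declared
type, so the byte size has to be computed with that declared element size.
In the new test the declared type is char and data_len is
3 * sizeof (struct foo); using the size of the cast-to type instead would
overstate the object by a factor of sizeof (struct foo).

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the struct used after the cast in the new test.  */
struct foo {
  int a;
  int b;
};

/* Hypothetical model: total bytes covered by a counted_by array.  COUNT is
   in units of the array's declared element type, so ELEM_SIZE must be the
   size of that declared type, not of whatever the pointer was cast to.  */
size_t
counted_bytes (size_t count, size_t elem_size)
{
  assert (elem_size != 0);
  return count * elem_size;
}

/* Hypothetical model: bytes remaining OFFSET bytes into an object of TOTAL
   bytes; zero once the offset reaches or passes the end.  */
size_t
remaining_bytes (size_t total, size_t offset)
{
  return offset >= total ? 0 : total - offset;
}
```

With a char array of 3 * sizeof (struct foo) elements, skipping one
struct foo leaves 2 * sizeof (struct foo) bytes and skipping two leaves
sizeof (struct foo) bytes, which matches the 16 and 8 expected by the test
when struct foo is 8 bytes.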

gcc/c/ChangeLog:

* c-typeck.cc (build_access_with_size_for_counted_by): Add the 6th
argument to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): Use the type
of the 6th argument for the type of the element.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-6.c: New test.
---
 gcc/c/c-typeck.cc | 11 +++--
 gcc/internal-fn.cc|  2 +
 .../gcc.dg/flex-array-counted-by-6.c  | 46 +++
 gcc/tree-object-size.cc   | 16 ---
 4 files changed, 66 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-6.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index a29a7d7ec029..c17ac6862546 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2608,7 +2608,8 @@ build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type)
 
to:
 
-   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1))
+   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
+   (TYPE_OF_ARRAY *)0))
 
NOTE: The return type of this function is the POINTER type pointing
to the original flexible array type.
@@ -2620,6 +2621,9 @@ build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type)
The 4th argument of the call is a constant 0 with the TYPE of the
object pointed by COUNTED_BY_REF.
 
+   The 6th argument of the call is a constant 0 with the pointer TYPE
+   to the original flexible array type.
+
   */
 static tree
 build_access_with_size_for_counted_by (location_t loc, tree ref,
@@ -2632,12 +2636,13 @@ build_access_with_size_for_counted_by (location_t loc, tree ref,
 
   tree call
 = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
-   result_type, 5,
+   result_type, 6,
array_to_pointer_conversion (loc, ref),
counted_by_ref,
build_int_cst (integer_type_node, 1),
build_int_cst (counted_by_type, 0),
-   build_int_cst (integer_type_node, -1));
+   build_int_cst (integer_type_node, -1),
+   build_int_cst (result_type, 0));
   /* Wrap the call with an INDIRECT_REF with the flexible array type.  */
   call = build1 (INDIRECT_REF, TREE_TYPE (ref), call);
   SET_EXPR_LOCATION (call, loc);
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index e744080ee670..34e4a4aea534 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3411,6 +3411,8 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
  1: read_only
  2: write_only
  3: read_write
+   6th argument: A constant 0 with the pointer TYPE to the original flexible
+ array type.
 
Both the return type and the type of the first argument of this
function have been converted from the incomplete array type to
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
new file mode 100644
index ..65a401796479
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
@@ -0,0 +1,46 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size. when the type of the flexible array member
+ * is casting to another type.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+typedef unsigned short u16;
+
+struct info {
+   u16 data_len;
+   char data[] __attribute__((counted_by(data_len)));
+};
+
+struct foo {
+   int a;
+   int b;
+};
+
+static __attribute__((__noinline__))
+struct info *setup ()
+{
+ struct info *p;
+ size_t bytes = 3 * sizeof(struct foo);
+
+ p = (struct info *)malloc (sizeof (struct info) + bytes);
+ p->data_len = bytes;
+
+ return p;
+}
+
+static void
+__attribute__((__noinline__)) report (struct info *p)
+{
+ struct foo *bar = (struct foo *)p->data;
+ EXPECT(__builtin_dynamic_object_size((char *)(bar + 1), 1), 16);
+ EXPECT(__builtin_dynamic_object_size((char *)(bar + 2), 1), 8);
+}
+
+int main(int argc, char *argv[])
+{
+ struct info *p = setup();
+ report(p);
+ return 0;
+}
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index d258d0947545..ee9a0415c21c 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -762,9 +762,11 @@ addr_object_size (struct object_size_info *osi, const_tree ptr,
  1: the number of the elements of the object type;
4th argument 

[PATCH v7 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-03-20 Thread Qing Zhao
gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
* gcc.dg/flex-array-counted-by-5.c: New test.
---
 .../gcc.dg/builtin-object-size-common.h   |  11 ++
 .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
 .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
 .../gcc.dg/flex-array-counted-by-5.c  |  48 +
 gcc/tree-object-size.cc   |  60 ++
 5 files changed, 360 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
   __builtin_abort ();\
 return 0;\
   } while (0)
+
+#define EXPECT(p, _v) do {   \
+  size_t v = _v; \
+  if (p == v)\
+__builtin_printf ("ok:  %s == %zd\n", #p, p);\
+  else   \
+{\
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);\
+  FAIL ();   \
+}\
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..0066c32ca808
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f; 
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
++ attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+  array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);   
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..3ce7f3545549
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size mismatched with the value of counted_by attribute?
+we should always use the latest value that is hold by the counted_by
+field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10 
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, Due to type casting, the type for the pointee of a pointer
+   does not say 

[PATCH v7 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-03-20 Thread Qing Zhao
Including the following changes:
* The definition of the new internal function .ACCESS_WITH_SIZE
  in internal-fn.def.
* C FE converts every reference to a FAM with a "counted_by" attribute
  to a call to the internal function .ACCESS_WITH_SIZE.
  (build_component_ref in c-typeck.cc)

  This includes the case when the object is statically allocated and
  initialized.
  To make this work, the routines initializer_constant_valid_p_1
  and output_constant in varasm.cc are updated to handle calls to
  .ACCESS_WITH_SIZE.
  (initializer_constant_valid_p_1 and output_constant in varasm.cc)

  However, for the reference inside "offsetof", the "counted_by" attribute is
  ignored since it's not useful at all.
  (c_parser_postfix_expression in c/c-parser.cc)

  In addition to "offsetof", for the reference inside the operators "typeof"
  and "alignof", we ignore the counted_by attribute too.

  When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
  replace the call with its first argument.

* Convert every call to .ACCESS_WITH_SIZE to its first argument.
  (expand_ACCESS_WITH_SIZE in internal-fn.cc)
* Adjust alias analysis to exclude the new internal function from clobbering
  anything.
  (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
* Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE when
  its LHS is eliminated as dead code.
  (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
* Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
  get the reference from the call to .ACCESS_WITH_SIZE.
  (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)

gcc/c/ChangeLog:

* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
attribute when build_component_ref inside offsetof operator.
* c-tree.h (build_component_ref): Add one more parameter.
* c-typeck.cc (build_counted_by_ref): New function.
(build_access_with_size_for_counted_by): New function.
(build_component_ref): Check the counted-by attribute and build
call to .ACCESS_WITH_SIZE.
(build_unary_op): When building ADDR_EXPR for
.ACCESS_WITH_SIZE, use its first argument.
(lvalue_p): Accept call to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
IFN_ACCESS_WITH_SIZE.
(call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
to .ACCESS_WITH_SIZE when its LHS is dead.
* tree.cc (process_call_operands): Adjust side effect for function
.ACCESS_WITH_SIZE.
(is_access_with_size_p): New function.
(get_ref_from_access_with_size): New function.
* tree.h (is_access_with_size_p): New prototype.
(get_ref_from_access_with_size): New prototype.
* varasm.cc (initializer_constant_valid_p_1): Handle call to
.ACCESS_WITH_SIZE.
(output_constant): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-2.c: New test.
---
 gcc/c/c-parser.cc |  10 +-
 gcc/c/c-tree.h|   2 +-
 gcc/c/c-typeck.cc | 128 +-
 gcc/internal-fn.cc|  35 +
 gcc/internal-fn.def   |   4 +
 .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
 gcc/tree-ssa-alias.cc |   2 +
 gcc/tree-ssa-dce.cc   |   5 +-
 gcc/tree.cc   |  25 +++-
 gcc/tree.h|   8 ++
 gcc/varasm.cc |  10 ++
 11 files changed, 331 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c31349dae2ff..a6ed5ac43bb1 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
if (c_parser_next_token_is (parser, CPP_NAME))
  {
c_token *comp_tok = c_parser_peek_token (parser);
+   /* Ignore the counted_by attribute for reference inside
+  offsetof since the information is not useful at all.  */
offsetof_ref
  = build_component_ref (loc, offsetof_ref, comp_tok->value,
-comp_tok->location, UNKNOWN_LOCATION);
+comp_tok->location, UNKNOWN_LOCATION,
+false);
c_parser_consume_token (parser);
while (c_parser_next_token_is (parser, CPP_DOT)
   || 

[PATCH v7 1/5] Provide counted_by attribute to flexible array member field (PR108896)

2024-03-20 Thread Qing Zhao
'counted_by (COUNT)'
 The 'counted_by' attribute may be attached to the C99 flexible
 array member of a structure.  It indicates that the number of the
 elements of the array is given by the field named "COUNT" in the
 same structure as the flexible array member.
 GCC may use this information to improve detection of object size
 information for such structures and provide better results in
 compile-time diagnostics and runtime features like the array bound
 sanitizer and '__builtin_dynamic_object_size'.

 For instance, the following code:

  struct P {
size_t count;
char other;
char array[] __attribute__ ((counted_by (count)));
  } *p;

 specifies that the 'array' is a flexible array member whose number
 of elements is given by the field 'count' in the same structure.

 The field that represents the number of the elements should have an
 integer type.  Otherwise, the compiler will report a warning and
 ignore the attribute.

 When the field that represents the number of the elements is assigned a
 negative integer value, the compiler will treat the value as zero.

 An explicit 'counted_by' annotation defines a relationship between
 two objects, 'p->array' and 'p->count', and there are the following
 requirements on the relationship between this pair:

* 'p->count' must be initialized before the first reference to
  'p->array';

* 'p->array' has _at least_ 'p->count' number of elements
  available all the time.  This relationship must hold even
  after any of these related objects are updated during the
  program.

 It's the user's responsibility to make sure the above requirements
 are kept all the time.  Otherwise the compiler will report
 warnings, and the results of the array bound sanitizer and
 '__builtin_dynamic_object_size' are undefined.

 One important feature of the attribute is that a reference to the
 flexible array member field uses the latest value assigned to
 the field that represents the number of the elements before that
 reference.  For example,

p->count = val1;
p->array[20] = 0;  // ref1 to p->array
p->count = val2;
p->array[30] = 0;  // ref2 to p->array

 in the above, 'ref1' will use 'val1' as the number of the elements
 in 'p->array', and 'ref2' will use 'val2' as the number of elements
 in 'p->array'.
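
 A minimal sketch of how user code might keep these requirements (an
 illustration only, not part of the patch: the COUNTED_BY macro and the
 helpers alloc_p, store_checked and shrink_then_store are made-up names,
 and the explicit index check is ordinary user code, not something the
 attribute inserts by itself):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only where the compiler knows it; elsewhere the macro
   expands to nothing so the sketch still builds.  */
#if defined (__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(member) __attribute__ ((counted_by (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct P {
  size_t count;
  char other;
  char array[] COUNTED_BY (count);
};

/* Allocate room for N elements and set the count before the array is
   referenced for the first time.  */
struct P *
alloc_p (size_t n)
{
  struct P *p = malloc (sizeof (struct P) + n * sizeof (char));
  if (p != NULL)
    p->count = n;
  return p;
}

/* Store V at IDX only if IDX is below the current count.  Returns 0 on
   success and -1 when the index is out of range.  */
int
store_checked (struct P *p, size_t idx, char v)
{
  assert (p != NULL);
  if (idx >= p->count)
    return -1;
  p->array[idx] = v;
  return 0;
}

/* Allocate N elements, lower the count to NEW_COUNT, then try to store at
   IDX: the check uses the latest value of the count.  */
int
shrink_then_store (size_t n, size_t new_count, size_t idx)
{
  assert (new_count <= n);
  struct P *p = alloc_p (n);
  assert (p != NULL);
  p->count = new_count;
  int result = store_checked (p, idx, 'x');
  free (p);
  return result;
}
```

 Lowering the count keeps the "at least 'p->count' elements" requirement
 intact; raising it beyond the allocated size would break it.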

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.
* c-tree.h (lookup_field): New prototype.
* c-typeck.cc (lookup_field): Expose as extern function.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
---
 gcc/c-family/c-attribs.cc| 54 +-
 gcc/c-family/c-common.cc | 13 
 gcc/c-family/c-common.h  |  1 +
 gcc/c/c-decl.cc  | 78 +++-
 gcc/c/c-tree.h   |  1 +
 gcc/c/c-typeck.cc|  3 +-
 gcc/doc/extend.texi  | 67 +
 gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 ++
 8 files changed, 237 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 40a0cf90295d..51cf91c4fbfd 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -105,6 +105,8 @@ static tree handle_warn_if_not_aligned_attribute (tree *, tree, tree,
  int, bool *);
 static tree handle_strict_flex_array_attribute (tree *, tree, tree,
 int, bool *);
+static tree handle_counted_by_attribute (tree *, tree, tree,
+  int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
@@ -412,6 

[PATCH v7 0/5] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-03-20 Thread Qing Zhao
Hi,

This is the 7th version of the patch.

Compared with the 6th version, the differences are:

updates per Siddhesh's comments:
1. update the error messages in "handle_counted_by_attribute"
   then update the testing case accordingly;
2. update the error messages in "verify_counted_by_attribute"
   then update the testing case accordingly;
3. update the documentation of "counted_by" in extend.texi
4. for the 3rd argument of ACCESS_WITH_SIZE, change it as follows:
+   3rd argument CLASS_OF_SIZE: The size referenced by the REF_TO_SIZE represents
+ 0: the number of bytes;
+ 1: the number of the elements of the object type;

Update all other places accordingly.
5. update the comments of the routine "access_with_size_object_size"
   bail out if (object_size_type & OST_DYNAMIC) == 0 for this routine.
   change the variable name of "type_of_size" to "class_of_size" for
   consistency.
6. add one more testing case for bound sanitizer to handle the case when
   counted-by field is zero value.


It is based on the following original proposal:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635884.html
Represent the missing dependence for the "counted_by" attribute and its consumers

**The summary of the proposal is:

* Add a new internal function ".ACCESS_WITH_SIZE" to carry the size
  information for every reference to a FAM field;
* In the C FE, replace every reference to a FAM field whose TYPE has the
  "counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or the array
  bound sanitizer, query the size information or ACCESS_MODE information from
  the new internal function;
* When expanding to RTL, replace the internal function with the actual
  reference to the FAM field;
* Some adjustments to IPA alias analysis and other SSA passes to mitigate the
  impact on the optimizer and code generation.


**The new internal function

  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, TYPE_OF_SIZE,
                     ACCESS_MODE, TYPE_OF_REF)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)

which returns "REF_TO_OBJ", the same as the 1st argument;

Both the return type and the type of the first argument of this function have
been converted from the incomplete array type to the corresponding pointer type.

The call to .ACCESS_WITH_SIZE is wrapped with an INDIRECT_REF, whose type is
the original incomplete array type.

Please see the following links for why:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html

1st argument "REF_TO_OBJ": The reference to the object;
2nd argument "REF_TO_SIZE": The reference to the size of the object,
3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE represents
   0: the number of bytes;
   1: the number of the elements of the object type;
4th argument "TYPE_OF_SIZE": A constant 0 with the TYPE of the object
  referenced by REF_TO_SIZE
5th argument "ACCESS_MODE":
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write
6th argument "TYPE_OF_REF": A constant 0 with the pointer TYPE
  to the original flexible array type.
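
As a rough model of how a consumer might read the 2nd and 3rd arguments (a
sketch under assumptions, not the tree code from the patches: the enum
names and the helpers element_bound and index_in_bounds are invented for
illustration), the bound lookup in patch 4/5 only yields a bound when
CLASS_OF_SIZE is 1, and treats a negative count as zero:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the two CLASS_OF_SIZE values listed above.  */
enum class_of_size {
  SIZE_IN_BYTES = 0,
  SIZE_IN_ELEMENTS = 1
};

/* Model of the bound lookup: only an element count yields a bound, and a
   negative count is treated as zero.  Returns -1 when no bound is known.  */
long long
element_bound (long long raw_count, enum class_of_size cls)
{
  assert (cls == SIZE_IN_BYTES || cls == SIZE_IN_ELEMENTS);
  if (cls != SIZE_IN_ELEMENTS)
    return -1;
  return raw_count < 0 ? 0 : raw_count;
}

/* Model of the check a bounds consumer would make: an index is accepted
   when it is below the element bound, or when no bound is known.  */
bool
index_in_bounds (long long raw_count, enum class_of_size cls, size_t index)
{
  long long bound = element_bound (raw_count, cls);
  if (bound < 0)
    return true;
  return (unsigned long long) index < (unsigned long long) bound;
}
```

The model compares the index against the element count directly; the patch
itself builds the bound as a tree expression, so this only mirrors the
intent, not the exact arithmetic.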

** The Patch sets included:

1. Provide counted_by attribute to flexible array member field;
  which includes:
  * "counted_by" attribute documentation;
  * C FE handling of the new attribute;
syntax checking, error reporting;
  * testing cases;

2. Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.
  which includes:
  * The definition of the new internal function .ACCESS_WITH_SIZE in 
internal-fn.def.
  * C FE converts every reference to a FAM with "counted_by" attribute to a 
call to the internal function .ACCESS_WITH_SIZE.
(build_component_ref in c_typeck.cc)
This includes the case when the object is statically allocated and 
initialized.
To make this work, we update 
initializer_constant_valid_p_1 and output_constant in varasm.cc to handle 
calls to .ACCESS_WITH_SIZE.

However, for a reference inside "offsetof", the "counted_by" attribute is 
ignored since it is not useful there. (c_parser_postfix_expression in 
c/c-parser.cc)
In addition to "offsetof", the counted_by attribute is also ignored for a
  reference inside the operators "typeof" and "alignof".
When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
  replace the call with its first argument.

  * Convert every call to .ACCESS_WITH_SIZE to its first argument.
(expand_ACCESS_WITH_SIZE in internal-fn.cc)
  * adjust alias analysis to exclude the new internal function from 
clobbering anything.
(ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in 
tree-ssa-alias.cc)
  * adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE 
when its LHS is eliminated as dead code.
(eliminate_unnecessary_stmts in tree-ssa-dce.cc)
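
For readers who have not used the attribute yet, the following is a small
usage sketch, not part of the patch. The COUNTED_BY macro, struct packet and
packet_alloc are hypothetical names; the macro expands to nothing on
compilers that do not report support for counted_by, so the code should
build either way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only when the compiler reports it; otherwise the
   macro expands to nothing.  */
#if defined(__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(member) __attribute__ ((counted_by (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct packet
{
  size_t count;
  int data[] COUNTED_BY (count);
};

/* Allocate a packet with room for N elements and set the counter
   before the array is first used.  */
static struct packet *
packet_alloc (size_t n)
{
  struct packet *p = malloc (sizeof (struct packet) + n * sizeof (int));
  if (p != NULL)
    p->count = n;
  return p;
}
```

Setting the counter right after allocation keeps it in step with the
storage actually available. An offsetof (struct packet, data) expression is
unaffected by the annotation, in line with the note above that the
attribute is ignored inside "offsetof".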

Re: [PATCH][tree-object-size]Pass OST_DYNAMIC information to early_object_size phase

2024-03-19 Thread Qing Zhao


On Mar 19, 2024, at 09:46, Jakub Jelinek  wrote:

On Tue, Mar 19, 2024 at 01:14:51PM +, Qing Zhao wrote:
Currently, the OST_DYNAMIC information is not passed to
early_object_sizes phase. Pass this information to it, and adjust the code
and testing case accordingly.

Can you explain why you think this is desirable?
Having both passes emit the dynamic instrumentation is IMHO undesirable,
the first pass exists just to catch subobject properties which are later
lost.

Okay, thanks for the comments. This makes good sense to me. So, the dynamic 
information was intended to be ignored in the early pass.

I will try to fix the original issue (for the counted-by patches) in the other 
direction.


In any case, if this isn't a regression fix, it isn't suitable for
stage4, seems quite risky.

Agreed.

thanks.

Qing



* tree-object-size.cc (early_object_sizes_execute_one): Add one more
argument is_dynamic.
(object_sizes_execute): Call early_object_sizes_execute_one with one
more argument.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-10.c: Update testing case.
---
gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c |  4 ++--
gcc/tree-object-size.cc   | 11 ---
2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
index 3a2d9821a44e..3c5430b51358 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
@@ -7,5 +7,5 @@

/* early_objsz should resolve __builtin_dynamic_object_size like
   __builtin_object_size.  */
-/* { dg-final { scan-tree-dump "maximum object size 21" "early_objsz" } } */
-/* { dg-final { scan-tree-dump "maximum subobject size 16" "early_objsz" } } */
+/* { dg-final { scan-tree-dump "maximum dynamic object size 21" "early_objsz" 
} } */
+/* { dg-final { scan-tree-dump "maximum dynamic subobject size 16" 
"early_objsz" } } */
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 018fbc30cbb6..57739eed3abf 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -2050,7 +2050,8 @@ do_valueize (tree t)
   since we're only looking for constant bounds.  */

static void
-early_object_sizes_execute_one (gimple_stmt_iterator *i, gimple *call)
+early_object_sizes_execute_one (gimple_stmt_iterator *i, gimple *call,
+ bool is_dynamic)
{
  tree ost = gimple_call_arg (call, 1);
  tree lhs = gimple_call_lhs (call);
@@ -2060,9 +2061,12 @@ early_object_sizes_execute_one (gimple_stmt_iterator *i, 
gimple *call)
return;

  unsigned HOST_WIDE_INT object_size_type = tree_to_uhwi (ost);
+  if (is_dynamic)
+object_size_type |= OST_DYNAMIC;
+
  tree ptr = gimple_call_arg (call, 0);

-  if (object_size_type != 1 && object_size_type != 3)
+  if ((object_size_type & OST_SUBOBJECT) == 0)
return;

  if (TREE_CODE (ptr) != ADDR_EXPR && TREE_CODE (ptr) != SSA_NAME)
@@ -2071,6 +2075,7 @@ early_object_sizes_execute_one (gimple_stmt_iterator *i, 
gimple *call)
  tree type = TREE_TYPE (lhs);
  tree bytes;
  if (!compute_builtin_object_size (ptr, object_size_type, &bytes)
+  || (TREE_CODE (bytes) != INTEGER_CST)
  || !int_fits_type_p (bytes, type))
return;

@@ -2153,7 +2158,7 @@ object_sizes_execute (function *fun, bool early)
 __builtin_dynamic_object_size too.  */
  if (early)
{
-   early_object_sizes_execute_one (&i, call);
+   early_object_sizes_execute_one (&i, call, dynamic);
  continue;
}

--
2.31.1

Jakub



[PATCH][tree-object-size]Pass OST_DYNAMIC information to early_object_size phase

2024-03-19 Thread Qing Zhao
 Currently, the OST_DYNAMIC information is not passed to
 early_object_sizes phase. Pass this information to it, and adjust the code
 and testing case accordingly.

bootstrapped and regress tested on both x86 and aarch64. no issue.

Okay for trunk?

thanks.

Qing

gcc/ChangeLog:

* tree-object-size.cc (early_object_sizes_execute_one): Add one more
argument is_dynamic.
(object_sizes_execute): Call early_object_sizes_execute_one with one
more argument.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-10.c: Update testing case.
---
 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c |  4 ++--
 gcc/tree-object-size.cc   | 11 ---
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
index 3a2d9821a44e..3c5430b51358 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
@@ -7,5 +7,5 @@
 
 /* early_objsz should resolve __builtin_dynamic_object_size like
__builtin_object_size.  */
-/* { dg-final { scan-tree-dump "maximum object size 21" "early_objsz" } } */
-/* { dg-final { scan-tree-dump "maximum subobject size 16" "early_objsz" } } */
+/* { dg-final { scan-tree-dump "maximum dynamic object size 21" "early_objsz" 
} } */
+/* { dg-final { scan-tree-dump "maximum dynamic subobject size 16" 
"early_objsz" } } */
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 018fbc30cbb6..57739eed3abf 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -2050,7 +2050,8 @@ do_valueize (tree t)
since we're only looking for constant bounds.  */
 
 static void
-early_object_sizes_execute_one (gimple_stmt_iterator *i, gimple *call)
+early_object_sizes_execute_one (gimple_stmt_iterator *i, gimple *call,
+   bool is_dynamic)
 {
   tree ost = gimple_call_arg (call, 1);
   tree lhs = gimple_call_lhs (call);
@@ -2060,9 +2061,12 @@ early_object_sizes_execute_one (gimple_stmt_iterator *i, 
gimple *call)
 return;
 
   unsigned HOST_WIDE_INT object_size_type = tree_to_uhwi (ost);
+  if (is_dynamic)
+object_size_type |= OST_DYNAMIC;
+
   tree ptr = gimple_call_arg (call, 0);
 
-  if (object_size_type != 1 && object_size_type != 3)
+  if ((object_size_type & OST_SUBOBJECT) == 0)
 return;
 
   if (TREE_CODE (ptr) != ADDR_EXPR && TREE_CODE (ptr) != SSA_NAME)
@@ -2071,6 +2075,7 @@ early_object_sizes_execute_one (gimple_stmt_iterator *i, 
gimple *call)
   tree type = TREE_TYPE (lhs);
   tree bytes;
   if (!compute_builtin_object_size (ptr, object_size_type, &bytes)
+  || (TREE_CODE (bytes) != INTEGER_CST)
   || !int_fits_type_p (bytes, type))
 return;
 
@@ -2153,7 +2158,7 @@ object_sizes_execute (function *fun, bool early)
 __builtin_dynamic_object_size too.  */
  if (early)
{
- early_object_sizes_execute_one (&i, call);
+ early_object_sizes_execute_one (&i, call, dynamic);
  continue;
}
 
-- 
2.31.1
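
The change to the early-exit test in early_object_sizes_execute_one can be
looked at in isolation. The sketch below is plain C, not GCC code. The flag
values are assumed here to match tree-object-size.h (subobject = 1,
minimum = 2, dynamic = 4), and the two helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values assumed for this sketch.  */
enum
{
  OST_SUBOBJECT = 1,
  OST_MINIMUM = 2,
  OST_DYNAMIC = 4
};

/* Old test: only the exact values 1 and 3 are processed.  */
static bool
old_check_accepts (unsigned type)
{
  return !(type != 1 && type != 3);
}

/* New test: any value with the subobject bit set is processed.  */
static bool
new_check_accepts (unsigned type)
{
  return (type & OST_SUBOBJECT) != 0;
}
```

Once OST_DYNAMIC is OR-ed into object_size_type, the values become 5 and 7
for the subobject cases. The old equality test would reject those, while the
bit test still accepts them.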



Re: [PATCH v6 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-03-18 Thread Qing Zhao


> On Mar 18, 2024, at 12:30, Siddhesh Poyarekar  wrote:
> 
> On 2024-03-18 12:28, Qing Zhao wrote:
>>>> This should probably bail out if object_size_type & OST_DYNAMIC == 0.
>>> Okay. Will add this.
>> When adding this to access_with_size_object_size, I found some old bugs in 
>> early_object_sizes_execute_one, and fixed them as well.
> 
> Would you be able to isolate these fixes and post them separately?  I reckon 
> they would be relevant for gcc 14 too.

Yes, that’s a good idea, I can do that.
There is no specific test case for it, though. 

thanks.

Qing

> 
> Thanks,
> Sid



Re: [PATCH v6 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-03-18 Thread Qing Zhao


On Mar 13, 2024, at 15:17, Qing Zhao  wrote:



On Mar 11, 2024, at 13:11, Siddhesh Poyarekar  wrote:



On 2024-02-16 14:47, Qing Zhao wrote:
gcc/ChangeLog:
* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.
gcc/testsuite/ChangeLog:
* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
* gcc.dg/flex-array-counted-by-5.c: New test.
---
 .../gcc.dg/builtin-object-size-common.h   |  11 ++
 .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
 .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
 .../gcc.dg/flex-array-counted-by-5.c  |  48 +
 gcc/tree-object-size.cc   |  59 ++
 5 files changed, 359 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h 
b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
   __builtin_abort ();   \
 return 0;   \
   } while (0)
+
+#define EXPECT(p, _v) do {   \
+  size_t v = _v;   \
+  if (p == v)   \
+__builtin_printf ("ok:  %s == %zd\n", #p, p);   \
+  else   \
+{   \
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);   \
+  FAIL ();   \
+}   \
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..0066c32ca808
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f;
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
+  + normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+   + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
+  + attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..3ce7f3545549
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size mismatched with the value of counted_by attribute?
+we should always use the latest value that is hold by the counted_by
+field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, Due to type casting, the type for the pointee of a pointer
+   does not say anything about the object it points to,
+   So, __builtin_object_size can not directly use the type of the pointee
+   to decide the size of the object the pointer points to.
+
+   there are only two reliable ways:
+   A. observed allocations  (call to the allocation functions in the routine)
+   B. observed accesses (read or write access to the location of the
+ pointer points to)
+
+   that provide information abou

Re: [PATCH v6 4/5] Use the .ACCESS_WITH_SIZE in bound sanitizer.

2024-03-15 Thread Qing Zhao


On Mar 13, 2024, at 15:19, Qing Zhao  wrote:



On Mar 11, 2024, at 13:15, Siddhesh Poyarekar  wrote:



On 2024-02-16 14:47, Qing Zhao wrote:
gcc/c-family/ChangeLog:
* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE.
gcc/testsuite/ChangeLog:
* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
---
 gcc/c-family/c-ubsan.cc   | 42 +
 .../ubsan/flex-array-counted-by-bounds-2.c| 45 ++
 .../ubsan/flex-array-counted-by-bounds-3.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds.c  | 46 +++
 4 files changed, 167 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c
diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 940982819ddf..164b29845b3a 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -376,6 +376,40 @@ ubsan_instrument_return (location_t loc)
   return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data));
 }
 +/* Get the tree that represented the number of counted_by, i.e, the maximum
+   number of the elements of the object that the call to .ACCESS_WITH_SIZE
+   points to, this number will be the bound of the corresponding array.  */
+static tree
+get_bound_from_access_with_size (tree call)
+{
+  if (!is_access_with_size_p (call))
+return NULL_TREE;
+
+  tree ref_to_size = CALL_EXPR_ARG (call, 1);
+  unsigned int type_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));

Again for consistency, this should probably be class_of_size.

Okay, I will update this consistently with the change related to the 3rd 
argument.

+  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
+build_int_cst (ptr_type_node, 0));
+  /* If size is negative value, treat it as zero.  */
+  if (!TYPE_UNSIGNED (type))
+  {
+tree cond = fold_build2 (LT_EXPR, boolean_type_node,
+  unshare_expr (size), build_zero_cst (type));
+size = fold_build3 (COND_EXPR, type, cond,
+ build_zero_cst (type), size);
+  }
+
+  /* Only when type_of_size is 1,i.e, the number of the elements of
+ the object type, return the size.  */
+  if (type_of_size != 1)
+return NULL_TREE;
+  else
+size = fold_convert (sizetype, size);
+
+  return size;
+}
+
+
 /* Instrument array bounds for ARRAY_REFs.  We create special builtin,
that gets expanded in the sanopt pass, and make an array dimension
of it.  ARRAY is the array, *INDEX is an index to the array.
@@ -401,6 +435,14 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
&& COMPLETE_TYPE_P (type)
&& integer_zerop (TYPE_SIZE (type)))
  bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1);
+  else if (INDIRECT_REF_P (array)
+&& is_access_with_size_p ((TREE_OPERAND (array, 0))))
+ {
+   bound = get_bound_from_access_with_size ((TREE_OPERAND (array, 0)));
+   bound = fold_build2 (MINUS_EXPR, TREE_TYPE (bound),
+bound,
+build_int_cst (TREE_TYPE (bound), 1));
+ }

This will wrap if bound == 0, maybe that needs to be special-cased.  And maybe 
also add a test for it below.

Will check on this to see whether a new test is needed.

Checked: the current code handles the case when bound==0 correctly.
I have just added a new test case for this.

thanks.

Qing

Thanks a lot for the review.

Qing
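
The wrap-around raised in the review can be reproduced with ordinary
unsigned arithmetic. The sketch below is plain C, not GCC code, and the
function name is hypothetical. It models only the two steps visible in the
quoted hunks (clamp a negative count to zero, convert to size_t, subtract
one) and says nothing about how the rest of ubsan_instrument_bounds uses
the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a negative count to zero, convert it to size_t, then
   subtract one, as in the quoted hunks.  */
static size_t
bound_minus_one (long count)
{
  size_t size = count < 0 ? 0 : (size_t) count;
  return size - 1;   /* wraps to SIZE_MAX when size is 0 */
}
```

A count of 10 gives 9, while a count of 0 (or any negative count) gives
SIZE_MAX, which is the case the new test is meant to cover.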

   else
  return NULL_TREE;
 }
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c 
b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
new file mode 100644
index ..148934975ee5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
@@ -0,0 +1,45 @@
+/* test the attribute counted_by and its usage in
+   bounds sanitizer combined with VLA.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-output "index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 20 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'int 
\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+
+
+#include 
+
+void __attribute__((__noinline__)) setup_and_test_vla (int n, int m)
+{
+   struct foo {
+   int n;
+   int p[][n] __attribute__((counted_by(n)));
+   } *f;
+
+   f = (struct foo *) malloc (sizeof(str

Re: [PATCH v6 4/5] Use the .ACCESS_WITH_SIZE in bound sanitizer.

2024-03-13 Thread Qing Zhao


On Mar 11, 2024, at 13:15, Siddhesh Poyarekar  wrote:



On 2024-02-16 14:47, Qing Zhao wrote:
gcc/c-family/ChangeLog:
* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE.
gcc/testsuite/ChangeLog:
* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
---
 gcc/c-family/c-ubsan.cc   | 42 +
 .../ubsan/flex-array-counted-by-bounds-2.c| 45 ++
 .../ubsan/flex-array-counted-by-bounds-3.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds.c  | 46 +++
 4 files changed, 167 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c
diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 940982819ddf..164b29845b3a 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -376,6 +376,40 @@ ubsan_instrument_return (location_t loc)
   return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data));
 }
 +/* Get the tree that represented the number of counted_by, i.e, the maximum
+   number of the elements of the object that the call to .ACCESS_WITH_SIZE
+   points to, this number will be the bound of the corresponding array.  */
+static tree
+get_bound_from_access_with_size (tree call)
+{
+  if (!is_access_with_size_p (call))
+return NULL_TREE;
+
+  tree ref_to_size = CALL_EXPR_ARG (call, 1);
+  unsigned int type_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));

Again for consistency, this should probably be class_of_size.

Okay, I will update this consistently with the change related to the 3rd 
argument.

+  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
+build_int_cst (ptr_type_node, 0));
+  /* If size is negative value, treat it as zero.  */
+  if (!TYPE_UNSIGNED (type))
+  {
+tree cond = fold_build2 (LT_EXPR, boolean_type_node,
+  unshare_expr (size), build_zero_cst (type));
+size = fold_build3 (COND_EXPR, type, cond,
+ build_zero_cst (type), size);
+  }
+
+  /* Only when type_of_size is 1,i.e, the number of the elements of
+ the object type, return the size.  */
+  if (type_of_size != 1)
+return NULL_TREE;
+  else
+size = fold_convert (sizetype, size);
+
+  return size;
+}
+
+
 /* Instrument array bounds for ARRAY_REFs.  We create special builtin,
that gets expanded in the sanopt pass, and make an array dimension
of it.  ARRAY is the array, *INDEX is an index to the array.
@@ -401,6 +435,14 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
&& COMPLETE_TYPE_P (type)
&& integer_zerop (TYPE_SIZE (type)))
  bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1);
+  else if (INDIRECT_REF_P (array)
+&& is_access_with_size_p ((TREE_OPERAND (array, 0))))
+ {
+   bound = get_bound_from_access_with_size ((TREE_OPERAND (array, 0)));
+   bound = fold_build2 (MINUS_EXPR, TREE_TYPE (bound),
+bound,
+build_int_cst (TREE_TYPE (bound), 1));
+ }

This will wrap if bound == 0, maybe that needs to be special-cased.  And maybe 
also add a test for it below.

Will check on this to see whether a new test is needed.

Thanks a lot for the review.

Qing

   else
  return NULL_TREE;
 }
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c 
b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
new file mode 100644
index ..148934975ee5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
@@ -0,0 +1,45 @@
+/* test the attribute counted_by and its usage in
+   bounds sanitizer combined with VLA.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-output "index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 20 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'int 
\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+
+
+#include 
+
+void __attribute__((__noinline__)) setup_and_test_vla (int n, int m)
+{
+   struct foo {
+   int n;
+   int p[][n] __attribute__((counted_by(n)));
+   } *f;
+
+   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n]));
+   f->n = m;
+   f->p[m][n-1]=1;
+   return;
+}
+
+void __attribute__((__noinline__)) setup_and_test_vla_1 (int n1, int n2, int m)
+{
+  struc

Re: [PATCH v6 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-03-13 Thread Qing Zhao


On Mar 11, 2024, at 13:11, Siddhesh Poyarekar  wrote:



On 2024-02-16 14:47, Qing Zhao wrote:
gcc/ChangeLog:
* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.
gcc/testsuite/ChangeLog:
* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
* gcc.dg/flex-array-counted-by-5.c: New test.
---
 .../gcc.dg/builtin-object-size-common.h   |  11 ++
 .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
 .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
 .../gcc.dg/flex-array-counted-by-5.c  |  48 +
 gcc/tree-object-size.cc   |  59 ++
 5 files changed, 359 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h 
b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
   __builtin_abort ();   \
 return 0;   \
   } while (0)
+
+#define EXPECT(p, _v) do {   \
+  size_t v = _v;   \
+  if (p == v)   \
+__builtin_printf ("ok:  %s == %zd\n", #p, p);   \
+  else   \
+{   \
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);   \
+  FAIL ();   \
+}   \
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..0066c32ca808
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f;
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
+  + normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+   + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
+  + attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..3ce7f3545549
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size mismatched with the value of counted_by attribute?
+we should always use the latest value that is hold by the counted_by
+field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, Due to type casting, the type for the pointee of a pointer
+   does not say anything about the object it points to,
+   So, __builtin_object_size can not directly use the type of the pointee
+   to decide the size of the object the pointer points to.
+
+   there are only two reliable ways:
+   A. observed allocations  (call to the allocation functions in the routine)
+   B. observed accesses (read or write access to the location of the
+ pointer points to)
+
+   that provide information about the type/existence of an object at
+   the corr

Re: [PATCH v6 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-03-13 Thread Qing Zhao


> On Mar 11, 2024, at 13:09, Siddhesh Poyarekar  wrote:
> 
> 
> 
> On 2024-02-16 14:47, Qing Zhao wrote:
>> Including the following changes:
>> * The definition of the new internal function .ACCESS_WITH_SIZE
>>   in internal-fn.def.
>> * C FE converts every reference to a FAM with a "counted_by" attribute
>>   to a call to the internal function .ACCESS_WITH_SIZE.
>>   (build_component_ref in c_typeck.cc)
>>   This includes the case when the object is statically allocated and
>>   initialized.
>>   In order to make this working, the routines initializer_constant_valid_p_1
>>   and output_constant in varasm.cc are updated to handle calls to
>>   .ACCESS_WITH_SIZE.
>>   (initializer_constant_valid_p_1 and output_constant in varasm.c)
>>   However, for the reference inside "offsetof", the "counted_by" attribute is
>>   ignored since it's not useful at all.
>>   (c_parser_postfix_expression in c/c-parser.cc)
>>   In addtion to "offsetof", for the reference inside operator "typeof" and
>>   "alignof", we ignore counted_by attribute too.
>>   When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
>>   replace the call with its first argument.
>> * Convert every call to .ACCESS_WITH_SIZE to its first argument.
>>   (expand_ACCESS_WITH_SIZE in internal-fn.cc)
>> * Adjust alias analysis to exclude the new internal from clobbering anything.
>>   (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in 
>> tree-ssa-alias.cc)
>> * Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE 
>> when
>>   it's LHS is eliminated as dead code.
>>   (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
>> * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
>>   get the reference from the call to .ACCESS_WITH_SIZE.
>>   (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
>> gcc/c/ChangeLog:
>>  * c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
>>  attribute when build_component_ref inside offsetof operator.
>>  * c-tree.h (build_component_ref): Add one more parameter.
>>  * c-typeck.cc (build_counted_by_ref): New function.
>>  (build_access_with_size_for_counted_by): New function.
>>  (build_component_ref): Check the counted-by attribute and build
>>  call to .ACCESS_WITH_SIZE.
>>  (build_unary_op): When building ADDR_EXPR for
>> .ACCESS_WITH_SIZE, use its first argument.
>> (lvalue_p): Accept call to .ACCESS_WITH_SIZE.
>> gcc/ChangeLog:
>>  * internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
>>  * internal-fn.def (ACCESS_WITH_SIZE): New internal function.
>>  * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
>>  IFN_ACCESS_WITH_SIZE.
>>  (call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
>>  * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
>>  to .ACCESS_WITH_SIZE when its LHS is dead.
>>  * tree.cc (process_call_operands): Adjust side effect for function
>>  .ACCESS_WITH_SIZE.
>>  (is_access_with_size_p): New function.
>>  (get_ref_from_access_with_size): New function.
>>  * tree.h (is_access_with_size_p): New prototype.
>>  (get_ref_from_access_with_size): New prototype.
>>  * varasm.cc (initializer_constant_valid_p_1): Handle call to
>>  .ACCESS_WITH_SIZE.
>>  (output_constant): Handle call to .ACCESS_WITH_SIZE.
>> gcc/testsuite/ChangeLog:
>>  * gcc.dg/flex-array-counted-by-2.c: New test.
>> ---
>>  gcc/c/c-parser.cc |  10 +-
>>  gcc/c/c-tree.h|   2 +-
>>  gcc/c/c-typeck.cc | 128 +-
>>  gcc/internal-fn.cc|  36 +
>>  gcc/internal-fn.def   |   4 +
>>  .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
>>  gcc/tree-ssa-alias.cc |   2 +
>>  gcc/tree-ssa-dce.cc   |   5 +-
>>  gcc/tree.cc   |  25 +++-
>>  gcc/tree.h|   8 ++
>>  gcc/varasm.cc |  10 ++
>>  11 files changed, 332 insertions(+), 10 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
>> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
>> index c31349dae2ff..a6ed5ac43bb1 100644
>> --- a/gcc/c/c-parser.cc
>> +++ b/gcc/c/c-parser.c

Re: [PATCH v6 1/5] Provide counted_by attribute to flexible array member field (PR108896)

2024-03-13 Thread Qing Zhao
Sid,

Thanks a lot for taking the time to review the code.
See my reply below:

On Mar 11, 2024, at 10:57, Siddhesh Poyarekar  wrote:

On 2024-02-16 14:47, Qing Zhao wrote:
 return true;
   else
 return targetm.attribute_takes_identifier_p (attr_id);
@@ -2806,6 +2811,53 @@ handle_strict_flex_array_attribute (tree *node, tree 
name,
   return NULL_TREE;
 }
 +/* Handle a "counted_by" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_counted_by_attribute (tree *node, tree name,
+  tree args, int ARG_UNUSED (flags),
+  bool *no_add_attrs)
+{
+  tree decl = *node;
+  tree argval = TREE_VALUE (args);
+
+  /* This attribute only applies to field decls of a structure.  */
+  if (TREE_CODE (decl) != FIELD_DECL)
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+ "%qE attribute may not be specified for non-field"
+ " declaration %q+D", name, decl);
+  *no_add_attrs = true;
+}
+  /* This attribute only applies to field with array type.  */
+  else if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE)
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+ "%qE attribute may not be specified for a non-array field",
+ name);
+  *no_add_attrs = true;
+}
+  /* This attribute only applies to a C99 flexible array member type.  */
+  else if (! c_flexible_array_member_type_p (TREE_TYPE (decl)))
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+ "%qE attribute may not be specified for a non"
+ " flexible array member field",
+ name);
+  *no_add_attrs = true;
+}

How about "not allowed" instead of "may not be specified"?

Okay, will update them.

+  /* The argument should be an identifier.  */
+  else if (TREE_CODE (argval) != IDENTIFIER_NODE)
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+ "%<counted_by%> argument not an identifier");
+  *no_add_attrs = true;
+}

Validate that the attribute only applies to a C99 flexible array member of a 
structure and the argument should be an identifier node.  OK. 
verify_counted_by_attribute does more extensive validation on argval.
Yes.

+
+  return NULL_TREE;
+}
+
 /* Handle a "weak" attribute; arguments as in
struct attribute_spec.handler.  */
 diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index e15eff698dfd..56d828e3dfaf 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -9909,6 +9909,19 @@ c_common_finalize_early_debug (void)
   (*debug_hooks->early_global_decl) (cnode->decl);
 }
 +/* Determine whether TYPE is a ISO C99 flexible array memeber type "[]".  */

s/memeber/member/
Okay, will update it.

+bool
+c_flexible_array_member_type_p (const_tree type)
+{
+  if (TREE_CODE (type) == ARRAY_TYPE
+  && TYPE_SIZE (type) == NULL_TREE
+  && TYPE_DOMAIN (type) != NULL_TREE
+  && TYPE_MAX_VALUE (TYPE_DOMAIN (type)) == NULL_TREE)
+return true;
+
+  return false;
+}
+

Moved from c/c-decl.cc.  OK.

 /* Get the LEVEL of the strict_flex_array for the ARRAY_FIELD based on the
values of attribute strict_flex_array and the flag_strict_flex_arrays.  */
 unsigned int
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 2d5f53998855..3e0eed0548b0 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -904,6 +904,7 @@ extern tree fold_for_warn (tree);
 extern tree c_common_get_narrower (tree, int *);
 extern bool get_attribute_operand (tree, unsigned HOST_WIDE_INT *);
 extern void c_common_finalize_early_debug (void);
+extern bool c_flexible_array_member_type_p (const_tree);
 extern unsigned int c_strict_flex_array_level_of (tree);
 extern bool c_option_is_from_cpp_diagnostics (int);
 extern tree c_hardbool_type_attr_1 (tree, tree *, tree *);
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index fe20bc21c926..4348123502e4 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5301,19 +5301,6 @@ set_array_declarator_inner (struct c_declarator *decl,
   return decl;
 }
 -/* Determine whether TYPE is a ISO C99 flexible array memeber type "[]".  */
-static bool
-flexible_array_member_type_p (const_tree type)
-{
-  if (TREE_CODE (type) == ARRAY_TYPE
-  && TYPE_SIZE (type) == NULL_TREE
-  && TYPE_DOMAIN (type) != NULL_TREE
-  && TYPE_MAX_VALUE (TYPE_DOMAIN (type)) == NULL_TREE)
-return true;
-
-  return false;
-}
-
 /* Determine whether TYPE is a one-element array type "[1]".  */
 static bool
 one_element_array_type_p (const_tree type)
@@ -5350,7 +5337,7 @@ add_flexible_array_elts_to_size (tree decl, tree init)
 elt = CONSTRUCTOR_ELTS (init)->last ().value;
   type = TREE_TYPE (elt);
-  if (flexible_array_member_type_p (type))
+  if (c_flexible_array_member_type_p (type))
 {
   complete_array_type (&type, elt, false);
   DECL_SIZE (decl)
@@ -9317,7 +9304,7 @@ is_flexible_array_member_p (

Re: [PATCH v6 0/5]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-03-01 Thread Qing Zhao
Ping on this patch set.

Thanks a lot!

Qing

> On Feb 16, 2024, at 14:47, Qing Zhao  wrote:
> 
> Hi,
> 
> This is the 6th version of the patch.
> 
> Compared with the 5th version, the only difference is:
> 
> 1. Add the 6th argument to .ACCESS_WITH_SIZE
>   to carry the TYPE of the flexible array.
>   Such information is needed during tree-object-size.cc.
> 
>   Previously, we used the result type of the routine
>   .ACCESS_WITH_SIZE to decide the element type of the
>   original array.  However, the result type of the routine
>   might be changed during tree optimizations due to
>   possible type casting in the source code.
> 
> 
> Compared with the 4th version, the major differences are:
> 
> 1. Change the return type of the routine .ACCESS_WITH_SIZE 
>   FROM:
> Pointer to the type of the element of the flexible array;
>   TO:
> Pointer to the type of the flexible array;
>And then wrap the call with an indirection reference. 
> 
> 2. Adjust all other parts with this change, (this will simplify the bound 
> sanitizer instrument);
> 
> 3. Add the fixes to the kernel building failures, which include:
>A. The operator “typeof” cannot return correct type for a->array; 
>B. The operator “&” cannot return correct address for a->array;
> 
> 4. Correctly handle the case when the value of “counted-by” is zero or 
> negative as follows:
>   4.1. Update the counted-by doc with the following:
>When the counted-by field is assigned a negative integer value, the 
> compiler will treat the value as zero. 
>   4.2. Adjust __bdos and array bound sanitizer to handle correctly when 
> “counted-by” is zero. 
> 
> 
> It is based on the following proposal:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635884.html
> Represent the missing dependence for the "counted_by" attribute and its 
> consumers
> 
> **The summary of the proposal is:
> 
> * Add a new internal function ".ACCESS_WITH_SIZE" to carry the size 
> information for every reference to a FAM field;
> * In C FE, Replace every reference to a FAM field whose TYPE has the 
> "counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
> * In every consumer of the size information, for example, BDOS or array bound 
> sanitizer, query the size information or ACCESS_MODE information from the new 
> internal function;
> * When expanding to RTL, replace the internal function with the actual 
> reference to the FAM field;
> * Some adjustment to ipa alias analysis, and other SSA passes to mitigate the 
> impact to the optimizer and code generation.
> 
> 
> **The new internal function
> 
>  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, TYPE_OF_SIZE, 
> ACCESS_MODE, TYPE_OF_REF)
> 
> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
> 
> which returns the "REF_TO_OBJ" same as the 1st argument;
> 
> Both the return type and the type of the first argument of this function have 
> been converted from the incomplete array type to the corresponding pointer 
> type.
> 
> The call to .ACCESS_WITH_SIZE is wrapped with an INDIRECT_REF, whose type is 
> the original incomplete array type.
> 
> Please see the following link for why:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html
> 
> 1st argument "REF_TO_OBJ": The reference to the object;
> 2nd argument "REF_TO_SIZE": The reference to the size of the object,
> 3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE 
> represents
>   0: unknown;
>   1: the number of the elements of the object type;
>   2: the number of bytes;
> 4th argument "TYPE_OF_SIZE": A constant 0 with the TYPE of the object
>  referenced by REF_TO_SIZE
> 5th argument "ACCESS_MODE":
>  -1: Unknown access semantics
>   0: none
>   1: read_only
>   2: write_only
>   3: read_write
> 6th argument "TYPE_OF_REF": A constant 0 with the pointer TYPE to
>  the original flexible array type.
> 
> ** The Patch sets included:
> 
> 1. Provide counted_by attribute to flexible array member field;
>  which includes:
>  * "counted_by" attribute documentation;
>  * C FE handling of the new attribute;
>syntax checking, error reporting;
>  * testing cases;
> 
> 2. Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.
>  which includes:
>  * The definition of the new internal function .ACCESS_WITH_SIZE in 
> internal-fn.def.
>  * C FE converts every reference to a FAM with "cou

[PATCH v6 1/5] Provide counted_by attribute to flexible array member field (PR108896)

2024-02-16 Thread Qing Zhao
'counted_by (COUNT)'
 The 'counted_by' attribute may be attached to the C99 flexible
 array member of a structure.  It indicates that the number of the
 elements of the array is given by the field named "COUNT" in the
 same structure as the flexible array member.  GCC uses this
 information to improve the results of the array bound sanitizer and
 the '__builtin_dynamic_object_size'.

 For instance, the following code:

  struct P {
size_t count;
char other;
char array[] __attribute__ ((counted_by (count)));
  } *p;

 specifies that the 'array' is a flexible array member whose number
 of elements is given by the field 'count' in the same structure.

 The field that represents the number of the elements should have an
 integer type.  Otherwise, the compiler will report a warning and
 ignore the attribute.

 When the field that represents the number of the elements is assigned a
 negative integer value, the compiler will treat the value as zero.
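
 The clamping rule above can be pictured with a small plain-C model.
 This is only a sketch of the stated behaviour, not the compiler's
 code: 'struct Q' and 'effective_count' are hypothetical names
 introduced for illustration, and the count field is deliberately a
 signed type so that a negative value is possible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure with a signed count field.  */
struct Q {
  int count;
  char array[];
};

/* Hypothetical helper: the element count a consumer would use.
   A negative count is treated as zero, as the text describes.  */
static size_t effective_count (const struct Q *q)
{
  assert (q != NULL);
  return q->count < 0 ? 0 : (size_t) q->count;
}
```

 With this model, a count of -5 yields a bound of 0 elements, so every
 index into 'array' would be out of bounds.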

 An explicit 'counted_by' annotation defines a relationship between
 two objects, 'p->array' and 'p->count', and there are the following
 requirements on the relationship between this pair:

* 'p->count' should be initialized before the first reference to
  'p->array';

* 'p->array' has _at least_ 'p->count' number of elements
  available all the time.  This relationship must hold even
  after any of these related objects are updated during the
  program.

 It's the user's responsibility to make sure the above requirements
 are kept all the time.  Otherwise the compiler will report
 warnings, and the results of the array bound sanitizer and the
 '__builtin_dynamic_object_size' are undefined.

 One important feature of the attribute is that a reference to the
 flexible array member field will use the latest value assigned to
 the field that represents the number of the elements before that
 reference.  For example,

p->count = val1;
p->array[20] = 0;  // ref1 to p->array
p->count = val2;
p->array[30] = 0;  // ref2 to p->array

 in the above, 'ref1' will use 'val1' as the number of the elements
 in 'p->array', and 'ref2' will use 'val2' as the number of elements
 in 'p->array'.
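
 The two requirements and the "latest value" rule can be sketched in
 portable C as follows.  This is a minimal sketch under stated
 assumptions: the 'COUNTED_BY' macro, 'p_alloc' and 'p_store' are
 hypothetical names that are not part of the patch, the macro expands
 to nothing on compilers without the attribute, and the manual index
 check only imitates what the bound sanitizer would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: use the attribute only where it exists.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__ ((counted_by (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct P {
  size_t count;
  char other;
  char array[] COUNTED_BY (count);
};

/* Hypothetical helper: allocate room for N elements and initialize
   'count' before 'array' is referenced for the first time.  */
static struct P *p_alloc (size_t n)
{
  struct P *p = malloc (sizeof (struct P) + n * sizeof (char));
  if (p != NULL)
    p->count = n;
  return p;
}

/* Hypothetical helper: store V at IDX, checked against the value
   'count' holds at the time of this reference.  Returns 0 on
   success and -1 when IDX is out of bounds.  */
static int p_store (struct P *p, size_t idx, char v)
{
  assert (p != NULL);
  if (idx >= p->count)
    return -1;
  p->array[idx] = v;
  return 0;
}
```

 Lowering 'p->count' after the allocation keeps both requirements
 intact, because the array still has at least 'p->count' elements;
 later stores are then checked against the smaller value.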

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.
* c-tree.h (lookup_field): New prototype.
* c-typeck.cc (lookup_field): Expose as extern function.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
---
 gcc/c-family/c-attribs.cc| 54 -
 gcc/c-family/c-common.cc | 13 +++
 gcc/c-family/c-common.h  |  1 +
 gcc/c/c-decl.cc  | 85 
 gcc/c/c-tree.h   |  1 +
 gcc/c/c-typeck.cc|  3 +-
 gcc/doc/extend.texi  | 64 +++
 gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 +
 8 files changed, 241 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 40a0cf90295d..4395c0656b14 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -105,6 +105,8 @@ static tree handle_warn_if_not_aligned_attribute (tree *, 
tree, tree,
  int, bool *);
 static tree handle_strict_flex_array_attribute (tree *, tree, tree,
 int, bool *);
+static tree handle_counted_by_attribute (tree *, tree, tree,
+  int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
@@ -412,6 +414,8 @@ const struct attribute_spec c_common_gnu_attributes[] =
  handle_warn_if_not_aligned_attribute, NULL },

[PATCH v6 5/5] Add the 6th argument to .ACCESS_WITH_SIZE

2024-02-16 Thread Qing Zhao
to carry the TYPE of the flexible array.

Such information is needed during tree-object-size.cc.

We cannot use the result type or the type of the 1st argument
of the routine .ACCESS_WITH_SIZE to decide the element type
of the original array due to possible type casting in the
source code.

gcc/c/ChangeLog:

* c-typeck.cc (build_access_with_size_for_counted_by): Add the 6th
argument to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): Use the type
of the 6th argument for the type of the element.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-6.c: New test.
---
 gcc/c/c-typeck.cc | 11 +++--
 .../gcc.dg/flex-array-counted-by-6.c  | 46 +++
 gcc/tree-object-size.cc   | 16 ---
 3 files changed, 64 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-6.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index a29a7d7ec029..c17ac6862546 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2608,7 +2608,8 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
 
to:
 
-   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1))
+   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
+   (TYPE_OF_ARRAY *)0))
 
NOTE: The return type of this function is the POINTER type pointing
to the original flexible array type.
@@ -2620,6 +2621,9 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
The 4th argument of the call is a constant 0 with the TYPE of the
object pointed by COUNTED_BY_REF.
 
+   The 6th argument of the call is a constant 0 with the pointer TYPE
+   to the original flexible array type.
+
   */
 static tree
 build_access_with_size_for_counted_by (location_t loc, tree ref,
@@ -2632,12 +2636,13 @@ build_access_with_size_for_counted_by (location_t loc, 
tree ref,
 
   tree call
 = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
-   result_type, 5,
+   result_type, 6,
array_to_pointer_conversion (loc, ref),
counted_by_ref,
build_int_cst (integer_type_node, 1),
build_int_cst (counted_by_type, 0),
-   build_int_cst (integer_type_node, -1));
+   build_int_cst (integer_type_node, -1),
+   build_int_cst (result_type, 0));
   /* Wrap the call with an INDIRECT_REF with the flexible array type.  */
   call = build1 (INDIRECT_REF, TREE_TYPE (ref), call);
   SET_EXPR_LOCATION (call, loc);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
new file mode 100644
index ..65a401796479
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
@@ -0,0 +1,46 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size. when the type of the flexible array member
+ * is casting to another type.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+typedef unsigned short u16;
+
+struct info {
+   u16 data_len;
+   char data[] __attribute__((counted_by(data_len)));
+};
+
+struct foo {
+   int a;
+   int b;
+};
+
+static __attribute__((__noinline__))
+struct info *setup ()
+{
+ struct info *p;
+ size_t bytes = 3 * sizeof(struct foo);
+
+ p = (struct info *)malloc (sizeof (struct info) + bytes);
+ p->data_len = bytes;
+
+ return p;
+}
+
+static void
+__attribute__((__noinline__)) report (struct info *p)
+{
+ struct foo *bar = (struct foo *)p->data;
+ EXPECT(__builtin_dynamic_object_size((char *)(bar + 1), 1), 16);
+ EXPECT(__builtin_dynamic_object_size((char *)(bar + 2), 1), 8);
+}
+
+int main(int argc, char *argv[])
+{
+ struct info *p = setup();
+ report(p);
+ return 0;
+}
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 630b0a7aaa4b..c3098c521a43 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -763,17 +763,21 @@ addr_object_size (struct object_size_info *osi, 
const_tree ptr,
  2: the number of bytes;
4th argument TYPE_OF_SIZE: A constant 0 with the TYPE of the object
  refed by REF_TO_SIZE
+   6th argument: A constant 0 with the pointer TYPE to the original flexible
+ array type.
 
-   the size of the element can be retrived from the result type of the call,
-   which is the pointer to the array type.  */
+   the size of the element can be retrived from the TYPE of the 6th argument
+   of the call, which is the pointer to the array type.  */
 static tree
 access_with_size_object_size (const gcall *call, int object_size_type)
 {
   gcc_assert (gimple_call_internal_p (call, 

[PATCH v6 4/5] Use the .ACCESS_WITH_SIZE in bound sanitizer.

2024-02-16 Thread Qing Zhao
gcc/c-family/ChangeLog:

* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
---
 gcc/c-family/c-ubsan.cc   | 42 +
 .../ubsan/flex-array-counted-by-bounds-2.c| 45 ++
 .../ubsan/flex-array-counted-by-bounds-3.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds.c  | 46 +++
 4 files changed, 167 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 940982819ddf..164b29845b3a 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -376,6 +376,40 @@ ubsan_instrument_return (location_t loc)
   return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data));
 }
 
+/* Get the tree that represented the number of counted_by, i.e, the maximum
+   number of the elements of the object that the call to .ACCESS_WITH_SIZE
+   points to, this number will be the bound of the corresponding array.  */
+static tree
+get_bound_from_access_with_size (tree call)
+{
+  if (!is_access_with_size_p (call))
+return NULL_TREE;
+
+  tree ref_to_size = CALL_EXPR_ARG (call, 1);
+  unsigned int type_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));
+  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
+  build_int_cst (ptr_type_node, 0));
+  /* If size is negative value, treat it as zero.  */
+  if (!TYPE_UNSIGNED (type))
+  {
+tree cond = fold_build2 (LT_EXPR, boolean_type_node,
+unshare_expr (size), build_zero_cst (type));
+size = fold_build3 (COND_EXPR, type, cond,
+   build_zero_cst (type), size);
+  }
+
+  /* Only when type_of_size is 1,i.e, the number of the elements of
+ the object type, return the size.  */
+  if (type_of_size != 1)
+return NULL_TREE;
+  else
+size = fold_convert (sizetype, size);
+
+  return size;
+}
+
+
 /* Instrument array bounds for ARRAY_REFs.  We create special builtin,
that gets expanded in the sanopt pass, and make an array dimension
of it.  ARRAY is the array, *INDEX is an index to the array.
@@ -401,6 +435,14 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
  && COMPLETE_TYPE_P (type)
  && integer_zerop (TYPE_SIZE (type)))
bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1);
+  else if (INDIRECT_REF_P (array)
+  && is_access_with_size_p ((TREE_OPERAND (array, 0))))
+   {
+ bound = get_bound_from_access_with_size ((TREE_OPERAND (array, 0)));
+ bound = fold_build2 (MINUS_EXPR, TREE_TYPE (bound),
+  bound,
+  build_int_cst (TREE_TYPE (bound), 1));
+   }
   else
return NULL_TREE;
 }
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c 
b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
new file mode 100644
index ..148934975ee5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
@@ -0,0 +1,45 @@
+/* test the attribute counted_by and its usage in
+   bounds sanitizer combined with VLA.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-output "index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 20 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'int 
\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+
+
+#include <stdlib.h>
+
+void __attribute__((__noinline__)) setup_and_test_vla (int n, int m)
+{
+   struct foo {
+   int n;
+   int p[][n] __attribute__((counted_by(n)));
+   } *f;
+
+   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n]));
+   f->n = m;
+   f->p[m][n-1]=1;
+   return;
+}
+
+void __attribute__((__noinline__)) setup_and_test_vla_1 (int n1, int n2, int m)
+{
+  struct foo {
+int n;
+int p[][n2][n1] __attribute__((counted_by(n)));
+  } *f;
+
+  f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n2][n1]));
+  f->n = m;
+  f->p[m][n2][n1]=1;
+  return;
+}
+
+int main(int argc, char *argv[])
+{
+  setup_and_test_vla (10, 11);
+  setup_and_test_vla_1 (10, 11, 20);
+  return 0;

[PATCH v6 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-02-16 Thread Qing Zhao
gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
* gcc.dg/flex-array-counted-by-5.c: New test.
---
 .../gcc.dg/builtin-object-size-common.h   |  11 ++
 .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
 .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
 .../gcc.dg/flex-array-counted-by-5.c  |  48 +
 gcc/tree-object-size.cc   |  59 ++
 5 files changed, 359 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h 
b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
   __builtin_abort ();\
 return 0;\
   } while (0)
+
+#define EXPECT(p, _v) do {   \
+  size_t v = _v; \
+  if (p == v)\
+__builtin_printf ("ok:  %s == %zd\n", #p, p);\
+  else   \
+{\
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);\
+  FAIL ();   \
+}\
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..0066c32ca808
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f; 
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
++ attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+  array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);   
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..3ce7f3545549
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size mismatched with the value of counted_by attribute?
+we should always use the latest value that is hold by the counted_by
+field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10 
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, Due to type casting, the type for the pointee of a pointer
+   does not say 

[PATCH v6 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-02-16 Thread Qing Zhao
Including the following changes:
* The definition of the new internal function .ACCESS_WITH_SIZE
  in internal-fn.def.
* C FE converts every reference to a FAM with a "counted_by" attribute
  to a call to the internal function .ACCESS_WITH_SIZE.
  (build_component_ref in c-typeck.cc)

  This includes the case when the object is statically allocated and
  initialized.
  To make this work, the routines initializer_constant_valid_p_1
  and output_constant in varasm.cc are updated to handle calls to
  .ACCESS_WITH_SIZE.
  (initializer_constant_valid_p_1 and output_constant in varasm.cc)

  However, for the reference inside "offsetof", the "counted_by" attribute is
  ignored since it's not useful at all.
  (c_parser_postfix_expression in c/c-parser.cc)

  In addition to "offsetof", the counted_by attribute is also ignored for
  references inside the operators "typeof" and "alignof".

  When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
  replace the call with its first argument.

* Convert every call to .ACCESS_WITH_SIZE to its first argument.
  (expand_ACCESS_WITH_SIZE in internal-fn.cc)
* Adjust alias analysis to exclude the new internal from clobbering anything.
  (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
* Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE when
  its LHS is eliminated as dead code.
  (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
* Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
  get the reference from the call to .ACCESS_WITH_SIZE.
  (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
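
The life cycle described above (a consumer queries the size through the
call, and expansion replaces the call with its first argument) can be
modelled in plain C.  This is a conceptual sketch only, not GCC's
implementation: 'struct access', 'expand_access' and 'access_bound' are
hypothetical names, and only the first three arguments of
.ACCESS_WITH_SIZE are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Models the 3rd argument, CLASS_OF_SIZE.  */
enum size_class { SIZE_UNKNOWN = 0, SIZE_ELEMENTS = 1, SIZE_BYTES = 2 };

/* Hypothetical stand-in for one call to .ACCESS_WITH_SIZE.  */
struct access {
  void *ref;               /* 1st argument: REF_TO_OBJ.  */
  const size_t *size_ref;  /* 2nd argument: REF_TO_SIZE.  */
  enum size_class cls;     /* 3rd argument: CLASS_OF_SIZE.  */
};

/* Models expansion: the size information is dropped and the call
   is replaced by its first argument.  */
static void *expand_access (const struct access *a)
{
  assert (a != NULL);
  return a->ref;
}

/* Models a consumer such as the bound sanitizer: read the element
   count through the size reference, or return (size_t) -1 when the
   size is not an element count.  */
static size_t access_bound (const struct access *a)
{
  assert (a != NULL && a->size_ref != NULL);
  if (a->cls != SIZE_ELEMENTS)
    return (size_t) -1;
  return *a->size_ref;
}
```

Because the size is read through a reference at the point of the
query, the model picks up whatever value the count field holds at that
moment, which is the dependence the internal function is meant to
represent.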

gcc/c/ChangeLog:

* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
attribute when build_component_ref inside offsetof operator.
* c-tree.h (build_component_ref): Add one more parameter.
* c-typeck.cc (build_counted_by_ref): New function.
(build_access_with_size_for_counted_by): New function.
(build_component_ref): Check the counted-by attribute and build
call to .ACCESS_WITH_SIZE.
(build_unary_op): When building ADDR_EXPR for
.ACCESS_WITH_SIZE, use its first argument.
(lvalue_p): Accept call to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
IFN_ACCESS_WITH_SIZE.
(call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
to .ACCESS_WITH_SIZE when its LHS is dead.
* tree.cc (process_call_operands): Adjust side effect for function
.ACCESS_WITH_SIZE.
(is_access_with_size_p): New function.
(get_ref_from_access_with_size): New function.
* tree.h (is_access_with_size_p): New prototype.
(get_ref_from_access_with_size): New prototype.
* varasm.cc (initializer_constant_valid_p_1): Handle call to
.ACCESS_WITH_SIZE.
(output_constant): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-2.c: New test.
---
 gcc/c/c-parser.cc |  10 +-
 gcc/c/c-tree.h|   2 +-
 gcc/c/c-typeck.cc | 128 +-
 gcc/internal-fn.cc|  36 +
 gcc/internal-fn.def   |   4 +
 .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
 gcc/tree-ssa-alias.cc |   2 +
 gcc/tree-ssa-dce.cc   |   5 +-
 gcc/tree.cc   |  25 +++-
 gcc/tree.h|   8 ++
 gcc/varasm.cc |  10 ++
 11 files changed, 332 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c31349dae2ff..a6ed5ac43bb1 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
if (c_parser_next_token_is (parser, CPP_NAME))
  {
c_token *comp_tok = c_parser_peek_token (parser);
+   /* Ignore the counted_by attribute for reference inside
+  offsetof since the information is not useful at all.  */
offsetof_ref
  = build_component_ref (loc, offsetof_ref, comp_tok->value,
-comp_tok->location, UNKNOWN_LOCATION);
+comp_tok->location, UNKNOWN_LOCATION,
+false);
c_parser_consume_token (parser);
while (c_parser_next_token_is (parser, CPP_DOT)
   || 

[PATCH v6 0/5]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-02-16 Thread Qing Zhao
e inside operator "typeof" and
  "alignof", we ignore counted_by attribute too.
When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
  replace the call with its first argument.

  * Convert every call to .ACCESS_WITH_SIZE to its first argument.
(expand_ACCESS_WITH_SIZE in internal-fn.cc)
  * adjust alias analysis to exclude the new internal from clobbering 
anything.
(ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in 
tree-ssa-alias.cc)
  * adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE 
when
its LHS is eliminated as dead code.
(eliminate_unnecessary_stmts in tree-ssa-dce.cc)
  * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
get the reference from the call to .ACCESS_WITH_SIZE.
(is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
  * testing cases. (for offsetof, static initialization, generation of 
calls to
.ACCESS_WITH_SIZE, code runs correctly after calls to .ACCESS_WITH_SIZE 
are
converted back)

3. Use the .ACCESS_WITH_SIZE in builtin object size (sub-object only)
  which includes:
  * use the size info of the .ACCESS_WITH_SIZE for sub-object.
  * when the size is a negative integer, treat it as zero.
  * testing cases. 

4 Use the .ACCESS_WITH_SIZE in bound sanitizer
  which includes:
  * Instrument array_ref with a call to .ACCESS_WITH_SIZE for bound 
sanitizer.
  * when the size is a negative integer, treat it as zero.
  * testing cases. 

5. Add the 6th argument to .ACCESS_WITH_SIZE to carry the TYPE of the flexible 
array.
  which includes:
  * Add the 6th argument to .ACCESS_WITH_SIZE.
  * use the type of the 6th argument of the routine in tree-object-size.cc
  * testing case.

**Remaining works: 

6  Improve __bdos to use the counted_by info in whole-object size for the 
structure with FAM.
7  Emit warnings when the user breaks the requirements for the new counted_by 
attribute
   compilation time: -Wcounted-by
   run time: -fsanitizer=counted-by
  * The initialization to the size field should be done before the first 
reference to the FAM field.
  * the array has at least # of elements specified by the size field all 
the time during the program.

I have bootstrapped and regression tested on both x86 and aarch64, no issue.
Linux kernel linux-6.8-rc4 has been built and exposed one bug with the new 
counted-by, fixed.

Let me know your comments.

thanks.

Qing

Qing Zhao (5):
  Provide counted_by attribute to flexible array member field (PR108896)
  Convert references with "counted_by" attributes to/from
.ACCESS_WITH_SIZE.
  Use the .ACCESS_WITH_SIZE in builtin object size.
  Use the .ACCESS_WITH_SIZE in bound sanitizer.
  Add the 6th argument to .ACCESS_WITH_SIZE


Re: [PATCH v5 0/4] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-02-16 Thread Qing Zhao
An update to the 5th version of the patches:

Kees helped me do more testing and found one issue:

===
   We cannot use the result type or the type of the 1st argument
of the routine .ACCESS_WITH_SIZE to decide the element type
of the original array due to possible type casting in the
source code.

The element type of the original array is needed in tree-object-size.cc.
===
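
A hypothetical illustration of how a cast hides the element type (the
struct and function names below are made up; this is not the kernel code
that hit the issue). Once the flexible array has been cast, the pointer
type no longer says what the elements of the original array were, which
is roughly the situation a consumer is in if it only looks at the type of
the reference:

```c
#include <stddef.h>

struct buf {
  int count;
  int data[];   /* the declared element type is int */
};

/* Element size as the array itself declares it.  */
static size_t
elem_size_from_array (const struct buf *b)
{
  return sizeof (b->data[0]);
}

/* Element size as seen through a cast to a byte pointer; the original
   element type cannot be recovered from BYTES.  */
static size_t
elem_size_after_cast (const struct buf *b)
{
  const unsigned char *bytes = (const unsigned char *) b->data;
  return sizeof (bytes[0]);
}
```

The two helpers disagree for the same array, which is why the original
array type has to be carried separately.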

In order to resolve this issue, I added the 6th argument to the routine 
.ACCESS_WITH_SIZE
to carry the original type of the array:

-   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1))
+   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
+   (TYPE_OF_ARRAY *)0))

+   The 6th argument of the call is a constant 0 with the pointer TYPE
+   to the original flexible array type.
+

With this fix, the kernel (with counted-by annotations) builds successfully,
and GCC with counted-by support found one kernel bug.

All other testing passed.

I will send the 6th version of the patch soon.  (The only change in the 6th
version compared to the 5th version is the above fix.)

Thanks.

Qing

> On Feb 9, 2024, at 10:54 AM, Qing Zhao  wrote:
> 
> Hi,
> 
> This is the 5th version of the patch.
> 
> compare with the 4th version, the major difference are:
> 
> 1. Change the return type of the routine .ACCESS_WITH_SIZE 
>   FROM:
> Pointer to the type of the element of the flexible array;
>   TO:
> Pointer to the type of the flexible array;
>And then wrap the call with an indirection reference. 
> 
> 2. Adjust all other parts with this change, (this will simplify the bound 
> sanitizer instrument);
> 
> 3. Add the fixes to the kernel building failures, which include:
>A. The operator “typeof” cannot return correct type for a->array; 
>B. The operator “&” cannot return correct address for a->array;
> 
> 4. Correctly handle the case when the value of “counted-by” is zero or 
> negative as following
>   4.1. Update the counted-by doc with the following:
>When the counted-by field is assigned a negative integer value, the 
> compiler will treat the value as zero. 
>   4.2. Adjust __bdos and array bound sanitizer to handle correctly when 
> “counted-by” is zero. 
> 
> 
> It based on the following proposal:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635884.html
> Represent the missing dependence for the "counted_by" attribute and its 
> consumers
> 
> **The summary of the proposal is:
> 
> * Add a new internal function ".ACCESS_WITH_SIZE" to carry the size 
> information for every reference to a FAM field;
> * In C FE, Replace every reference to a FAM field whose TYPE has the 
> "counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
> * In every consumer of the size information, for example, BDOS or array bound 
> sanitizer, query the size information or ACCESS_MODE information from the new 
> internal function;
> * When expanding to RTL, replace the internal function with the actual 
> reference to the FAM field;
> * Some adjustment to ipa alias analysis, and other SSA passes to mitigate the 
> impact to the optimizer and code generation.
> 
> 
> **The new internal function
> 
>  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, TYPE_OF_SIZE, 
> ACCESS_MODE)
> 
> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
> 
> which returns the "REF_TO_OBJ" same as the 1st argument;
> 
> Both the return type and the type of the first argument of this function have 
> been converted from the incomplete array type to the corresponding pointer 
> type.
> 
> The call to .ACCESS_WITH_SIZE is wrapped with an INDIRECT_REF, whose type is 
> the original incomplete array type.
> 
> Please see the following link for why:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html
> 
> 1st argument "REF_TO_OBJ": The reference to the object;
> 2nd argument "REF_TO_SIZE": The reference to the size of the object,
> 3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE 
> represents
>   0: unknown;
>   1: the number of the elements of the object type;
>   2: the number of bytes;
> 4th argument TYPE_OF_SIZE: A constant 0 with the TYPE of the object
>  refed by REF_TO_SIZE
> 5th argument "ACCESS_MODE":
>  -1: Unknown access semantics
>   0: none
>   1: read_only
>   2: write_only
>   3: read_write
> 
> ** The Patch sets included:
> 
> 1. Provide counted_by attribute to flexible array member fie

[PATCH v5 4/4] Use the .ACCESS_WITH_SIZE in bound sanitizer.

2024-02-09 Thread Qing Zhao
gcc/c-family/ChangeLog:

* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
---
 gcc/c-family/c-ubsan.cc   | 42 +
 .../ubsan/flex-array-counted-by-bounds-2.c| 45 ++
 .../ubsan/flex-array-counted-by-bounds-3.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds.c  | 46 +++
 4 files changed, 167 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 940982819ddf..164b29845b3a 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -376,6 +376,40 @@ ubsan_instrument_return (location_t loc)
   return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data));
 }
 
+/* Get the tree that represented the number of counted_by, i.e, the maximum
+   number of the elements of the object that the call to .ACCESS_WITH_SIZE
+   points to, this number will be the bound of the corresponding array.  */
+static tree
+get_bound_from_access_with_size (tree call)
+{
+  if (!is_access_with_size_p (call))
+return NULL_TREE;
+
+  tree ref_to_size = CALL_EXPR_ARG (call, 1);
+  unsigned int type_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));
+  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
+  build_int_cst (ptr_type_node, 0));
+  /* If size is negative value, treat it as zero.  */
+  if (!TYPE_UNSIGNED (type))
+  {
+tree cond = fold_build2 (LT_EXPR, boolean_type_node,
+unshare_expr (size), build_zero_cst (type));
+size = fold_build3 (COND_EXPR, type, cond,
+   build_zero_cst (type), size);
+  }
+
+  /* Only when type_of_size is 1,i.e, the number of the elements of
+ the object type, return the size.  */
+  if (type_of_size != 1)
+return NULL_TREE;
+  else
+size = fold_convert (sizetype, size);
+
+  return size;
+}
+
+
 /* Instrument array bounds for ARRAY_REFs.  We create special builtin,
that gets expanded in the sanopt pass, and make an array dimension
of it.  ARRAY is the array, *INDEX is an index to the array.
@@ -401,6 +435,14 @@ ubsan_instrument_bounds (location_t loc, tree array, tree *index,
  && COMPLETE_TYPE_P (type)
  && integer_zerop (TYPE_SIZE (type)))
bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1);
+  else if (INDIRECT_REF_P (array)
+  && is_access_with_size_p ((TREE_OPERAND (array, 0))))
+   {
+ bound = get_bound_from_access_with_size ((TREE_OPERAND (array, 0)));
+ bound = fold_build2 (MINUS_EXPR, TREE_TYPE (bound),
+  bound,
+  build_int_cst (TREE_TYPE (bound), 1));
+   }
   else
return NULL_TREE;
 }
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
new file mode 100644
index ..148934975ee5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
@@ -0,0 +1,45 @@
+/* test the attribute counted_by and its usage in
+   bounds sanitizer combined with VLA.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-output "index 11 out of bounds for type 'int \\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 20 out of bounds for type 'int \\\[\\\*\\\]\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 11 out of bounds for type 'int \\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'int \\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+
+
+#include <stdlib.h>
+
+void __attribute__((__noinline__)) setup_and_test_vla (int n, int m)
+{
+   struct foo {
+   int n;
+   int p[][n] __attribute__((counted_by(n)));
+   } *f;
+
+   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n]));
+   f->n = m;
+   f->p[m][n-1]=1;
+   return;
+}
+
+void __attribute__((__noinline__)) setup_and_test_vla_1 (int n1, int n2, int m)
+{
+  struct foo {
+int n;
+int p[][n2][n1] __attribute__((counted_by(n)));
+  } *f;
+
+  f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n2][n1]));
+  f->n = m;
+  f->p[m][n2][n1]=1;
+  return;
+}
+
+int main(int argc, char *argv[])
+{
+  setup_and_test_vla (10, 11);
+  setup_and_test_vla_1 (10, 11, 20);
+  return 0;

[PATCH v5 2/4] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-02-09 Thread Qing Zhao
Including the following changes:
* The definition of the new internal function .ACCESS_WITH_SIZE
  in internal-fn.def.
* C FE converts every reference to a FAM with a "counted_by" attribute
  to a call to the internal function .ACCESS_WITH_SIZE.
  (build_component_ref in c-typeck.cc)

  This includes the case when the object is statically allocated and
  initialized.
  In order to make this work, the routines initializer_constant_valid_p_1
  and output_constant in varasm.cc are updated to handle calls to
  .ACCESS_WITH_SIZE.

  However, for the reference inside "offsetof", the "counted_by" attribute is
  ignored since it's not useful at all.
  (c_parser_postfix_expression in c/c-parser.cc)

  In addition to "offsetof", for the reference inside operator "typeof" and
  "alignof", we ignore counted_by attribute too.

  When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
  replace the call with its first argument.

* Convert every call to .ACCESS_WITH_SIZE to its first argument.
  (expand_ACCESS_WITH_SIZE in internal-fn.cc)
* Adjust alias analysis to exclude the new internal from clobbering anything.
  (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
* Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE when
  its LHS is eliminated as dead code.
  (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
* Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
  get the reference from the call to .ACCESS_WITH_SIZE.
  (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)

gcc/c/ChangeLog:

* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
attribute when build_component_ref inside offsetof operator.
* c-tree.h (build_component_ref): Add one more parameter.
* c-typeck.cc (build_counted_by_ref): New function.
(build_access_with_size_for_counted_by): New function.
(build_component_ref): Check the counted-by attribute and build
call to .ACCESS_WITH_SIZE.
(build_unary_op): When building ADDR_EXPR for
.ACCESS_WITH_SIZE, use its first argument.
(lvalue_p): Accept call to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
IFN_ACCESS_WITH_SIZE.
(call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
to .ACCESS_WITH_SIZE when its LHS is dead.
* tree.cc (process_call_operands): Adjust side effect for function
.ACCESS_WITH_SIZE.
(is_access_with_size_p): New function.
(get_ref_from_access_with_size): New function.
* tree.h (is_access_with_size_p): New prototype.
(get_ref_from_access_with_size): New prototype.
* varasm.cc (initializer_constant_valid_p_1): Handle call to
.ACCESS_WITH_SIZE.
(output_constant): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-2.c: New test.
---
 gcc/c/c-parser.cc |  10 +-
 gcc/c/c-tree.h|   2 +-
 gcc/c/c-typeck.cc | 128 +-
 gcc/internal-fn.cc|  36 +
 gcc/internal-fn.def   |   4 +
 .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
 gcc/tree-ssa-alias.cc |   2 +
 gcc/tree-ssa-dce.cc   |   5 +-
 gcc/tree.cc   |  25 +++-
 gcc/tree.h|   8 ++
 gcc/varasm.cc |  10 ++
 11 files changed, 332 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c31349dae2ff..a6ed5ac43bb1 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
if (c_parser_next_token_is (parser, CPP_NAME))
  {
c_token *comp_tok = c_parser_peek_token (parser);
+   /* Ignore the counted_by attribute for reference inside
+  offsetof since the information is not useful at all.  */
offsetof_ref
  = build_component_ref (loc, offsetof_ref, comp_tok->value,
-comp_tok->location, UNKNOWN_LOCATION);
+comp_tok->location, UNKNOWN_LOCATION,
+false);
c_parser_consume_token (parser);
while (c_parser_next_token_is (parser, CPP_DOT)
   || 

[PATCH v5 3/4] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-02-09 Thread Qing Zhao
gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
* gcc.dg/flex-array-counted-by-5.c: New test.
---
 .../gcc.dg/builtin-object-size-common.h   |  11 ++
 .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
 .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
 .../gcc.dg/flex-array-counted-by-5.c  |  48 +
 gcc/tree-object-size.cc   |  59 ++
 5 files changed, 359 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
   __builtin_abort ();\
 return 0;\
   } while (0)
+
+#define EXPECT(p, _v) do {   \
+  size_t v = _v; \
+  if (p == v)\
+__builtin_printf ("ok:  %s == %zd\n", #p, p);\
+  else   \
+{\
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);\
+  FAIL ();   \
+}\
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..0066c32ca808
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f; 
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
++ attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+  array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);   
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..3ce7f3545549
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size mismatched with the value of counted_by attribute?
+we should always use the latest value that is hold by the counted_by
+field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10 
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, Due to type casting, the type for the pointee of a pointer
+   does not say 

[PATCH v5 0/4] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-02-09 Thread Qing Zhao
Hi,

This is the 5th version of the patch.

Compared with the 4th version, the major differences are:

1. Change the return type of the routine .ACCESS_WITH_SIZE 
   FROM:
 Pointer to the type of the element of the flexible array;
   TO:
 Pointer to the type of the flexible array;
And then wrap the call with an indirection reference. 

2. Adjust all other parts with this change, (this will simplify the bound 
sanitizer instrument);

3. Add the fixes to the kernel building failures, which include:
A. The operator “typeof” cannot return correct type for a->array; 
B. The operator “&” cannot return correct address for a->array;

4. Correctly handle the case when the value of “counted-by” is zero or 
negative as follows:
   4.1. Update the counted-by doc with the following:
When the counted-by field is assigned a negative integer value, the 
compiler will treat the value as zero. 
   4.2. Adjust __bdos and array bound sanitizer to handle correctly when 
“counted-by” is zero. 


It is based on the following proposal:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635884.html
Represent the missing dependence for the "counted_by" attribute and its 
consumers

**The summary of the proposal is:

* Add a new internal function ".ACCESS_WITH_SIZE" to carry the size information 
for every reference to a FAM field;
* In C FE, Replace every reference to a FAM field whose TYPE has the 
"counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or array bound 
sanitizer, query the size information or ACCESS_MODE information from the new 
internal function;
* When expanding to RTL, replace the internal function with the actual 
reference to the FAM field;
* Some adjustment to ipa alias analysis, and other SSA passes to mitigate the 
impact to the optimizer and code generation.


**The new internal function

  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, TYPE_OF_SIZE, 
ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)

which returns the "REF_TO_OBJ" same as the 1st argument;

Both the return type and the type of the first argument of this function have 
been converted from the incomplete array type to the corresponding pointer type.

The call to .ACCESS_WITH_SIZE is wrapped with an INDIRECT_REF, whose type is 
the original incomplete array type.

Please see the following link for why:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html

1st argument "REF_TO_OBJ": The reference to the object;
2nd argument "REF_TO_SIZE": The reference to the size of the object,
3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE represents
   0: unknown;
   1: the number of the elements of the object type;
   2: the number of bytes;
4th argument TYPE_OF_SIZE: A constant 0 with the TYPE of the object
  refed by REF_TO_SIZE
5th argument "ACCESS_MODE":
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write
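
As a reading aid, the argument encodings above can be written down as a
small user-level model. This is only a sketch: the enum names and the
helper size_in_bytes are made up for illustration and do not exist in
GCC; the numeric values are the ones listed above. The helper assumes
the multiplication does not overflow and uses SIZE_MAX to mean "unknown".

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the CLASS_OF_SIZE values listed above.  */
enum class_of_size {
  SIZE_UNKNOWN = 0,
  SIZE_IN_ELEMENTS = 1,
  SIZE_IN_BYTES = 2
};

/* Mirrors the ACCESS_MODE values listed above.  */
enum access_mode {
  ACCESS_UNKNOWN = -1,
  ACCESS_NONE = 0,
  ACCESS_READ_ONLY = 1,
  ACCESS_WRITE_ONLY = 2,
  ACCESS_READ_WRITE = 3
};

/* Hypothetical helper: turn the value read through REF_TO_SIZE into a
   byte count according to CLASS_OF_SIZE.  Returns SIZE_MAX when the
   class is unknown.  */
static size_t
size_in_bytes (enum class_of_size cls, size_t size, size_t elem_size)
{
  switch (cls)
    {
    case SIZE_IN_ELEMENTS:
      return size * elem_size;
    case SIZE_IN_BYTES:
      return size;
    default:
      return SIZE_MAX;
    }
}
```

For a counted_by array the class is 1, so a count of 10 on an int array
corresponds to 10 * sizeof (int) bytes.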

** The Patch sets included:

1. Provide counted_by attribute to flexible array member field;
  which includes:
  * "counted_by" attribute documentation;
  * C FE handling of the new attribute;
syntax checking, error reporting;
  * testing cases;

2. Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.
  which includes:
  * The definition of the new internal function .ACCESS_WITH_SIZE in 
internal-fn.def.
  * C FE converts every reference to a FAM with "counted_by" attribute to a 
call to the internal function .ACCESS_WITH_SIZE.
(build_component_ref in c-typeck.cc)
This includes the case when the object is statically allocated and 
initialized.
In order to make this work, we should update 
initializer_constant_valid_p_1 and output_constant in varasm.cc to include 
calls to .ACCESS_WITH_SIZE.

However, for the reference inside "offsetof", ignore the "counted_by" 
attribute since it's not useful at all. (c_parser_postfix_expression in 
c/c-parser.cc)
In addition to "offsetof", for the reference inside operator "typeof" and
  "alignof", we ignore counted_by attribute too.
When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
  replace the call with its first argument.

  * Convert every call to .ACCESS_WITH_SIZE to its first argument.
(expand_ACCESS_WITH_SIZE in internal-fn.cc)
  * adjust alias analysis to exclude the new internal from clobbering 
anything.
(ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in 
tree-ssa-alias.cc)
  * adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE
when its LHS is eliminated as dead code.
(eliminate_unnecessary_stmts in tree-ssa-dce.cc)
  * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
get the 

[PATCH v5 1/4] Provide counted_by attribute to flexible array member field (PR108896)

2024-02-09 Thread Qing Zhao
'counted_by (COUNT)'
 The 'counted_by' attribute may be attached to the C99 flexible
 array member of a structure.  It indicates that the number of the
 elements of the array is given by the field named "COUNT" in the
 same structure as the flexible array member.  GCC uses this
 information to improve the results of the array bound sanitizer and
 the '__builtin_dynamic_object_size'.

 For instance, the following code:

  struct P {
size_t count;
char other;
char array[] __attribute__ ((counted_by (count)));
  } *p;

 specifies that the 'array' is a flexible array member whose number
 of elements is given by the field 'count' in the same structure.
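
 A minimal sketch of how such a structure might be allocated follows.
 The COUNTED_BY macro and the helper alloc_p are assumptions made for
 this illustration (they are not part of the patch); the macro expands
 to nothing on compilers that do not know the attribute, so the code
 builds either way.  Note that 'count' is set before 'array' is first
 referenced.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only when the compiler reports support for it.  */
#if defined (__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(m) __attribute__ ((counted_by (m)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(m)
#endif

struct P {
  size_t count;
  char other;
  char array[] COUNTED_BY (count);
};

/* Hypothetical helper: allocate a struct P with room for N elements
   and record N in the count field before the array is touched.  */
static struct P *
alloc_p (size_t n)
{
  struct P *p = malloc (sizeof (struct P) + n * sizeof (char));
  if (p == NULL)
    return NULL;
  p->count = n;
  return p;
}
```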

 The field that represents the number of the elements should have an
 integer type.  Otherwise, the compiler will report a warning and
 ignore the attribute.

 When the field that represents the number of the elements is assigned a
 negative integer value, the compiler will treat the value as zero.

 An explicit 'counted_by' annotation defines a relationship between
 two objects, 'p->array' and 'p->count', and there are the following
 requirements on the relationship between this pair:

* 'p->count' should be initialized before the first reference to
  'p->array';

* 'p->array' has _at least_ 'p->count' number of elements
  available all the time.  This relationship must hold even
  after any of these related objects are updated during the
  program.

 It's the user's responsibility to make sure the above requirements
 are kept all the time.  Otherwise the compiler will report
 warnings, and the results of the array bound sanitizer and the
 '__builtin_dynamic_object_size' are undefined.

 One important feature of the attribute is, a reference to the
 flexible array member field will use the latest value assigned to
 the field that represents the number of the elements before that
 reference.  For example,

p->count = val1;
p->array[20] = 0;  // ref1 to p->array
p->count = val2;
p->array[30] = 0;  // ref2 to p->array

 in the above, 'ref1' will use 'val1' as the number of the elements
 in 'p->array', and 'ref2' will use 'val2' as the number of elements
 in 'p->array'.
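
 The "latest value" rule can be modelled by hand as follows.  This is a
 sketch only: store_checked and demo are made-up names, the attribute
 is left out so the code builds with any compiler, and the check is
 written out explicitly instead of being inserted by the sanitizer.
 The point is that the count is read at each access, not once.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct P {
  size_t count;
  char other;
  char array[];   /* counted_by (count) in the annotated version */
};

/* Hypothetical checked store: the bound is whatever p->count holds at
   the time of this access.  */
static bool
store_checked (struct P *p, size_t idx, char val)
{
  if (idx >= p->count)
    return false;
  p->array[idx] = val;
  return true;
}

/* Returns a bit mask: bit 0 set if ref1 was accepted, bit 1 set if
   ref2 was accepted; -1 on allocation failure.  */
static int
demo (void)
{
  /* Room for 40 elements, so the array always has at least 'count'.  */
  struct P *p = malloc (sizeof (struct P) + 40);
  if (p == NULL)
    return -1;

  p->count = 10;
  bool ref1 = store_checked (p, 20, 0);   /* checked against 10 */
  p->count = 40;
  bool ref2 = store_checked (p, 30, 0);   /* checked against 40 */

  free (p);
  return (ref1 ? 1 : 0) | (ref2 ? 2 : 0);
}
```

 Here ref1 is rejected because index 20 is not below the count of 10 in
 effect at that point, while ref2 is accepted against the updated count
 of 40.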

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.
* c-tree.h (lookup_field): New prototype.
* c-typeck.cc (lookup_field): Expose as extern function.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
---
 gcc/c-family/c-attribs.cc| 54 -
 gcc/c-family/c-common.cc | 13 +++
 gcc/c-family/c-common.h  |  1 +
 gcc/c/c-decl.cc  | 85 
 gcc/c/c-tree.h   |  1 +
 gcc/c/c-typeck.cc|  3 +-
 gcc/doc/extend.texi  | 64 +++
 gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 +
 8 files changed, 241 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 40a0cf90295d..4395c0656b14 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -105,6 +105,8 @@ static tree handle_warn_if_not_aligned_attribute (tree *, tree, tree,
  int, bool *);
 static tree handle_strict_flex_array_attribute (tree *, tree, tree,
 int, bool *);
+static tree handle_counted_by_attribute (tree *, tree, tree,
+  int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
@@ -412,6 +414,8 @@ const struct attribute_spec c_common_gnu_attributes[] =
  handle_warn_if_not_aligned_attribute, NULL },

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-30 Thread Qing Zhao
Okay, based on the comments so far, I will work on the 5th version of the 
patch. Major changes will include:

1. Change the return type of the routine .ACCESS_WITH_SIZE 
FROM:
  Pointer to the type of the element of the flexible array;
TO:
   Pointer to the type of the flexible array;

 And then wrap the call with an indirection reference. 

2. Adjust all other parts with this change, (this will simplify the bound 
sanitizer instrument);

3. Add the fixes to the kernel building failures, which include:

 A. The operator “typeof” cannot return correct type for a->array; 
(I guess that the above change 1 might automatically resolve this issue)
 B. The operator “&” cannot return correct address for a->array;

4. Correctly handle the case when the value of “counted-by” is zero or negative 
as follows:

4.1 . Update the counted-by doc with the following:

 When the counted-by field is assigned a negative integer value, the 
compiler will treat the value as zero. 

4.2.   (possibly) Adjust __bdos and array bound sanitizer to handle 
correctly when “counted-by” is zero. 

   __bdos will return size 0 when counted-by is zero;

  Array bound sanitizer will report out-of-bound when the counted-by is 
zero for any array access. 
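
To spell out the intended behavior in 4.1 and 4.2, here is a small
user-level model. It is a sketch, not the GCC implementation: the
function names are made up for illustration, and it only encodes the
rules stated above (negative counts act as zero, the reported size is 0
when the count is zero, and every index is out of bounds in that case).

```c
#include <stdbool.h>
#include <stddef.h>

/* A negative count is treated as zero.  */
static size_t
effective_count (long count)
{
  return count < 0 ? 0 : (size_t) count;
}

/* Model of the sub-object size reported for the array: zero whenever
   the count is zero or negative.  */
static size_t
model_object_size (long count, size_t elem_size)
{
  return effective_count (count) * elem_size;
}

/* Model of the bounds check: an index is valid only if it is
   non-negative and below the effective count, so nothing is valid
   when the count is zero or negative.  */
static bool
index_in_bounds (long count, long index)
{
  return index >= 0 && (size_t) index < effective_count (count);
}
```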

Let me know if I missed anything.

Thanks a lot for all the comments

 Qing


> On Jan 23, 2024, at 7:29 PM, Qing Zhao  wrote:
> 
> Hi,
> 
> This is the 4th version of the patch.
> 
> It based on the following proposal:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635884.html
> Represent the missing dependence for the "counted_by" attribute and its 
> consumers
> 
> **The summary of the proposal is:
> 
> * Add a new internal function ".ACCESS_WITH_SIZE" to carry the size 
> information for every reference to a FAM field;
> * In C FE, Replace every reference to a FAM field whose TYPE has the 
> "counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
> * In every consumer of the size information, for example, BDOS or array bound 
> sanitizer, query the size information or ACCESS_MODE information from the new 
> internal function;
> * When expanding to RTL, replace the internal function with the actual 
> reference to the FAM field;
> * Some adjustment to ipa alias analysis, and other SSA passes to mitigate the 
> impact to the optimizer and code generation.
> 
> 
> **The new internal function
> 
>  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, 
> ACCESS_MODE, INDEX)
> 
> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
> 
> which returns the "REF_TO_OBJ" same as the 1st argument;
> 
> Both the return type and the type of the first argument of this function have 
> been converted from the incomplete array type to the corresponding pointer 
> type.
> 
> Please see the following link for why:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html
> 
> 1st argument "REF_TO_OBJ": The reference to the object;
> 2nd argument "REF_TO_SIZE": The reference to the size of the object,
> 3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE 
> represents
>   0: unknown;
>   1: the number of the elements of the object type;
>   2: the number of bytes;
> 4th argument "PRECISION_OF_SIZE": The precision of the integer that 
> REF_TO_SIZE points;
> 5th argument "ACCESS_MODE":
>  -1: Unknown access semantics
>   0: none
>   1: read_only
>   2: write_only
>   3: read_write
> 6th argument "INDEX": the INDEX for the original array reference.
>  -1: Unknown
> 
> NOTE: The 6th Argument is added for bound sanitizer instrumentation.
> 
> ** The Patch sets included:
> 
> 1. Provide counted_by attribute to flexible array member field;
>  which includes:
>  * "counted_by" attribute documentation;
>  * C FE handling of the new attribute;
>syntax checking, error reporting;
>  * testing cases;
> 
> 2. Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.
>  which includes:
>  * The definition of the new internal function .ACCESS_WITH_SIZE in 
> internal-fn.def.
>  * C FE converts every reference to a FAM with "counted_by" attribute to 
> a call to the internal function .ACCESS_WITH_SIZE.
>(build_component_ref in c_typeck.cc)
>This includes the case when the object is statically allocated and 
> initialized.
>In order to make this working, we should update 
> initializer_constant_valid_p_1 and output_constant in va

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-30 Thread Qing Zhao


> On Jan 30, 2024, at 12:41 AM, Kees Cook  wrote:
> 
> On Mon, Jan 29, 2024 at 10:45:23PM +0000, Qing Zhao wrote:
>> There are two things here. 
>> 
>> 1. The value of “counted-by” is 0 (which is easy to understand);
>> 2. The result of __builtin_object_size when it sees a “counted-by” of 0.
>> 
>> For 1, it’s simple: if we see a counted-by value <= 0, then counted-by is treated as 0;
> 
> Okay, that's good; this matches my understanding. :)
> 
>> But for 2, when __builtin_object_size sees a “counted-by” of 0, what value 
>> should it return for the object size?
>> 
>> Can we return 0 for the object size? 
> 
> I don't see why not. For example:
> 
> // -O2 -fstrict-flex-arrays=3
> struct s {
>     int a;
>     int b[4];
> } foo;
> 
> #define report(x)   printf("%s: %zu\n", #x, (size_t)(x))
> 
> int main(int argc, char *argv[])
> {
>     struct s foo;
>     report(__builtin_dynamic_object_size(&foo.b[4], 0));
>     report(__builtin_dynamic_object_size(&foo.b[5], 0));
>     report(__builtin_dynamic_object_size(&foo.b[-10], 0));
>     report(__builtin_dynamic_object_size(&foo.b[4], 1));
>     report(__builtin_dynamic_object_size(&foo.b[5], 1));
>     report(__builtin_dynamic_object_size(&foo.b[-10], 1));
>     report(__builtin_dynamic_object_size(&foo.b[4], 2));
>     report(__builtin_dynamic_object_size(&foo.b[5], 2));
>     report(__builtin_dynamic_object_size(&foo.b[-10], 2));
>     report(__builtin_dynamic_object_size(&foo.b[4], 3));
>     report(__builtin_dynamic_object_size(&foo.b[5], 3));
>     report(__builtin_dynamic_object_size(&foo.b[-10], 3));
>     return 0;
> }
> 
> shows:
> 
> __builtin_dynamic_object_size(&foo.b[4], 0): 0
> __builtin_dynamic_object_size(&foo.b[5], 0): 0
> __builtin_dynamic_object_size(&foo.b[-10], 0): 0
> __builtin_dynamic_object_size(&foo.b[4], 1): 0
> __builtin_dynamic_object_size(&foo.b[5], 1): 0
> __builtin_dynamic_object_size(&foo.b[-10], 1): 0
> __builtin_dynamic_object_size(&foo.b[4], 2): 0
> __builtin_dynamic_object_size(&foo.b[5], 2): 0
> __builtin_dynamic_object_size(&foo.b[-10], 2): 0
> __builtin_dynamic_object_size(&foo.b[4], 3): 0
> __builtin_dynamic_object_size(&foo.b[5], 3): 0
> __builtin_dynamic_object_size(&foo.b[-10], 3): 0
> 
> This is showing "no bytes left" for the end of the b array, and if this
> index keeps going, it still reports 0 if we're past the end of the object
> completely. And it is similarly capped for negative indexes. This is
> true for all the __bos type bits.
> 
> A "counted-by" of 0 (or below) would have the same meaning as an out of
> bounds index here.

Okay. I will keep this behavior when counted-by is zero (and negative) for 
__bos. 
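
The clamping agreed on here can be written down in plain C. This is only a sketch of the arithmetic, not the GCC implementation; `clamped_count` and `fam_bytes` are hypothetical helper names introduced for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: a zero or negative counted-by value acts as 0. */
static size_t clamped_count(long count)
{
    return count < 0 ? 0 : (size_t)count;
}

/* Hypothetical helper: number of bytes in a flexible array member that
   holds `count` elements of `elem_size` bytes each, after clamping. */
static size_t fam_bytes(long count, size_t elem_size)
{
    return clamped_count(count) * elem_size;
}
```

With this model, a count of -3 and a count of 0 both give a size of 0, while a count of 10 with `short` elements gives `10 * sizeof(short)`.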
> 
>> (As I mentioned in the previous email, 0 in __builtin_object_size doesn’t 
>> mean size 0,
>> it means UNKNOWN_SIZE when the type is 2/3, So, what’s value we should 
>> return for the size 0?)
>> https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html
> 
> I think I see what you mean, but I still think it should be 0 for 2/3,
> regardless of the documented interpretation. If that's the current
> response for a pathological index under 2/3, then I think it's totally
> reasonable that it should do the same for pathological bounds.

Okay, will keep this behavior for “counted-by” zero. 

(But I still feel that a 0 for types 2/3, i.e. for the MINIMUM size, will be 
read as UNKNOWN_SIZE.
 If that’s the value the kernel expects, that’s fine.)
> 
> 
> And BTW, it seems there are 0-sized objects, though maybe they're some
> kind of special case:
> 
> struct s {
>int a;
>struct { } nothing;
>int b;
> };
> 
> #define report(x)   printf("%s: %zu\n", #x, (size_t)(x))
> 
> int main(int argc, char *argv[])
> {
>struct s foo;
>     report(__builtin_dynamic_object_size(&foo.nothing, 1));
> }
> 
> shows:
> 
> __builtin_dynamic_object_size(&foo.nothing, 1): 0

It looks like GCC has such an extension: 
https://gcc.gnu.org/onlinedocs/gcc/Empty-Structures.html

***GCC permits a C structure to have no members:
struct empty {
};

The structure has size zero. In C++, empty structures are part of the language. 
G++ treats empty structures as if they had a single member of type char.
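
A quick way to see the documented behavior is to ask the compiler for the size directly. This is a minimal sketch that assumes it is compiled as C (not C++) with GCC or a compatible compiler; `empty_size` is a hypothetical helper name.

```c
#include <stddef.h>

/* GNU C extension: a structure with no members. */
struct empty {
};

/* Hypothetical helper: report the size the compiler assigns to it. */
static size_t empty_size(void)
{
    return sizeof(struct empty);
}
```

Compiled as GNU C this is expected to yield 0, in line with the documentation quoted above; as C++ the same type would have a nonzero size.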

Thanks.

Qing


> 
> -Kees
> 
> -- 
> Kees Cook



Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Qing Zhao


> On Jan 29, 2024, at 3:19 PM, Kees Cook  wrote:
> 
> On Mon, Jan 29, 2024 at 07:32:06PM +0000, Qing Zhao wrote:
>> 
>> 
>>> On Jan 29, 2024, at 12:25 PM, Kees Cook  wrote:
>>> 
>>> On Mon, Jan 29, 2024 at 04:00:20PM +0000, Qing Zhao wrote:
>>>> An update on the kernel building with my version 4 patch.
>>>> 
>>>> Kees reported two FE issues with the current version 4 patch:
>>>> 
>>>> 1. The operator “typeof” cannot return correct type for a->array;
>>>> 2. The operator “&” cannot return correct address for a->array;
>>>> 
>>>> I fixed both in my local repository. 
>>>> 
>>>> With these additional fix.  Kernel with counted-by annotation can be built 
>>>> successfully. 
>>> 
>>> Thanks for the fixes!
>>> 
>>>> 
>>>> And then, Kees reported one behavioral issue with the current counted-by:
>>>> 
>>>> When the counted-by value is below zero, my current patch 
>>>> 
>>>> A. Didn’t report any warning for it.
>>>> B. Accepted the negative value as a wrapped size.
>>>> 
>>>> i.e. for:
>>>> 
>>>> struct foo {
>>>> signed char size;
>>>> unsigned char array[] __counted_by(size);
>>>> } *a;
>>>> 
>>>> ...
>>>> a->size = -3;
>>>> report(__builtin_dynamic_object_size(p->array, 1));
>>>> 
>>>> this reports 253, rather than 0.
>>>> 
>>>> And the array-bounds sanitizer doesn’t catch negative index bounds 
>>>> neither. 
>>>> 
>>>> a->size = -3;
>>>> report(a->array[1]); // does not trap
>>>> 
>>>> 
>>>> So, my questions are:
>>>> 
>>>> How should we handle the negative counted-by value?
>>> 
>>> Treat it as always 0-bounded: count < 0 ? 0 : count
>> 
>> Then the size of the object is 0?
> 
> That would be the purpose, yes. It's possible something else has
> happened, but it would mean "the array contents should not be accessed
> (since we don't have a valid size)".

This might be a new concept we need to add: from my understanding,
 C/C++ do not have zero-sized objects. 
So I am a little worried about where we should document this concept.

The most reasonable place I can think of is the documentation of the 
“counted-by” attribute, but I am still not sure about this.
> 
>> 
>>> 
>>>> 
>>>> My approach is:
>>>> 
>>>>  I think that this is a user error, the compiler need to Issue warning 
>>>> during runtime about this user error.
>>>> 
>>>> Since I have one remaining patch that has not been finished yet:
>>>> 
>>>> 6  Emit warnings when the user breaks the requirments for the new 
>>>> counted_by attribute
>>>> compilation time: -Wcounted-by
>>>> run time: -fsanitizer=counted-by
>>>>* The initialization to the size field should be done before the first 
>>>> reference to the FAM field.
>>> 
>>> I would hope that regular compile-time warnings would catch this.
>> If the value is known at compile-time, then compile-time should catch it.
>> 
>>> 
>>>>* the array has at least # of elements specified by the size field all 
>>>> the time during the program.
>>>>* the value of counted-by should not be negative.
>>> 
>>> This seems reasonable for a very strict program, but it won't work for
>>> the kernel as-is: a negative "count" is sometimes used to carry failure
>>> details back to other users of the structure. This could be refactored in
>>> the kernel, but I'd prefer that even without -fsanitizer=counted-by the
>>> runtime behaviors will be "safe".
>> 
>> So, In the kernel’s source code, for example:
>> 
>> struct foo {
>>  int count;
>>  short array[] __counted_by(count);
>> };
>> 
>> The field “count” will be used for two purposes:
>> A. As the counted_by for the “array” when its value > 0;
>> B. As an errno when its value < 0;  under such condition, the size of 
>> “array” is zero. 
>> 
>> Is the understanding correct?
> 
> Yes.
> 
>> Is doing this for saving space?  (Curious -:)
> 
> It seems so, yes.
> 
>>> It does not seem sensible to me that adding a buffer size validation
>>> primitive to GCC wil

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Qing Zhao



> On Jan 29, 2024, at 3:35 PM, Joseph Myers  wrote:
> 
> On Mon, 29 Jan 2024, Qing Zhao wrote:
> 
>> Thank you!
>> 
>> Joseph and Richard,  could you also comment on this?
> 
> I think Martin's suggestions are reasonable.

Okay, I will update the patches based on this approach. 

Thanks a lot for the comment.

Qing
> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
> 



Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Qing Zhao


> On Jan 29, 2024, at 12:25 PM, Kees Cook  wrote:
> 
> On Mon, Jan 29, 2024 at 04:00:20PM +0000, Qing Zhao wrote:
>> An update on the kernel building with my version 4 patch.
>> 
>> Kees reported two FE issues with the current version 4 patch:
>> 
>> 1. The operator “typeof” cannot return correct type for a->array;
>> 2. The operator “&” cannot return correct address for a->array;
>> 
>> I fixed both in my local repository. 
>> 
>> With these additional fix.  Kernel with counted-by annotation can be built 
>> successfully. 
> 
> Thanks for the fixes!
> 
>> 
>> And then, Kees reported one behavioral issue with the current counted-by:
>> 
>> When the counted-by value is below zero, my current patch 
>> 
>> A. Didn’t report any warning for it.
>> B. Accepted the negative value as a wrapped size.
>> 
>> i.e. for:
>> 
>> struct foo {
>> signed char size;
>> unsigned char array[] __counted_by(size);
>> } *a;
>> 
>> ...
>> a->size = -3;
>> report(__builtin_dynamic_object_size(p->array, 1));
>> 
>> this reports 253, rather than 0.
>> 
>> And the array-bounds sanitizer doesn’t catch negative index bounds neither. 
>> 
>> a->size = -3;
>> report(a->array[1]); // does not trap
>> 
>> 
>> So, my questions are:
>> 
>> How should we handle the negative counted-by value?
> 
> Treat it as always 0-bounded: count < 0 ? 0 : count

Then the size of the object is 0?

> 
>> 
>> My approach is:
>> 
>>   I think that this is a user error, the compiler need to Issue warning 
>> during runtime about this user error.
>> 
>> Since I have one remaining patch that has not been finished yet:
>> 
>> 6  Emit warnings when the user breaks the requirments for the new counted_by 
>> attribute
>>  compilation time: -Wcounted-by
>>  run time: -fsanitizer=counted-by
>> * The initialization to the size field should be done before the first 
>> reference to the FAM field.
> 
> I would hope that regular compile-time warnings would catch this.
If the value is known at compile-time, then compile-time should catch it.

> 
>> * the array has at least # of elements specified by the size field all 
>> the time during the program.
>> * the value of counted-by should not be negative.
> 
> This seems reasonable for a very strict program, but it won't work for
> the kernel as-is: a negative "count" is sometimes used to carry failure
> details back to other users of the structure. This could be refactored in
> the kernel, but I'd prefer that even without -fsanitizer=counted-by the
> runtime behaviors will be "safe".

So, In the kernel’s source code, for example:

struct foo {
  int count;
  short array[] __counted_by(count);
};

The field “count” will be used for two purposes:
A. As the counted_by for the “array” when its value > 0;
B. As an errno when its value < 0;  under such condition, the size of “array” 
is zero. 

Is the understanding correct?

Is doing this for saving space?  (Curious -:)
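
To make the two uses concrete, here is a small sketch of how such a field could be read. It is an illustration only: the helper names `foo_has_error` and `foo_nelems` are hypothetical, and the counted_by attribute is left out so that the sketch builds with compilers that lack it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative layout; the attribute is omitted on purpose. */
struct foo {
    int count;      /* > 0: number of elements; < 0: error code */
    short array[];
};

/* Hypothetical helper: does `count` carry an error code? */
static bool foo_has_error(const struct foo *f)
{
    return f->count < 0;
}

/* Hypothetical helper: number of accessible elements, 0 on error. */
static size_t foo_nelems(const struct foo *f)
{
    return f->count < 0 ? 0 : (size_t)f->count;
}
```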


> 
> It does not seem sensible to me that adding a buffer size validation
> primitive to GCC will result in conditions where a size calculation
> will wrap around. I prefer no surprises. :)

This might be a bug, I guess. 
> 
>> Let me know your comment and suggestions.
> 
> Clang has implemented the safety logic I'd prefer:
> 
> * __bdos will report 0 for any sizing where the "counted_by" count
>  variable is negative. Effectively, the count variable is always
>  processed as: count < 0 ? 0 : count
> 
>  struct foo {
> int count;
> short array[] __counted_by(count);
>  } *p;
> 
>  __bdos(p->array, 1) ==> sizeof(*p->array) * (count < 0 ? 0 : count)

NOTE: __bdos uses the value 0 as UNKNOWN_SIZE for a MINIMUM size query, i.e.:

size_t __builtin_object_size (const void * ptr, int type)

will return 0 as UNKNOWN_SIZE when type is 2 or 3.

So, I am wondering: should the 0 here mean UNKNOWN_SIZE or a size of 0?

I guess it should be UNKNOWN_SIZE?  (I.e., -1 for the MAXIMUM types, 0 for the MINIMUM 
types.)

i.e, when the value of “count” is 0 or negative,  the __bdos will return 
UNKNOWN_SIZE.  Is this correct?
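
The UNKNOWN_SIZE sentinels mentioned above can be observed with a pointer the compiler knows nothing about. This is a sketch under the assumption that `noinline` is enough to hide the pointed-to object from the size analysis; the two function names are hypothetical.

```c
#include <stddef.h>

/* noinline keeps the compiler from seeing what `p` points to, so the
   object size is unknown inside these functions. */
__attribute__((noinline))
size_t max_size_of_unknown(const void *p)
{
    return __builtin_object_size(p, 0);   /* maximum estimate */
}

__attribute__((noinline))
size_t min_size_of_unknown(const void *p)
{
    return __builtin_object_size(p, 2);   /* minimum estimate */
}
```

Per the GCC documentation, an undeterminable size is reported as `(size_t) -1` for types 0 and 1 and as 0 for types 2 and 3, which is why a real size of 0 and "unknown" cannot be told apart for the minimum types.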

> 
>  The logic for this is that __bdos can be _certain_ that the size is 0
>  when the count variable is pathological.


> 
> * -fsanitize=array-bounds similarly treats count as above, so that:
> 
>  printf("%d\n", p->array[index]); ==> trap when index > (count < 0 ? 0 : 
> count)
> 
>  Same logic for the sanitizer: any access to the array when count is
>  invalid means the access is invalid and must be trapped.

Okay, when the value of “count” is 0 or negative, bound sanitizer will report 
out-of-bound (or trap) for any access to the array. 
This should be reasonable.
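
As a model of that check (not the sanitizer's actual code), the decision can be sketched as below. The name `counted_by_out_of_bounds` is hypothetical, and the comparison uses `>=` on the assumption that a bound of 0 must reject index 0 as well, so that any access is reported when the count is zero or negative.

```c
#include <stdbool.h>

/* Hypothetical model: returns true when array[index] should be
   reported as out of bounds for the given counted-by value. */
static bool counted_by_out_of_bounds(long index, long count)
{
    long bound = count < 0 ? 0 : count;   /* negative count acts as 0 */
    return index < 0 || index >= bound;
}
```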

Qing


> 
> 
> This means that software can run safely even in pathological conditions.
> 
> -Kees
> 
> -- 
> Kees Cook




Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Qing Zhao


> On Jan 29, 2024, at 10:50 AM, Martin Uecker  wrote:
> 
> On Monday, 29.01.2024 at 15:09 +0000, Qing Zhao wrote:
>> Thank you!
>> 
>> Joseph and Richard,  could you also comment on this?
>> 
>>> On Jan 28, 2024, at 5:09 AM, Martin Uecker  wrote:
>>> 
>>> On Friday, 26.01.2024 at 14:33 +0000, Qing Zhao wrote:
>>>> 
>>>>> On Jan 26, 2024, at 3:04 AM, Martin Uecker  wrote:
>>>>> 
>>>>> 
>>>>> I haven't looked at the patch, but it sounds you give the result
>>>>> the wrong type. Then patching up all use cases instead of the
>>>>> type seems wrong.
>>>> 
>>>> Yes, this is for resolving a very early gimplification issue as I reported 
>>>> last Nov:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
>>>> 
>>>> Since no-one responded at that time, I fixed the issue by replacing the 
>>>> ARRAY_REF
>>>> With a pointer indirection:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html
>>>> 
>>>> The reason for such change is:  return a flexible array member TYPE is not 
>>>> allowed
>>>> by C language (our gimplification follows this rule), so, we have to 
>>>> return a pointer TYPE instead. 
>>>> 
>>>> **The new internal function
>>>> 
>>>> .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, 
>>>> ACCESS_MODE, INDEX)
>>>> 
>>>> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
>>>> 
>>>> which returns the "REF_TO_OBJ" same as the 1st argument;
>>>> 
>>>> Both the return type and the type of the first argument of this function 
>>>> have been converted from 
>>>> the incomplete array type to the corresponding pointer type.
>>>> 
>>>> As a result, the original ARRAY_REF was converted to an INDIRECT_REF, the 
>>>> original INDEX of the ARRAY_REF was lost
>>>> when converting from ARRAY_REF to INDIRECT_REF, in order to keep the INDEX 
>>>> for bound sanitizer instrumentation, I added
>>>> The 6th argument “INDEX”.
>>>> 
>>>> What’s your comment and suggestion on this solution?
>>> 
>>> I am not entirely sure but changing types in the FE seems
>>> problematic because this breaks language semantics. And
>>> then adding special code everywhere to treat it specially
>>> in the FE does not seem a good way forward.
>>> 
>>> If I understand correctly, returning an incomplete array 
>>> type is not allowed and then fails during gimplification.
>> 
>> Yes, this is the problem in gimplification. 
>> 
>>> So I would suggest to make it return a pointer to the 
>>> incomplete array (and not the element type)
>> 
>> 
>> for the following:
>> 
>> struct annotated {
>>  unsigned int size;
>>  int array[] __attribute__((counted_by (size)));
>> };
>> 
>>  struct annotated * p = ….
>>  p->array[9] = 0;
>> 
>> The IL for the above array reference p->array[9] is:
>> 
>> 1. If the return type is the original incomplete array type, 
>> 
>> .ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1)[9] = 0;
>> 
>> (this triggered the gimplification failure since the return type cannot be an 
>> incomplete type).
>> 
>> 2. When the return type is changed to a pointer to the element type of the 
>> incomplete array, (the current patch)
>> Then the original array reference naturally becomes an indirect reference 
>> through the pointer
>> 
>> *(.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1, 9) + 36) = 0;
>> 
>> Since the original array reference becomes an indirect reference through the 
>> pointer to the element array, the INDEX info 
>> is mixed into the OFFSET of the indirect reference and lost, so, I added the 
>> 6th argument to the routine .ACCESS_WITH_SIZE
>> to record the INDEX. 
>> 
>> 3. With your suggestion, the return type is changed to a pointer to the 
>> incomplete array, 
>> I just tried this to change the result type :
>> 
>> 
>> --- a/gcc/c/c-typeck.cc
>> +++ b/gcc/c/c-typeck.cc
>> @@ -2619,7 +2619,7 @@ build_access_with_size_for_counted_by (location_t loc, 
>> tree ref,
>>   tree counted_by_type)
>> {
>>   gcc_assert (c_flexible

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Qing Zhao
An update on the kernel building with my version 4 patch.

Kees reported two FE issues with the current version 4 patch:

1. The operator “typeof” cannot return correct type for a->array;
2. The operator “&” cannot return correct address for a->array;

I fixed both in my local repository. 

With these additional fixes, the kernel with counted-by annotations can be built 
successfully. 

And then, Kees reported one behavioral issue with the current counted-by:

When the counted-by value is below zero, my current patch 

A. Didn’t report any warning for it.
B. Accepted the negative value as a wrapped size.

i.e. for:

struct foo {
signed char size;
unsigned char array[] __counted_by(size);
} *a;

...
a->size = -3;
report(__builtin_dynamic_object_size(a->array, 1));

this reports 253, rather than 0.
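
The number 253 is what -3 becomes when an 8-bit signed value is reinterpreted as unsigned. The sketch below only demonstrates that conversion; it assumes 8-bit chars and does not claim to show how the patch computes the size. `wrapped_count` is a hypothetical helper name.

```c
/* Hypothetical helper: reinterpret a signed 8-bit count as unsigned,
   one way a negative count can turn into a large size. */
static unsigned wrapped_count(signed char count)
{
    return (unsigned char)count;
}
```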

And the array-bounds sanitizer doesn’t catch negative index bounds either. 

a->size = -3;
report(a->array[1]); // does not trap


So, my questions are:

 How should we handle the negative counted-by value?

 My approach is:

   I think that this is a user error; the compiler needs to issue a warning at 
runtime about it.

Since I have one remaining patch that has not been finished yet:

6  Emit warnings when the user breaks the requirements for the new counted_by 
attribute
  compilation time: -Wcounted-by
  run time: -fsanitizer=counted-by
 * The initialization to the size field should be done before the first 
reference to the FAM field.
 * the array has at least # of elements specified by the size field all the 
time during the program.
 * the value of counted-by should not be negative.

Let me know your comment and suggestions.

Thanks

Qing

> On Jan 25, 2024, at 3:11 PM, Qing Zhao  wrote:
> 
> Thanks a lot for the testing.
> 
> Yes, I can repeat the issue with the following small example:
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <stddef.h>
> 
> #define MAX(a, b)  ((a) > (b) ? (a) :  (b))
> 
> struct untracked {
>   int size;
>   int array[] __attribute__((counted_by (size)));
> } *a;
> struct untracked * alloc_buf (int index)
> {
>   struct untracked *p;
>   p = (struct untracked *) malloc (MAX (sizeof (struct untracked),
>                                         (offsetof (struct untracked, array[0])
>                                          + (index) * sizeof (int))));
>   p->size = index;
>   return p;
> }
> 
> int main()
> {
>   a = alloc_buf(10);
>   printf ("same_type is %d\n",
>   (__builtin_types_compatible_p(typeof (a->array), typeof (&(a->array)[0]))));
>   return 0;
> }
> 
> 
> /home/opc/Install/latest-d/bin/gcc -O2 btcp.c
> same_type is 1
> 
> Looks like that the “typeof” operator need to be handled specially in C FE
> for the new internal function .ACCESS_WITH_SIZE. 
> 
> (I have specially handle the operator “offsetof” in C FE already).
> 
> Will fix this issue.
> 
> Thanks.
> 
> Qing
> 
>> On Jan 24, 2024, at 7:51 PM, Kees Cook  wrote:
>> 
>> On Wed, Jan 24, 2024 at 12:29:51AM +0000, Qing Zhao wrote:
>>> This is the 4th version of the patch.
>> 
>> Thanks very much for this!
>> 
>> I tripped over an unexpected behavioral change that the Linux kernel
>> depends on:
>> 
>> __builtin_types_compatible_p() no longer treats an array marked with
>> counted_by as different from that array's decayed pointer. Specifically,
>> the kernel uses these macros:
>> 
>> 
>> /*
>> * Force a compilation error if condition is true, but also produce a
>> * result (of value 0 and type int), so the expression can be used
>> * e.g. in a structure initializer (or where-ever else comma expressions
>> * aren't permitted).
>> */
>> #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
>> 
>> #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
>> 
>> /* [0] degrades to a pointer: a different type from an array */
>> #define __must_be_array(a)   BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
>> 
>> 
>> This gets used in various places to make sure we're dealing with an
>> array for a macro:
>> 
>> #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + 
>> __must_be_array(arr))
>> 
>> 
>> So this builds:
>> 
>> struct untracked {
>>   int size;
>>   int array[];
>> } *a;
>> 
>> __must_be_array(a->array)
>> => 0 (as expected)
>> __builtin_types_compatible_p(typeof(a->array), typeof(&(a->array)[0]))
>> => 0 (as expected, array vs decayed array pointer)
>> 
>> 
>> But if counted_by is added, we get a build failure:
>> 
>> struct tracked {
>>   int size;
>>   int array[] __counted_by(size);
>> } *b;
>> 
>> __must_be_array(b->array)
>> => build failure (not expected)
>> __builtin_types_compatible_p(typeof(b->array), typeof(&(b->array)[0]))
>> => 1 (not expected, both pointers?)
>> 
>> 
>> 
>> 
>> -- 
>> Kees Cook
> 



Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-29 Thread Qing Zhao
Thank you!

Joseph and Richard,  could you also comment on this?

> On Jan 28, 2024, at 5:09 AM, Martin Uecker  wrote:
> 
> On Friday, 26.01.2024 at 14:33 +0000, Qing Zhao wrote:
>> 
>>> On Jan 26, 2024, at 3:04 AM, Martin Uecker  wrote:
>>> 
>>> 
>>> I haven't looked at the patch, but it sounds you give the result
>>> the wrong type. Then patching up all use cases instead of the
>>> type seems wrong.
>> 
>> Yes, this is for resolving a very early gimplification issue as I reported 
>> last Nov:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
>> 
>> Since no-one responded at that time, I fixed the issue by replacing the 
>> ARRAY_REF
>> With a pointer indirection:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html
>> 
>> The reason for such change is:  return a flexible array member TYPE is not 
>> allowed
>> by C language (our gimplification follows this rule), so, we have to return 
>> a pointer TYPE instead. 
>> 
>> **The new internal function
>> 
>> .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, 
>> ACCESS_MODE, INDEX)
>> 
>> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
>> 
>> which returns the "REF_TO_OBJ" same as the 1st argument;
>> 
>> Both the return type and the type of the first argument of this function 
>> have been converted from 
>> the incomplete array type to the corresponding pointer type.
>> 
>> As a result, the original ARRAY_REF was converted to an INDIRECT_REF, the 
>> original INDEX of the ARRAY_REF was lost
>> when converting from ARRAY_REF to INDIRECT_REF, in order to keep the INDEX 
>> for bound sanitizer instrumentation, I added
>> The 6th argument “INDEX”.
>> 
>> What’s your comment and suggestion on this solution?
> 
> I am not entirely sure but changing types in the FE seems
> problematic because this breaks language semantics. And
> then adding special code everywhere to treat it specially
> in the FE does not seem a good way forward.
> 
> If I understand correctly, returning an incomplete array 
> type is not allowed and then fails during gimplification.

Yes, this is the problem in gimplification. 

> So I would suggest to make it return a pointer to the 
> incomplete array (and not the element type)


for the following:

struct annotated {
  unsigned int size;
  int array[] __attribute__((counted_by (size)));
};

  struct annotated * p = ….
  p->array[9] = 0;

The IL for the above array reference p->array[9] is:

1. If the return type is the original incomplete array type, 

.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1)[9] = 0;

(this triggered the gimplification failure since the return type cannot be an 
incomplete type).

2. When the return type is changed to a pointer to the element type of the 
incomplete array, (the current patch)
Then the original array reference naturally becomes an indirect reference 
through the pointer

*(.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1, 9) + 36) = 0;

Since the original array reference becomes an indirect reference through the 
pointer to the element array, the INDEX info 
is mixed into the OFFSET of the indirect reference and lost, so, I added the 
6th argument to the routine .ACCESS_WITH_SIZE
to record the INDEX. 

3. With your suggestion, the return type is changed to a pointer to the 
incomplete array, 
I just tried this to change the result type :


--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2619,7 +2619,7 @@ build_access_with_size_for_counted_by (location_t loc, 
tree ref,
   tree counted_by_type)
 {
   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
-  tree result_type = build_pointer_type (TREE_TYPE (TREE_TYPE (ref)));
+  tree result_type = build_pointer_type (TREE_TYPE (ref));

Then, I got the following FE errors:

test.c:10:11: error: invalid use of flexible array member
   10 |   p->array[9] = 0;

The reason for the error is: when the original array_ref becomes an 
indirect_ref through the pointer to the incomplete array,
the TYPE_SIZE_UNIT (type) needed to compute the OFFSET from the pointer 
is invalid, since the type is an incomplete array. 
As a result, the OFFSET cannot be computed for the indirect_ref.

Looks like even more issues with this approach.
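
For case 2 above, the way the index disappears into a byte offset can be shown in ordinary C. This is a sketch of the arithmetic only, not of the IL GCC generates; `index_to_offset` and `store_at_offset` are hypothetical helper names.

```c
#include <stddef.h>

/* Byte offset that an index contributes once array[index] has been
   lowered to pointer arithmetic on the element type. */
static size_t index_to_offset(size_t index, size_t elem_size)
{
    return index * elem_size;
}

/* Store through base + byte offset, mirroring the indirect reference. */
static void store_at_offset(int *base, size_t offset, int value)
{
    *(int *)((char *)base + offset) = value;
}
```

With 4-byte elements, index 9 becomes offset 36, matching the `+ 36` in the IL shown for case 2; a store through that offset lands on the same element as `array[9]`.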


> but then wrap
> it with an indirection when inserting this code in the FE
> so that the full replacement has the correct type again
> (of the incomplete array).

I don’t quite understand the above; could you please explain this in more 
detail? (If possible, could you please use the above small example?)
thanks.

> 
> 
> Alternatively, on

Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-26 Thread Qing Zhao


> On Jan 26, 2024, at 3:04 AM, Martin Uecker  wrote:
> 
> 
> I haven't looked at the patch, but it sounds you give the result
> the wrong type. Then patching up all use cases instead of the
> type seems wrong.

Yes, this is for resolving a very early gimplification issue as I reported last 
Nov:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html

Since no one responded at that time, I fixed the issue by replacing the 
ARRAY_REF
with a pointer indirection:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html

The reason for this change is: returning a flexible array member TYPE is not 
allowed
by the C language (our gimplification follows this rule), so we have to return a 
pointer TYPE instead. 

**The new internal function

 .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, 
ACCESS_MODE, INDEX)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)

which returns the "REF_TO_OBJ" same as the 1st argument;

Both the return type and the type of the first argument of this function have 
been converted from 
the incomplete array type to the corresponding pointer type.

As a result, the original ARRAY_REF was converted to an INDIRECT_REF, and the 
original INDEX of the ARRAY_REF was lost in that conversion. To keep the INDEX 
for bound sanitizer instrumentation, I added the 6th argument “INDEX”.

What are your comments and suggestions on this solution?

Thanks.

Qing


> 
> Martin
> 
> 
> On Thursday, 25.01.2024 at 20:11 +0000, Qing Zhao wrote:
>> Thanks a lot for the testing.
>> 
>> Yes, I can repeat the issue with the following small example:
>> 
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <stddef.h>
>> 
>> #define MAX(a, b)  ((a) > (b) ? (a) :  (b))
>> 
>> struct untracked {
>>   int size;
>>   int array[] __attribute__((counted_by (size)));
>> } *a;
>> struct untracked * alloc_buf (int index)
>> {
>>   struct untracked *p;
>>   p = (struct untracked *) malloc (MAX (sizeof (struct untracked),
>>                                         (offsetof (struct untracked, array[0])
>>                                          + (index) * sizeof (int))));
>>   p->size = index;
>>   return p;
>> }
>> 
>> int main()
>> {
>>   a = alloc_buf(10);
>>   printf ("same_type is %d\n",
>>   (__builtin_types_compatible_p(typeof (a->array), typeof (&(a->array)[0]))));
>>   return 0;
>> }
>> 
>> 
>> /home/opc/Install/latest-d/bin/gcc -O2 btcp.c
>> same_type is 1
>> 
>> Looks like that the “typeof” operator need to be handled specially in C FE
>> for the new internal function .ACCESS_WITH_SIZE. 
>> 
>> (I have specially handle the operator “offsetof” in C FE already).
>> 
>> Will fix this issue.
>> 
>> Thanks.
>> 
>> Qing
>> 
>>> On Jan 24, 2024, at 7:51 PM, Kees Cook  wrote:
>>> 
>>> On Wed, Jan 24, 2024 at 12:29:51AM +0000, Qing Zhao wrote:
>>>> This is the 4th version of the patch.
>>> 
>>> Thanks very much for this!
>>> 
>>> I tripped over an unexpected behavioral change that the Linux kernel
>>> depends on:
>>> 
>>> __builtin_types_compatible_p() no longer treats an array marked with
>>> counted_by as different from that array's decayed pointer. Specifically,
>>> the kernel uses these macros:
>>> 
>>> 
>>> /*
>>> * Force a compilation error if condition is true, but also produce a
>>> * result (of value 0 and type int), so the expression can be used
>>> * e.g. in a structure initializer (or where-ever else comma expressions
>>> * aren't permitted).
>>> */
>>> #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
>>> 
>>> #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
>>> 
>>> /* [0] degrades to a pointer: a different type from an array */
>>> #define __must_be_array(a)   BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
>>> 
>>> 
>>> This gets used in various places to make sure we're dealing with an
>>> array for a macro:
>>> 
>>> #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
>>> 
>>> 
>>> So this builds:
>>> 
>>> struct untracked {
>>>   int size;
>>>   int array[];
>>> } *a;
>>> 
>>> __must_be_array(a->array)
>>> => 0 (as expected)
>>> __builtin_types_compatible_p(typeof(a->array), typeof(&(a->array)[0]))
>>> => 0 (as expected, array vs decayed array pointer)
>>> 
>>> 
>>> But if counted_by is added, we get a build failure:
>>> 
>>> struct tracked {
>>>   int size;
>>>   int array[] __counted_by(size);
>>> } *b;
>>> 
>>> __must_be_array(b->array)
>>> => build failure (not expected)
>>> __builtin_types_compatible_p(typeof(b->array), typeof(&(b->array)[0]))
>>> => 1 (not expected, both pointers?)
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Kees Cook
>> 
> 



Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-25 Thread Qing Zhao
Thanks a lot for the testing.

Yes, I can repeat the issue with the following small example:

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

#define MAX(a, b)  ((a) > (b) ? (a) :  (b))

struct untracked {
   int size;
   int array[] __attribute__((counted_by (size)));
} *a;
struct untracked * alloc_buf (int index)
{
  struct untracked *p;
  p = (struct untracked *) malloc (MAX (sizeof (struct untracked),
                                        (offsetof (struct untracked, array[0])
                                         + (index) * sizeof (int))));
  p->size = index;
  return p;
}

int main()
{
  a = alloc_buf(10);
  printf ("same_type is %d\n",
          (__builtin_types_compatible_p(typeof (a->array), typeof (&(a->array)[0]))));
  return 0;
}


/home/opc/Install/latest-d/bin/gcc -O2 btcp.c
same_type is 1

It looks like the “typeof” operator needs to be handled specially in the C FE
for the new internal function .ACCESS_WITH_SIZE.

(I have already handled the “offsetof” operator specially in the C FE.)

Will fix this issue.

Thanks.

Qing

> On Jan 24, 2024, at 7:51 PM, Kees Cook  wrote:
> 
> On Wed, Jan 24, 2024 at 12:29:51AM +, Qing Zhao wrote:
>> This is the 4th version of the patch.
> 
> Thanks very much for this!
> 
> I tripped over an unexpected behavioral change that the Linux kernel
> depends on:
> 
> __builtin_types_compatible_p() no longer treats an array marked with
> counted_by as different from that array's decayed pointer. Specifically,
> the kernel uses these macros:
> 
> 
> /*
> * Force a compilation error if condition is true, but also produce a
> * result (of value 0 and type int), so the expression can be used
> * e.g. in a structure initializer (or where-ever else comma expressions
> * aren't permitted).
> */
> #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> 
> #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
> 
> /* [0] degrades to a pointer: a different type from an array */
> #define __must_be_array(a)   BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
> 
> 
> This gets used in various places to make sure we're dealing with an
> array for a macro:
> 
> #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
> 
> 
> So this builds:
> 
> struct untracked {
>    int size;
>    int array[];
> } *a;
> 
> __must_be_array(a->array)
> => 0 (as expected)
> __builtin_types_compatible_p(typeof(a->array), typeof(&(a->array)[0]))
> => 0 (as expected, array vs decayed array pointer)
> 
> 
> But if counted_by is added, we get a build failure:
> 
> struct tracked {
>    int size;
>    int array[] __counted_by(size);
> } *b;
> 
> __must_be_array(b->array)
> => build failure (not expected)
> __builtin_types_compatible_p(typeof(b->array), typeof(&(b->array)[0]))
> => 1 (not expected, both pointers?)
> 
> 
> 
> 
> -- 
> Kees Cook
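
As a standalone illustration of the check discussed in the message above, the
array-versus-decayed-pointer distinction can be observed with GCC or Clang as
sketched below. This is only a sketch: SAME_TYPE, IS_ARRAY and the three
variables are names made up for this example, not the kernel's macros.

```c
/* Hypothetical helper names for illustration; the kernel's real macros
   are __same_type and __must_be_array.  Both __typeof__ and
   __builtin_types_compatible_p are GNU C extensions.  */
#define SAME_TYPE(a, b) \
  __builtin_types_compatible_p (__typeof__ (a), __typeof__ (b))

/* &(a)[0] always has pointer type.  If A is an array, its type differs
   from that pointer type, so IS_ARRAY yields 1; if A is already a
   pointer, both types match and IS_ARRAY yields 0.  */
#define IS_ARRAY(a) (!SAME_TYPE ((a), &(a)[0]))

struct untracked {
  int size;
  int array[];   /* plain flexible array member, no counted_by */
};

struct untracked *fam_holder;
int fixed[4];
int *pointer;

/* The operands of __typeof__ are not evaluated, so nothing is
   dereferenced at run time in these functions.  */
int fam_is_array (void)     { return IS_ARRAY (fam_holder->array); }
int fixed_is_array (void)   { return IS_ARRAY (fixed); }
int pointer_is_array (void) { return IS_ARRAY (pointer); }
```

Here fam_is_array () and fixed_is_array () return 1 and pointer_is_array ()
returns 0. The regression reported above corresponds to the flexible array
member case no longer being seen as an array once counted_by is attached.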



[PATCH v4 3/4] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-01-23 Thread Qing Zhao
gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
---
 .../gcc.dg/builtin-object-size-common.h   |  11 ++
 .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
 .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
 gcc/tree-object-size.cc   |  47 +
 4 files changed, 299 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
   __builtin_abort ();\
 return 0;\
   } while (0)
+
+#define EXPECT(p, _v) do { \
+  size_t v = _v; \
+  if (p == v) \
+    __builtin_printf ("ok:  %s == %zd\n", #p, p); \
+  else \
+    { \
+      __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
+      FAIL (); \
+    } \
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..0066c32ca808
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+    union {
+      int b;
+      float f;
+    };
+    int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+    = (struct flex *)malloc (sizeof (struct flex)
+                             + normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+    = (struct annotated *)malloc (sizeof (struct annotated)
+                                  + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+    = (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
+                                         + attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+  array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);   
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..3ce7f3545549
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size mismatched with the value of counted_by attribute?
+we should always use the latest value that is hold by the counted_by
+field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10 
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, Due to type casting, the type for the pointee of a pointer
+   does not say anything about the object it points to,
+   So, __builtin_object_size can not directly use the type of the pointee
+   to decide the size of the object the pointer points to.
+
+   

[PATCH v4 4/4] Use the .ACCESS_WITH_SIZE in bound sanitizer.

2024-01-23 Thread Qing Zhao
The result type of a call to .ACCESS_WITH_SIZE is a pointer to the element
type, so the original array_ref is converted to an indirect_ref. This
introduces two issues for the instrumentation of the bound sanitizer:

A. The index for the original array_ref was mixed into the offset
expression for the new indirect_ref.

To make the instrumentation for the bound sanitizer easier, one more
argument is added to the function .ACCESS_WITH_SIZE to record the original
index of the array_ref. Later, during bound instrumentation, the index is
read back from this additional argument of the call to .ACCESS_WITH_SIZE.

B. In the current bound sanitizer, no instrumentation will be inserted for
an indirect_ref.

In order to add instrumentation for an indirect_ref with a call to
.ACCESS_WITH_SIZE, we should specially handle the indirect_ref with a
call to .ACCESS_WITH_SIZE, and get the index and bound info from the
arguments of the call.
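
To illustrate issue A outside of GCC, here is a small sketch in plain C. It
is not part of the patch, and struct buf and the three helpers are
hypothetical names. An access written with array syntax and the same access
written as pointer arithmetic reach the same element, but in the second form
the index is no longer explicit and has to be recovered from the address.

```c
#include <stddef.h>

struct buf {
  int count;
  int data[8];
};

/* Array-style access: the index I is explicit in the expression.  */
int *elem_by_array_ref (struct buf *b, size_t i)
{
  return &b->data[i];
}

/* Pointer-style access: I is folded into the address computation.  */
int *elem_by_indirect_ref (struct buf *b, size_t i)
{
  int *base = b->data;
  return base + i;
}

/* Recover the index from an element address, which is what a checker
   would have to do if the index were not recorded separately.  */
ptrdiff_t recover_index (struct buf *b, int *elem)
{
  return elem - b->data;
}
```

Recording the index as a separate argument avoids the recovery step shown in
recover_index ().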

gcc/c-family/ChangeLog:

* c-gimplify.cc (ubsan_walk_array_refs_r): Instrument indirect_ref.
* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds_indirect_ref): New function.
(ubsan_indirect_ref_instrumented_p): New function.
(ubsan_maybe_instrument_indirect_ref): New function.
* c-ubsan.h (ubsan_maybe_instrument_indirect_ref): New prototype.

gcc/c/ChangeLog:

* c-typeck.cc (build_counted_by_ref): Minor style fix.
(build_access_with_size_for_counted_by): Add one more argument.
(build_array_ref): Set the 5th argument of a call to .ACCESS_WITH_SIZE
to the index.

gcc/testsuite/ChangeLog:

* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
---
 gcc/c-family/c-gimplify.cc|   2 +
 gcc/c-family/c-ubsan.cc   | 130 ++
 gcc/c-family/c-ubsan.h|   1 +
 gcc/c/c-typeck.cc |  14 +-
 .../ubsan/flex-array-counted-by-bounds-2.c|  45 ++
 .../ubsan/flex-array-counted-by-bounds.c  |  46 +++
 6 files changed, 235 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index 494da49791d5..25a3ca1a9a99 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -108,6 +108,8 @@ ubsan_walk_array_refs_r (tree *tp, int *walk_subtrees, void *data)
 }
   else if (TREE_CODE (*tp) == ARRAY_REF)
 ubsan_maybe_instrument_array_ref (tp, false);
+  else if (TREE_CODE (*tp) == INDIRECT_REF)
+ubsan_maybe_instrument_indirect_ref (tp);
   else if (TREE_CODE (*tp) == MODIFY_EXPR)
 {
   /* Since r7-1900, we gimplify RHS before LHS.  Consider
diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 940982819ddf..7bb6464eb5b5 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -376,6 +376,7 @@ ubsan_instrument_return (location_t loc)
   return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data));
 }
 
+
 /* Instrument array bounds for ARRAY_REFs.  We create special builtin,
that gets expanded in the sanopt pass, and make an array dimension
of it.  ARRAY is the array, *INDEX is an index to the array.
@@ -501,6 +502,68 @@ ubsan_instrument_bounds (location_t loc, tree array, tree *index,
   *index, bound);
 }
 
+/* Get the tree that represented the number of counted_by, i.e, the maximum
+   number of the elements of the object that the call to .ACCESS_WITH_SIZE
+   points to, this number will be the bound of the corresponding array.  */
+static tree
+get_bound_from_access_with_size (tree call)
+{
+  if (!is_access_with_size_p (call))
+    return NULL_TREE;
+
+  tree ref_to_size = CALL_EXPR_ARG (call, 1);
+  unsigned int type_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));
+  unsigned int prec_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 3));
+  tree type = build_nonstandard_integer_type (prec_of_size, 1);
+  tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
+                           build_int_cst (ptr_type_node, 0));
+  /* Only when type_of_size is 1,i.e, the number of the elements of
+     the object type, return the size.  */
+  if (type_of_size != 1)
+    return NULL_TREE;
+  else
+    size = fold_convert (sizetype, size);
+
+  return size;
+}
+
+/* Instrument array bounds for INDIRECT_REFs whose pointers are
+   POINTER_PLUS_EXPRs of calls to .ACCESS_WITH_SIZE.  We create special
+   builtins that gets expanded in the sanopt pass, and make an array
+   dimension of it.  ARRAY is the pointer to the base of the array,
+   which is a call to .ACCESS_WITH_SIZE.
+   We get the INDEX from the 6th argument of the call to .ACCESS_WITH_SIZE
+   

[PATCH v4 1/4] Provide counted_by attribute to flexible array member field (PR108896)

2024-01-23 Thread Qing Zhao
'counted_by (COUNT)'
 The 'counted_by' attribute may be attached to the C99 flexible
 array member of a structure.  It indicates that the number of the
 elements of the array is given by the field named "COUNT" in the
 same structure as the flexible array member.  GCC uses this
 information to improve the results of the array bound sanitizer and
 the '__builtin_dynamic_object_size'.

 For instance, the following code:

          struct P {
            size_t count;
            char other;
            char array[] __attribute__ ((counted_by (count)));
          } *p;

 specifies that the 'array' is a flexible array member whose number
 of elements is given by the field 'count' in the same structure.

 The field that represents the number of the elements should have an
 integer type.  Otherwise, the compiler will report a warning and
 ignore the attribute.

 An explicit 'counted_by' annotation defines a relationship between
 two objects, 'p->array' and 'p->count', and there are the following
 requirements on the relationship between this pair:

* 'p->count' should be initialized before the first reference to
  'p->array';

* 'p->array' has _at least_ 'p->count' number of elements
  available all the time.  This relationship must hold even
  after any of these related objects are updated during the
  program.

 It's the user's responsibility to make sure that the above
 requirements hold at all times.  Otherwise the compiler will report
 warnings, and the results of the array bound sanitizer and the
 '__builtin_dynamic_object_size' are undefined.

 One important feature of the attribute is that a reference to the
 flexible array member field will use the latest value assigned to
 the field that represents the number of the elements before that
 reference.  For example,

p->count = val1;
p->array[20] = 0;  // ref1 to p->array
p->count = val2;
p->array[30] = 0;  // ref2 to p->array

 In the above, 'ref1' will use 'val1' as the number of the elements
 in 'p->array', and 'ref2' will use 'val2' as the number of elements
 in 'p->array'.
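
A minimal sketch of user code that follows these requirements is shown
below. COUNTED_BY, alloc_p, store_p and shrink_p are hypothetical names
chosen for this example, and the guard macro is only there so that the code
also builds with compilers that lack the attribute. The explicit index check
in store_p is a hand-written stand-in for the kind of check a bounds checker
could derive from the annotation; it is not what GCC generates.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only when the compiler knows it (assumption: the
   compiler provides __has_attribute, as GCC and Clang do).  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__ ((counted_by (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct P {
  size_t count;
  char other;
  char array[] COUNTED_BY (count);
};

/* Allocate room for N elements and set 'count' before any reference
   to 'array', as the first requirement asks.  */
struct P *alloc_p (size_t n)
{
  struct P *p = malloc (sizeof (struct P) + n * sizeof (char));
  if (p)
    p->count = n;
  return p;
}

/* Store V at index I, refusing indexes outside the current count.
   Returns 0 on success and -1 if I is out of bounds.  */
int store_p (struct P *p, size_t i, char v)
{
  if (i >= p->count)
    return -1;
  p->array[i] = v;
  return 0;
}

/* Lower the count.  Growing is refused because the allocation would
   then hold fewer than 'count' elements.  */
int shrink_p (struct P *p, size_t n)
{
  if (n > p->count)
    return -1;
  p->count = n;
  return 0;
}
```

After shrink_p, later calls to store_p are checked against the new, smaller
count, which mirrors the "latest value" rule described above.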

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.
* c-tree.h (lookup_field): New prototype.
* c-typeck.cc (lookup_field): Expose as extern function.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
---
 gcc/c-family/c-attribs.cc| 54 -
 gcc/c-family/c-common.cc | 13 +++
 gcc/c-family/c-common.h  |  1 +
 gcc/c/c-decl.cc  | 85 
 gcc/c/c-tree.h   |  1 +
 gcc/c/c-typeck.cc|  3 +-
 gcc/doc/extend.texi  | 62 ++
 gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 +
 8 files changed, 239 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 40a0cf90295d..4395c0656b14 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -105,6 +105,8 @@ static tree handle_warn_if_not_aligned_attribute (tree *, tree, tree,
  int, bool *);
 static tree handle_strict_flex_array_attribute (tree *, tree, tree,
 int, bool *);
+static tree handle_counted_by_attribute (tree *, tree, tree,
+  int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
@@ -412,6 +414,8 @@ const struct attribute_spec c_common_gnu_attributes[] =
  handle_warn_if_not_aligned_attribute, NULL },
   { "strict_flex_array",  1, 1, true, false, false, false,
  handle_strict_flex_array_attribute, NULL },
+  { 

[PATCH v4 2/4] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-01-23 Thread Qing Zhao
Including the following changes:
* The definition of the new internal function .ACCESS_WITH_SIZE
  in internal-fn.def.
* C FE converts every reference to a FAM with a "counted_by" attribute
  to a call to the internal function .ACCESS_WITH_SIZE.
  (build_component_ref in c-typeck.cc)

  This includes the case when the object is statically allocated and
  initialized.
  To make this work, the routines initializer_constant_valid_p_1
  and output_constant in varasm.cc are updated to handle calls to
  .ACCESS_WITH_SIZE.
  (initializer_constant_valid_p_1 and output_constant in varasm.cc)

  However, for the reference inside "offsetof", the "counted_by" attribute is
  ignored since it's not useful at all.
  (c_parser_postfix_expression in c/c-parser.cc)
* Convert every call to .ACCESS_WITH_SIZE to its first argument.
  (expand_ACCESS_WITH_SIZE in internal-fn.cc)
* Adjust alias analysis to exclude the new internal from clobbering anything.
  (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
* Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE when
  its LHS is eliminated as dead code.
  (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
* Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
  get the reference from the call to .ACCESS_WITH_SIZE.
  (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)

gcc/c/ChangeLog:

* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
attribute when build_component_ref inside offsetof operator.
* c-tree.h (build_component_ref): Add one more parameter.
* c-typeck.cc (build_counted_by_ref): New function.
(build_access_with_size_for_counted_by): New function.
(build_component_ref): Check the counted-by attribute and build
call to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
IFN_ACCESS_WITH_SIZE.
(call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
to .ACCESS_WITH_SIZE when its LHS is dead.
* tree.cc (process_call_operands): Adjust side effect for function
.ACCESS_WITH_SIZE.
(is_access_with_size_p): New function.
(get_ref_from_access_with_size): New function.
* tree.h (is_access_with_size_p): New prototype.
(get_ref_from_access_with_size): New prototype.
* varasm.cc (initializer_constant_valid_p_1): Handle call to
.ACCESS_WITH_SIZE.
(output_constant): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-2.c: New test.
---
 gcc/c/c-parser.cc |  10 +-
 gcc/c/c-tree.h|   2 +-
 gcc/c/c-typeck.cc | 108 +-
 gcc/internal-fn.cc|  35 ++
 gcc/internal-fn.def   |   4 +
 .../gcc.dg/flex-array-counted-by-2.c  |  94 +++
 gcc/tree-ssa-alias.cc |   2 +
 gcc/tree-ssa-dce.cc   |   5 +-
 gcc/tree.cc   |  25 +++-
 gcc/tree.h|   8 ++
 gcc/varasm.cc |  10 ++
 11 files changed, 294 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c31349dae2ff..a6ed5ac43bb1 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
if (c_parser_next_token_is (parser, CPP_NAME))
  {
c_token *comp_tok = c_parser_peek_token (parser);
+   /* Ignore the counted_by attribute for reference inside
+  offsetof since the information is not useful at all.  */
offsetof_ref
  = build_component_ref (loc, offsetof_ref, comp_tok->value,
-comp_tok->location, UNKNOWN_LOCATION);
+comp_tok->location, UNKNOWN_LOCATION,
+false);
c_parser_consume_token (parser);
while (c_parser_next_token_is (parser, CPP_DOT)
   || c_parser_next_token_is (parser,
@@ -10879,11 +10882,14 @@ c_parser_postfix_expression (c_parser *parser)
break;
  }
c_token *comp_tok = c_parser_peek_token (parser);
+   /* Ignore the counted_by attribute for reference inside
+  offsetof since the information is not useful.  

[PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-23 Thread Qing Zhao
Hi,

This is the 4th version of the patch.

It is based on the following proposal:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635884.html
Represent the missing dependence for the "counted_by" attribute and its consumers

**The summary of the proposal is:

* Add a new internal function ".ACCESS_WITH_SIZE" to carry the size information
  for every reference to a FAM field;
* In the C FE, replace every reference to a FAM field whose TYPE has the
  "counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or array bound
  sanitizer, query the size information or ACCESS_MODE information from the new
  internal function;
* When expanding to RTL, replace the internal function with the actual
  reference to the FAM field;
* Some adjustments to IPA alias analysis and other SSA passes to mitigate the
  impact on the optimizer and code generation.


**The new internal function

  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, PRECISION_OF_SIZE,
                     ACCESS_MODE, INDEX)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)

which returns the "REF_TO_OBJ" same as the 1st argument;

Both the return type and the type of the first argument of this function have 
been converted from the incomplete array type to the corresponding pointer type.

Please see the following link for why:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html

1st argument "REF_TO_OBJ": The reference to the object;
2nd argument "REF_TO_SIZE": The reference to the size of the object,
3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE represents
   0: unknown;
   1: the number of the elements of the object type;
   2: the number of bytes;
4th argument "PRECISION_OF_SIZE": The precision of the integer that REF_TO_SIZE
points to;
5th argument "ACCESS_MODE":
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write
6th argument "INDEX": the INDEX for the original array reference.
  -1: Unknown

NOTE: The 6th Argument is added for bound sanitizer instrumentation.
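
The argument list above can be modeled in plain C as sketched below. This is
only an illustration of the described semantics and not GCC code: the struct,
the enum and both functions are hypothetical names, and reading the size
through memcpy is a choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values of the 3rd argument, CLASS_OF_SIZE, as listed above.  */
enum class_of_size {
  SIZE_UNKNOWN = 0,
  SIZE_IN_ELEMENTS = 1,
  SIZE_IN_BYTES = 2
};

/* One field per argument of the call, in the order given above.  */
struct access_with_size_args {
  void *ref_to_obj;              /* 1st: reference to the object */
  const void *ref_to_size;       /* 2nd: reference to the size */
  enum class_of_size class_of_size;
  unsigned precision_of_size;    /* 4th: precision in bits */
  int access_mode;               /* 5th: -1 unknown ... 3 read_write */
  long index;                    /* 6th: original index, -1 unknown */
};

/* Models the expansion step: the call is replaced by its 1st argument.  */
void *model_expand (const struct access_with_size_args *a)
{
  return a->ref_to_obj;
}

/* Models a consumer that wants the bound as an element count.  The
   size is read as an unsigned integer of the recorded precision; any
   other class of size yields (size_t) -1, meaning "no bound".  */
size_t model_bound_in_elements (const struct access_with_size_args *a)
{
  if (a->class_of_size != SIZE_IN_ELEMENTS)
    return (size_t) -1;

  switch (a->precision_of_size)
    {
    case 8:  { uint8_t v;  memcpy (&v, a->ref_to_size, sizeof v); return v; }
    case 16: { uint16_t v; memcpy (&v, a->ref_to_size, sizeof v); return v; }
    case 32: { uint32_t v; memcpy (&v, a->ref_to_size, sizeof v); return v; }
    case 64: { uint64_t v; memcpy (&v, a->ref_to_size, sizeof v);
               return (size_t) v; }
    default: return (size_t) -1;
    }
}
```

A consumer built this way only needs the call itself to find both the object
and its current count, which is the dependence the proposal sets out to
represent.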

** The Patch sets included:

1. Provide counted_by attribute to flexible array member field;
  which includes:
  * "counted_by" attribute documentation;
  * C FE handling of the new attribute;
syntax checking, error reporting;
  * testing cases;

2. Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.
  which includes:
  * The definition of the new internal function .ACCESS_WITH_SIZE in 
internal-fn.def.
  * C FE converts every reference to a FAM with "counted_by" attribute to a 
call to the internal function .ACCESS_WITH_SIZE.
(build_component_ref in c-typeck.cc)
This includes the case when the object is statically allocated and 
initialized.
To make this work, we should update initializer_constant_valid_p_1 and
output_constant in varasm.cc to include calls to .ACCESS_WITH_SIZE.

However, for the reference inside "offsetof", ignore the "counted_by" 
attribute since it's not useful at all. (c_parser_postfix_expression in 
c/c-parser.cc)

  * Convert every call to .ACCESS_WITH_SIZE to its first argument.
(expand_ACCESS_WITH_SIZE in internal-fn.cc)
  * adjust alias analysis to exclude the new internal from clobbering 
anything.
(ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in 
tree-ssa-alias.cc)
  * adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE
when its LHS is eliminated as dead code.
(eliminate_unnecessary_stmts in tree-ssa-dce.cc)
  * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
get the reference from the call to .ACCESS_WITH_SIZE.
(is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
  * testing cases. (for offsetof, static initialization, generation of calls
to .ACCESS_WITH_SIZE, code runs correctly after calls to .ACCESS_WITH_SIZE
are converted back)

3. Use the .ACCESS_WITH_SIZE in builtin object size (sub-object only)
  which includes:
  * use the size info of the .ACCESS_WITH_SIZE for sub-object.
  * testing cases. 

4. Use the .ACCESS_WITH_SIZE in bound sanitizer
  The result type of a call to .ACCESS_WITH_SIZE is a pointer to the
element type, so the original array_ref is converted to an indirect_ref.
This introduces two issues for the instrumentation of the bound sanitizer:

A. The index for the original array_ref was mixed into the offset
expression for the new indirect_ref.

In order to make the instrumentation for the bound sanitizer easier, one
more argument for the function .ACCESS_WITH_SIZE is added to record this
original index for the array_ref. then later during bound 
instrumentation,
get the index from the 

Re: HELP: Questions on unshare_expr

2024-01-22 Thread Qing Zhao
One update: last Friday, I merged all my patches for counted-by support
(including the patch to work around the LTO issue) with the latest trunk,
bootstrapped and ran the tests; everything is good.

Today, when I disabled the patch that works around the LTO issue, surprisingly,
I could not reproduce the LTO issue anymore with the latest trunk + my
counted-by support patches. I.e., without the LTO workaround, everything works
just fine with the latest GCC.

I suspect that some change in the latest GCC “fixed” (or hid) the issue.

Qing

> On Jan 22, 2024, at 9:52 AM, Qing Zhao  wrote:
> 
> 
> 
>> On Jan 22, 2024, at 2:40 AM, Richard Biener  
>> wrote:
>> 
>> On Fri, Jan 19, 2024 at 5:26 PM Qing Zhao  wrote:
>>> 
>>> 
>>> 
>>>> On Jan 19, 2024, at 4:30 AM, Richard Biener  
>>>> wrote:
>>>> 
>>>> On Thu, Jan 18, 2024 at 3:46 PM Qing Zhao  wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jan 17, 2024, at 1:43 AM, Richard Biener  
>>>>>> wrote:
>>>>>> 
>>>>>> On Wed, Jan 17, 2024 at 7:42 AM Richard Biener
>>>>>>  wrote:
>>>>>>> 
>>>>>>> On Tue, Jan 16, 2024 at 9:26 PM Qing Zhao  wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Jan 15, 2024, at 4:31 AM, Richard Biener 
>>>>>>>>>  wrote:
>>>>>>>>> 
>>>>>>>>>> All my questions for unshare_expr relate to a  LTO bug that I 
>>>>>>>>>> currently stuck with
>>>>>>>>>> when using .ACCESS_WITH_SIZE in bound sanitizer (only with -flto, 
>>>>>>>>>> without -flto, no issue):
>>>>>>>>>> 
>>>>>>>>>> [opc@qinzhao-aarch64-ol8 gcc]$ sh t
>>>>>>>>>> during IPA pass: modref
>>>>>>>>>> t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not 
>>>>>>>>>> supported in LTO streams
>>>>>>>>>> 0x14c3993 lto_write_tree
>>>>>>>>>>../../latest-gcc-write/gcc/lto-streamer-out.cc:561
>>>>>>>>>> 0x14c3aeb lto_output_tree_1
>>>>>>>>>> 
>>>>>>>>>> And the value of the tree node that triggered the ICE is:
>>>>>>>>>> (gdb) call debug_tree(expr)
>>>>>>>>>> 
>>>>>>>>>> nothrow
>>>>>>>>>> def_stmt
>>>>>>>>>> version:13 in-free-list>
>>>>>>>>>> 
>>>>>>>>>> Is there any good way to debug LTO bug?
>>>>>>>>> 
>>>>>>>>> This happens usually when you have a VLA type and its type fields are 
>>>>>>>>> not
>>>>>>>>> properly gimplified which usually happens because the frontend fails 
>>>>>>>>> to
>>>>>>>>> insert a gimplification point for it (a DECL_EXPR).
>>>>>>>> 
>>>>>>>> I found an old gcc bug
>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172
>>>>>>>> ICE: tree code ‘ssa_name’ is not supported in LTO streams since 
>>>>>>>> r11-3303-g6450f07388f9fe57
>>>>>>>> 
>>>>>>>> Which is very similar to the bug I am having right now.
>>>>>>>> 
>>>>>>>> After further study, I suspect that the issue I am having right now 
>>>>>>>> with the LTO streaming also
>>>>>>>> relate to “unshare_expr”, “save_expr”, and the combination of these 
>>>>>>>> two, I suspect that
>>>>>>>> the current gcc cannot handle the combination of these two correctly 
>>>>>>>> for my case.
>>>>>>>> 
>>>>>>>> My testing case is:
>>>>>>>> 
>>>>>>>> #include <stdlib.h>
>>>>>>>> void __attribute__((__noinline__)) setup_and_test_vla (int n1, int n2, 
>>>>>>>> int m)
>>>>>>>> {
>>>>>>>> struct foo {
>>>>>>>>int n;
>>>>>>>>int p[][n2][n1] __attribute__((co

Re: HELP: Questions on unshare_expr

2024-01-22 Thread Qing Zhao


> On Jan 22, 2024, at 2:40 AM, Richard Biener  
> wrote:
> 
> On Fri, Jan 19, 2024 at 5:26 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Jan 19, 2024, at 4:30 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Thu, Jan 18, 2024 at 3:46 PM Qing Zhao  wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Jan 17, 2024, at 1:43 AM, Richard Biener  
>>>>> wrote:
>>>>> 
>>>>> On Wed, Jan 17, 2024 at 7:42 AM Richard Biener
>>>>>  wrote:
>>>>>> 
>>>>>> On Tue, Jan 16, 2024 at 9:26 PM Qing Zhao  wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jan 15, 2024, at 4:31 AM, Richard Biener 
>>>>>>>>  wrote:
>>>>>>>> 
>>>>>>>>> All my questions for unshare_expr relate to a  LTO bug that I 
>>>>>>>>> currently stuck with
>>>>>>>>> when using .ACCESS_WITH_SIZE in bound sanitizer (only with -flto, 
>>>>>>>>> without -flto, no issue):
>>>>>>>>> 
>>>>>>>>> [opc@qinzhao-aarch64-ol8 gcc]$ sh t
>>>>>>>>> during IPA pass: modref
>>>>>>>>> t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not 
>>>>>>>>> supported in LTO streams
>>>>>>>>> 0x14c3993 lto_write_tree
>>>>>>>>> ../../latest-gcc-write/gcc/lto-streamer-out.cc:561
>>>>>>>>> 0x14c3aeb lto_output_tree_1
>>>>>>>>> 
>>>>>>>>> And the value of the tree node that triggered the ICE is:
>>>>>>>>> (gdb) call debug_tree(expr)
>>>>>>>>> 
>>>>>>>>> nothrow
>>>>>>>>> def_stmt
>>>>>>>>> version:13 in-free-list>
>>>>>>>>> 
>>>>>>>>> Is there any good way to debug LTO bug?
>>>>>>>> 
>>>>>>>> This happens usually when you have a VLA type and its type fields are 
>>>>>>>> not
>>>>>>>> properly gimplified which usually happens because the frontend fails to
>>>>>>>> insert a gimplification point for it (a DECL_EXPR).
>>>>>>> 
>>>>>>> I found an old gcc bug
>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172
>>>>>>> ICE: tree code ‘ssa_name’ is not supported in LTO streams since 
>>>>>>> r11-3303-g6450f07388f9fe57
>>>>>>> 
>>>>>>> Which is very similar to the bug I am having right now.
>>>>>>> 
>>>>>>> After further study, I suspect that the issue I am having right now 
>>>>>>> with the LTO streaming also
>>>>>>> relate to “unshare_expr”, “save_expr”, and the combination of these 
>>>>>>> two, I suspect that
>>>>>>> the current gcc cannot handle the combination of these two correctly 
>>>>>>> for my case.
>>>>>>> 
>>>>>>> My testing case is:
>>>>>>> 
>>>>>>> #include <stdlib.h>
>>>>>>> void __attribute__((__noinline__)) setup_and_test_vla (int n1, int n2, 
>>>>>>> int m)
>>>>>>> {
>>>>>>> struct foo {
>>>>>>> int n;
>>>>>>> int p[][n2][n1] __attribute__((counted_by(n)));
>>>>>>> } *f;
>>>>>>> 
>>>>>>> f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n2][n1]));
>>>>>>> f->n = m;
>>>>>>> f->p[m][n2][n1]=1;
>>>>>>> return;
>>>>>>> }
>>>>>>> 
>>>>>>> int main(int argc, char *argv[])
>>>>>>> {
>>>>>>> setup_and_test_vla (10, 11, 20);
>>>>>>> return 0;
>>>>>>> }
>>>>>>> 
>>>>>>> Failed with
>>>>>>> my_gcc -Os -fsanitize=bounds -flto
>>>>>>> 
>>>>>>> If changing either n1 or n2 to a constant, the testing passed.
>>>>>>> If deleting -flto, the testing passed too.

Re: HELP: Questions on unshare_expr

2024-01-19 Thread Qing Zhao


> On Jan 19, 2024, at 4:30 AM, Richard Biener  
> wrote:
> 
> On Thu, Jan 18, 2024 at 3:46 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Jan 17, 2024, at 1:43 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Wed, Jan 17, 2024 at 7:42 AM Richard Biener
>>>  wrote:
>>>> 
>>>> On Tue, Jan 16, 2024 at 9:26 PM Qing Zhao  wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jan 15, 2024, at 4:31 AM, Richard Biener  
>>>>>> wrote:
>>>>>> 
>>>>>>> All my questions for unshare_expr relate to a  LTO bug that I currently 
>>>>>>> stuck with
>>>>>>> when using .ACCESS_WITH_SIZE in bound sanitizer (only with -flto, 
>>>>>>> without -flto, no issue):
>>>>>>> 
>>>>>>> [opc@qinzhao-aarch64-ol8 gcc]$ sh t
>>>>>>> during IPA pass: modref
>>>>>>> t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not 
>>>>>>> supported in LTO streams
>>>>>>> 0x14c3993 lto_write_tree
>>>>>>>  ../../latest-gcc-write/gcc/lto-streamer-out.cc:561
>>>>>>> 0x14c3aeb lto_output_tree_1
>>>>>>> 
>>>>>>> And the value of the tree node that triggered the ICE is:
>>>>>>> (gdb) call debug_tree(expr)
>>>>>>> 
>>>>>>>  nothrow
>>>>>>>  def_stmt
>>>>>>>  version:13 in-free-list>
>>>>>>> 
>>>>>>> Is there any good way to debug LTO bug?
>>>>>> 
>>>>>> This happens usually when you have a VLA type and its type fields are not
>>>>>> properly gimplified which usually happens because the frontend fails to
>>>>>> insert a gimplification point for it (a DECL_EXPR).
>>>>> 
>>>>> I found an old gcc bug
>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172
>>>>> ICE: tree code ‘ssa_name’ is not supported in LTO streams since 
>>>>> r11-3303-g6450f07388f9fe57
>>>>> 
>>>>> Which is very similar to the bug I am having right now.
>>>>> 
>>>>> After further study, I suspect that the issue I am having right now with 
>>>>> the LTO streaming also
>>>>> relate to “unshare_expr”, “save_expr”, and the combination of these two, 
>>>>> I suspect that
>>>>> the current gcc cannot handle the combination of these two correctly for 
>>>>> my case.
>>>>> 
>>>>> My testing case is:
>>>>> 
>>>>> #include <stdlib.h>
>>>>> void __attribute__((__noinline__)) setup_and_test_vla (int n1, int n2, 
>>>>> int m)
>>>>> {
>>>>>  struct foo {
>>>>>  int n;
>>>>>  int p[][n2][n1] __attribute__((counted_by(n)));
>>>>>  } *f;
>>>>> 
>>>>>  f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n2][n1]));
>>>>>  f->n = m;
>>>>>  f->p[m][n2][n1]=1;
>>>>>  return;
>>>>> }
>>>>> 
>>>>> int main(int argc, char *argv[])
>>>>> {
>>>>> setup_and_test_vla (10, 11, 20);
>>>>> return 0;
>>>>> }
>>>>> 
>>>>> Failed with
>>>>> my_gcc -Os -fsanitize=bounds -flto
>>>>> 
>>>>> If changing either n1 or n2 to a constant, the testing passed.
>>>>> If deleting -flto, the testing passed too.
>>>>> 
>>>>> I double checked my code per the suggestions provided by you and Jakub in 
>>>>> this
>>>>> email thread, and I think the code should be fine.
>>>>> 
>>>>> The code is following:
>>>>> 
>>>>> =
>>>>> 504 /* Instrument array bounds for INDIRECT_REFs whose pointers are
>>>>> 505POINTER_PLUS_EXPRs of calls to .ACCESS_WITH_SIZE. We create special
>>>>> 506builtins that gets expanded in the sanopt pass, and make an array
>>>>> 507dimension of it.  ARRAY is the pointer to the base of the array,
>>>>> 508which is a call to .ACCESS_WITH_SIZE, *OFFSET is the offset to the
>>>>> 509beginning of array.
>>>>> 510Return NULL_TREE if no instrume

Re: HELP: Questions on unshare_expr

2024-01-18 Thread Qing Zhao


> On Jan 17, 2024, at 1:43 AM, Richard Biener  
> wrote:
> 
> On Wed, Jan 17, 2024 at 7:42 AM Richard Biener
>  wrote:
>> 
>> On Tue, Jan 16, 2024 at 9:26 PM Qing Zhao  wrote:
>>> 
>>> 
>>> 
>>>> On Jan 15, 2024, at 4:31 AM, Richard Biener  
>>>> wrote:
>>>> 
>>>>> All my questions for unshare_expr relate to a  LTO bug that I currently 
>>>>> stuck with
>>>>> when using .ACCESS_WITH_SIZE in bound sanitizer (only with -flto, without 
>>>>> -flto, no issue):
>>>>> 
>>>>> [opc@qinzhao-aarch64-ol8 gcc]$ sh t
>>>>> during IPA pass: modref
>>>>> t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not supported 
>>>>> in LTO streams
>>>>> 0x14c3993 lto_write_tree
>>>>>   ../../latest-gcc-write/gcc/lto-streamer-out.cc:561
>>>>> 0x14c3aeb lto_output_tree_1
>>>>> 
>>>>> And the value of the tree node that triggered the ICE is:
>>>>> (gdb) call debug_tree(expr)
>>>>> 
>>>>>   nothrow
>>>>>   def_stmt
>>>>>   version:13 in-free-list>
>>>>> 
>>>>> Is there any good way to debug LTO bug?
>>>> 
>>>> This happens usually when you have a VLA type and its type fields are not
>>>> properly gimplified which usually happens because the frontend fails to
>>>> insert a gimplification point for it (a DECL_EXPR).
>>> 
>>> I found an old gcc bug
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172
>>> ICE: tree code ‘ssa_name’ is not supported in LTO streams since 
>>> r11-3303-g6450f07388f9fe57
>>> 
>>> Which is very similar to the bug I am having right now.
>>> 
>>> After further study, I suspect that the issue I am having right now with 
>>> the LTO streaming also
>>> relate to “unshare_expr”, “save_expr”, and the combination of these two, I 
>>> suspect that
>>> the current gcc cannot handle the combination of these two correctly for my 
>>> case.
>>> 
>>> My testing case is:
>>> 
>>> #include <stdlib.h>
>>> void __attribute__((__noinline__)) setup_and_test_vla (int n1, int n2, int 
>>> m)
>>> {
>>>   struct foo {
>>>   int n;
>>>   int p[][n2][n1] __attribute__((counted_by(n)));
>>>   } *f;
>>> 
>>>   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n2][n1]));
>>>   f->n = m;
>>>   f->p[m][n2][n1]=1;
>>>   return;
>>> }
>>> 
>>> int main(int argc, char *argv[])
>>> {
>>>  setup_and_test_vla (10, 11, 20);
>>>  return 0;
>>> }
>>> 
>>> Failed with
>>> my_gcc -Os -fsanitize=bounds -flto
>>> 
>>> If changing either n1 or n2 to a constant, the testing passed.
>>> If deleting -flto, the testing passed too.
>>> 
>>> I double checked my code per the suggestions provided by you and Jakub in 
>>> this
>>> email thread, and I think the code should be fine.
>>> 
>>> The code is following:
>>> 
>>> =
>>> 504 /* Instrument array bounds for INDIRECT_REFs whose pointers are
>>> 505POINTER_PLUS_EXPRs of calls to .ACCESS_WITH_SIZE. We create special
>>> 506builtins that gets expanded in the sanopt pass, and make an array
>>> 507dimension of it.  ARRAY is the pointer to the base of the array,
>>> 508which is a call to .ACCESS_WITH_SIZE, *OFFSET is the offset to the
>>> 509beginning of array.
>>> 510Return NULL_TREE if no instrumentation is emitted.  */
>>> 511
>>> 512 tree
>>> 513 ubsan_instrument_bounds_indirect_ref (location_t loc, tree array, tree 
>>> *offset)
>>> 514 {
>>> 515   if (!is_access_with_size_p (array))
>>> 516 return NULL_TREE;
>>> 517   tree bound = get_bound_from_access_with_size (array);
>>> 518   /* The type of the call to .ACCESS_WITH_SIZE is a pointer type to
>>> 519  the element of the array.  */
>>> 520   tree element_size = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array)));
>>> 521   gcc_assert (bound);
>>> 522
>>> 523   /* Given the offset, and the size of each element, the index can be
>>> 524  computed as: offset/element_size.  */
>>> 525   *offset = save_expr (*offset);
>>> 526   tree index = fold_build2 (EXACT_DIV_EXPR,
>>> 527 

Re: HELP: Questions on unshare_expr

2024-01-16 Thread Qing Zhao


> On Jan 15, 2024, at 4:31 AM, Richard Biener  
> wrote:
> 
>> All my questions for unshare_expr relate to a  LTO bug that I currently 
>> stuck with
>> when using .ACCESS_WITH_SIZE in bound sanitizer (only with -flto, without 
>> -flto, no issue):
>> 
>> [opc@qinzhao-aarch64-ol8 gcc]$ sh t
>> during IPA pass: modref
>> t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not supported in 
>> LTO streams
>> 0x14c3993 lto_write_tree
>>../../latest-gcc-write/gcc/lto-streamer-out.cc:561
>> 0x14c3aeb lto_output_tree_1
>> 
>> And the value of the tree node that triggered the ICE is:
>> (gdb) call debug_tree(expr)
>> 
>>nothrow
>>def_stmt
>>version:13 in-free-list>
>> 
>> Is there any good way to debug LTO bug?
> 
> This happens usually when you have a VLA type and its type fields are not
> properly gimplified which usually happens because the frontend fails to
> insert a gimplification point for it (a DECL_EXPR).

I found an old GCC bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172
ICE: tree code ‘ssa_name’ is not supported in LTO streams since 
r11-3303-g6450f07388f9fe57

It is very similar to the bug I am seeing now. 

After further study, I suspect that my LTO streaming issue is also related to 
“unshare_expr”, “save_expr”, and the combination of the two: the current GCC 
may not handle that combination correctly for my case. 

My test case is:

#include <stdlib.h>
void __attribute__((__noinline__)) setup_and_test_vla (int n1, int n2, int m)
{
   struct foo {
   int n;
   int p[][n2][n1] __attribute__((counted_by(n)));
   } *f;

   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n2][n1]));
   f->n = m;
   f->p[m][n2][n1]=1;
   return;
}

int main(int argc, char *argv[])
{
  setup_and_test_vla (10, 11, 20);
  return 0;
}

It fails with:
my_gcc -Os -fsanitize=bounds -flto

If either n1 or n2 is changed to a constant, the test passes. 
Without -flto, the test passes too. 

I double-checked my code against the suggestions from you and Jakub in this
email thread, and I think the code should be fine.

The code is as follows:

=
504 /* Instrument array bounds for INDIRECT_REFs whose pointers are
505    POINTER_PLUS_EXPRs of calls to .ACCESS_WITH_SIZE. We create special
506    builtins that gets expanded in the sanopt pass, and make an array
507    dimension of it.  ARRAY is the pointer to the base of the array,
508    which is a call to .ACCESS_WITH_SIZE, *OFFSET is the offset to the
509    beginning of array.
510    Return NULL_TREE if no instrumentation is emitted.  */
511
512 tree
513 ubsan_instrument_bounds_indirect_ref (location_t loc, tree array, tree *offset)
514 {
515   if (!is_access_with_size_p (array))
516     return NULL_TREE;
517   tree bound = get_bound_from_access_with_size (array);
518   /* The type of the call to .ACCESS_WITH_SIZE is a pointer type to
519      the element of the array.  */
520   tree element_size = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array)));
521   gcc_assert (bound);
522
523   /* Given the offset, and the size of each element, the index can be
524      computed as: offset/element_size.  */
525   *offset = save_expr (*offset);
526   tree index = fold_build2 (EXACT_DIV_EXPR,
527                             sizetype, *offset,
528                             unshare_expr (element_size));
529   /* Create a "(T *) 0" tree node to describe the original array type.
530      We get the original array type from the first argument of the call to
531      .ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, num_bytes, -1).
532
533      Originally, REF is a COMPONENT_REF with the original array type,
534      it was converted to a pointer to an ADDR_EXPR, and the ADDR_EXPR's
535      first operand is the original COMPONENT_REF.  */
536   tree ref = CALL_EXPR_ARG (array, 0);
537   tree array_type
538     = unshare_expr (TREE_TYPE (TREE_OPERAND (TREE_OPERAND(ref, 0), 0)));
539   tree zero_with_type = build_int_cst (build_pointer_type (array_type), 0);
540   return build_call_expr_internal_loc (loc, IFN_UBSAN_BOUNDS,
541                                        void_type_node, 3, zero_with_type,
542                                        index, bound);
543 }

=

In gdb, the offending IR that fails in LTO streaming comes from line 520 above:
TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array))),

When I use this tree node as an operand of the expression at line 526, I wrap 
it in unshare_expr. 

However, I still see the offending IR in gdb:

unit-size 
side-effects
arg:0 

arg:0 
nothrow
def_stmt
version:12 in-free-list>
arg:1 
nothrow
def_stmt
version:13 in-free-list>>
arg:1 >


I have been stuck with this bug for quite some time. 
Any help is 

Re: HELP: Questions on unshare_expr

2024-01-15 Thread Qing Zhao


> On Jan 15, 2024, at 3:13 AM, Eric Botcazou  wrote:
> 
>> Okay, so, the "unsharing everything” is done automatically by the compiler
>> before gimplification? 
> 
> See the blurb at gimplify.cc:835 and below about this.

Thanks a lot for the info.  (I read this paragraph before sending the 
questions…)

Qing
> 
> -- 
> Eric Botcazou
> 
> 



Re: HELP: Questions on unshare_expr

2024-01-15 Thread Qing Zhao


> On Jan 15, 2024, at 10:06 AM, Jakub Jelinek  wrote:
> 
> On Mon, Jan 15, 2024 at 02:54:26PM +0000, Qing Zhao wrote:
>> So, before gimplification,  when inserting tree node, we don’t need manually
>> add unshare_expr since the gimplification will automatically unshare nodes. 
> 
> There are cases where unshare_expr is needed even then, such as the uses in
> the sanitizer, because code is then modifying suboperands in place later on
> and if things are shared bad things happen.

In my case, it’s in the bounds sanitizer, and the instrumentation happens 
during “c_genericize”, which seems to be before gimplification. 

So, when adding instrumentation for the bounds sanitizer, do we still need to 
manually unshare expressions even though it’s before gimplification?


> If trees can be shared until
> they are unshared before gimplification, one doesn't need to worry about it,
> sure.
> 
>> However, during or after gimplfication, when inserting nodes, we should 
>> manually
>> add unshare_expr when we put the same “tree” into multiple operands.
> 
> Yes.
> 
>>> Using a SAVE_EXPR avoids redundant code but it also requires
>>> that the SAVE_EXPR uses are ordered.
>> 
>> “Require the SAVE_EXPR uses are ordered”, does this mean that 
>> SAVE_EXPRs for the same node should be in a correct order? Or something else?
> 
> The basic requirement is that SAVE_EXPR is evaluated somewhere in a code
> which dominates all other uses of the SAVE_EXPR.
> Say
> SAVE_EXPR , if (x) use1 (SAVE_EXPR ); 
> else use2 (SAVE_EXPR );
> is fine, but
> if (x) use1 (SAVE_EXPR ); else use2 (SAVE_EXPR 
> );
> is not.  Because in the latter case, it will be gimplified into evaluating
> the complex expression in the conditional code guarded on if (x != 0), save
> into some temporary variable and then in the else code just use that
> temporary variable, except it is uninitialized then.

Okay, I see.
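
The dominance requirement Jakub describes can be mirrored in plain C. This is 
only a sketch under stated assumptions: the names complex_expr, uses_dominated 
and uses_not_dominated are hypothetical, and an explicit flag stands in for 
"the temporary is uninitialized" so that the sketch itself stays well defined.

```c
/* Stand-in for the complex expression wrapped by a SAVE_EXPR.  */
static int complex_expr (int x)
{
  return x * x + 1;
}

/* Fine: the evaluation dominates both uses, like a SAVE_EXPR that is
   evaluated before the if.  */
static int uses_dominated (int x, int cond)
{
  int saved = complex_expr (x);
  if (cond)
    return saved + 1;
  else
    return saved + 2;
}

/* Problematic shape: the temporary is filled in only on the cond path.
   *valid reports whether the path taken saw an initialized temporary.  */
static int uses_not_dominated (int x, int cond, int *valid)
{
  int saved = 0;        /* explicit 0 only so this sketch is well defined */
  int initialized = 0;
  if (cond)
    {
      saved = complex_expr (x);
      initialized = 1;
      *valid = initialized;
      return saved + 1;
    }
  /* The else path reuses the temporary without ever evaluating it.  */
  *valid = initialized;
  return saved + 2;
}
```

With cond == 0, uses_not_dominated reports *valid == 0, which corresponds to 
the else branch reading a temporary that was never written.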

Is there a utility to check for violations of this ordering, or do I have to 
check the order manually?

Thanks a lot for the help.

Qing
> 
>   Jakub
> 



Re: HELP: Questions on unshare_expr

2024-01-15 Thread Qing Zhao


> On Jan 15, 2024, at 4:31 AM, Richard Biener  
> wrote:
> 
> On Fri, Jan 12, 2024 at 6:30 PM Qing Zhao  wrote:
>> 
>> Thanks a lot for the reply.
>> 
>>> On Jan 12, 2024, at 11:28 AM, Richard Biener  
>>> wrote:
>>> 
>>> 
>>> 
>>>> Am 12.01.2024 um 16:55 schrieb Qing Zhao :
>>>> 
>>>> Hi,
>>>> 
>>>> I have some questions on using the utility routine “unshare_expr”:
>>>> 
>>>> From my understanding, there should be NO shared nodes in a GENERIC 
>>>> function.
>>>> Otherwise, gimplication might fail.
>>> 
>>> There is sharing and this is why we unshare everything before 
>>> gimplification.
>> 
>> Okay, so, the "unsharing everything” is done automatically by the compiler 
>> before gimplification?
>> I don’t need to worry about this?
>> 
>> I see  many places in FE where “unshare_expr” is used, for example, 
>> “ubsan_instrument_division”,
>> “ubsan_instrument_shift”, etc.
> 
> It's likely doing sth during gimplification.

So, before gimplification, when inserting tree nodes, we don’t need to manually
add unshare_expr, since gimplification will automatically unshare nodes. 

However, during or after gimplification, when inserting nodes, we should manually
add unshare_expr when we put the same “tree” into multiple operands.

Is this understanding correct?

>> So, usually, when should “unshare_expr” be used?
> 
> You should usually unshare when you are putting the same 'tree' into multiple
> operands.  

Okay, I see.
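
A minimal sketch of why this matters, using a hypothetical toy node type 
rather than GCC's real tree: when the same operand is placed under two 
parents, an in-place rewrite through one parent is visible through the other, 
while a deep copy (the role unshare_expr plays) keeps them independent. The 
names node, mk, copy_node and eval are illustrative assumptions, not GCC APIs.

```c
#include <stdlib.h>

/* Toy expression node: a leaf holds a value, an inner node adds its kids.  */
struct node
{
  int value;                 /* used only when both kids are NULL */
  struct node *lhs, *rhs;
};

static struct node *mk (int value, struct node *lhs, struct node *rhs)
{
  struct node *n = malloc (sizeof *n);
  if (!n)
    abort ();
  n->value = value;
  n->lhs = lhs;
  n->rhs = rhs;
  return n;
}

/* Deep copy, standing in for what unshare_expr does on real trees.  */
static struct node *copy_node (const struct node *n)
{
  if (!n)
    return NULL;
  return mk (n->value, copy_node (n->lhs), copy_node (n->rhs));
}

static int eval (const struct node *n)
{
  if (!n->lhs && !n->rhs)
    return n->value;
  return eval (n->lhs) + eval (n->rhs);
}

/* Put one operand under two parents, rewrite it in place through the
   first parent, and return the value seen through the second parent.
   (Nodes are deliberately leaked to keep the sketch short.)  */
static int second_parent_after_rewrite (int unshare)
{
  struct node *op = mk (1, NULL, NULL);
  struct node *a = mk (0, op, mk (10, NULL, NULL));
  struct node *b = mk (0, unshare ? copy_node (op) : op, mk (20, NULL, NULL));
  a->lhs->value = 100;       /* in-place modification via the first parent */
  return eval (b);
}
```

With sharing, the second parent sees the rewritten operand; with the copy, it 
still sees the original one.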

> Using a SAVE_EXPR avoids redundant code but it also requires
> that the SAVE_EXPR uses are ordered.

“Require the SAVE_EXPR uses are ordered”, does this mean that 
SAVE_EXPRs for the same node should be in a correct order? Or something else?


> 
>>>> Therefore, when we insert new tree nodes manually into the GENERIC 
>>>> function, we should
>>>> Make sure there is no shared nodes introduced.
>>>> 
>>>> 1. Is the above understanding correct?
>>> 
>>> No
>>> 
>>>> 2. Is there any tool to check there is no shared nodes in the GENERIC 
>>>> function?
>>>> 3. Are there any tree nodes that are allowed to be shared in a GENERIC 
>>>> function? If so, what are they?
>>> 
>>> There’s some allowed sharing on GIMPLE and a verifier.
>> What’s the name of the verifier that I can search and check?
> 
> verify_node_sharing

Okay, thanks. 

> 
>>> 
>>>> 4. For the following:
>>>> 
>>>> If both “op1” and “op2” are existing tree nodes in the current GENERIC 
>>>> function,
>>>> and we will insert a new tree node:
>>>> 
>>>> tree  new_tree = build2 (CODE, TYPE, op1, op2)
>>>> 
>>>> 
>>>> Should we add “unshare_expr” on both “op1” and “op2” as:
>>>> 
>>>> Tree new_tree = build2 (CODE, TYPE, unshare_expr (op1), unshare_expr (op2))
>>>> ?
>>> 
>>> Not necessarily but instead you have to watch for evaluating side-effects 
>>> only once.  See save_expr.
>> 
>> Okay.  I see.
>>> 
>>>> 
>>>> If op2 is a node that is allowed to be shared, whether the additional 
>>>> “unshare_expr” on it trigger any potential problem?
>>> 
>>> If you unshare side-effects that’s generating wrong-code.  Otherwise 
>>> unsharing is safe.
>> 
>> Okay.
>> Will unnecessary unshareing produce redundant IRs?
> 
> Yes.
> 
>> All my questions for unshare_expr relate to a  LTO bug that I currently 
>> stuck with
>> when using .ACCESS_WITH_SIZE in bound sanitizer (only with -flto, without 
>> -flto, no issue):
>> 
>> [opc@qinzhao-aarch64-ol8 gcc]$ sh t
>> during IPA pass: modref
>> t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not supported in 
>> LTO streams
>> 0x14c3993 lto_write_tree
>>../../latest-gcc-write/gcc/lto-streamer-out.cc:561
>> 0x14c3aeb lto_output_tree_1
>> 
>> And the value of the tree node that triggered the ICE is:
>> (gdb) call debug_tree(expr)
>> 
>>nothrow
>>def_stmt
>>version:13 in-free-list>
>> 
>> Is there any good way to debug LTO bug?
> 
> This happens usually when you have a VLA type and its type fields are not
> properly gimplified which usually happens because the frontend fails to
> insert a gimplification point for it (a DECL_EXPR).
Thanks for the info. 
This is happening for a structure type with a FAM (similar to a VLA, I guess?).
What is the usual solution for it?

thanks.

Qing
> 
>> Thanks a lot for the help.
>> 
>> Qing
>> 
>> 
>>> 
>>> Richard
>>> 
>>>> Thanks a lot for your help.
>>>> 
>>>> Qing
>> 



Re: HELP: Questions on unshare_expr

2024-01-12 Thread Qing Zhao
Thanks a lot for the reply.  

> On Jan 12, 2024, at 11:28 AM, Richard Biener  
> wrote:
> 
> 
> 
>> Am 12.01.2024 um 16:55 schrieb Qing Zhao :
>> 
>> Hi,
>> 
>> I have some questions on using the utility routine “unshare_expr”:
>> 
>> From my understanding, there should be NO shared nodes in a GENERIC function.
>> Otherwise, gimplication might fail.
> 
> There is sharing and this is why we unshare everything before gimplification.

Okay, so, the "unsharing everything” is done automatically by the compiler 
before gimplification? 
I don’t need to worry about this?

I see many places in the FE where “unshare_expr” is used, for example, 
“ubsan_instrument_division”, “ubsan_instrument_shift”, etc. 

So, when should “unshare_expr” usually be used? 

>> Therefore, when we insert new tree nodes manually into the GENERIC function, 
>> we should
>> Make sure there is no shared nodes introduced.
>> 
>> 1. Is the above understanding correct?
> 
> No
> 
>> 2. Is there any tool to check there is no shared nodes in the GENERIC 
>> function?
>> 3. Are there any tree nodes that are allowed to be shared in a GENERIC 
>> function? If so, what are they?
> 
> There’s some allowed sharing on GIMPLE and a verifier.
What’s the name of the verifier that I can search and check? 
> 
>> 4. For the following:
>> 
>> If both “op1” and “op2” are existing tree nodes in the current GENERIC 
>> function,
>> and we will insert a new tree node:
>> 
>> tree  new_tree = build2 (CODE, TYPE, op1, op2)
>> 
>> 
>> Should we add “unshare_expr” on both “op1” and “op2” as:
>> 
>> Tree new_tree = build2 (CODE, TYPE, unshare_expr (op1), unshare_expr (op2))
>> ?
> 
> Not necessarily but instead you have to watch for evaluating side-effects 
> only once.  See save_expr.

Okay.  I see.
> 
>> 
>> If op2 is a node that is allowed to be shared, whether the additional 
>> “unshare_expr” on it trigger any potential problem?
> 
> If you unshare side-effects that’s generating wrong-code.  Otherwise 
> unsharing is safe.

Okay. 
Will unnecessary unsharing produce redundant IR?

All my questions about unshare_expr relate to an LTO bug that I am currently 
stuck on when using .ACCESS_WITH_SIZE in the bounds sanitizer (it fails only 
with -flto; without -flto there is no issue):

[opc@qinzhao-aarch64-ol8 gcc]$ sh t
during IPA pass: modref
t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not supported in LTO 
streams
0x14c3993 lto_write_tree
../../latest-gcc-write/gcc/lto-streamer-out.cc:561
0x14c3aeb lto_output_tree_1

And the value of the tree node that triggered the ICE is:
(gdb) call debug_tree(expr)
 
nothrow
def_stmt 
version:13 in-free-list>

Is there a good way to debug an LTO bug?

Thanks a lot for the help.

Qing


> 
> Richard 
> 
>> Thanks a lot for your help.
>> 
>> Qing



HELP: Questions on unshare_expr

2024-01-12 Thread Qing Zhao
Hi, 

I have some questions on using the utility routine “unshare_expr”:

From my understanding, there should be NO shared nodes in a GENERIC function. 
Otherwise, gimplification might fail. 

Therefore, when we insert new tree nodes manually into the GENERIC function, we 
should make sure that no shared nodes are introduced. 

1. Is the above understanding correct?
2. Is there any tool to check that there are no shared nodes in the GENERIC 
function?
3. Are there any tree nodes that are allowed to be shared in a GENERIC 
function? If so, what are they?

4. For the following:

If both “op1” and “op2” are existing tree nodes in the current GENERIC 
function, and we insert a new tree node:

tree new_tree = build2 (CODE, TYPE, op1, op2)


should we add “unshare_expr” on both “op1” and “op2”, as in:

tree new_tree = build2 (CODE, TYPE, unshare_expr (op1), unshare_expr (op2))
?

If op2 is a node that is allowed to be shared, could the additional 
“unshare_expr” on it cause any problem?

Thanks a lot for your help.

Qing 







Re: HELP: one issue during the implementation for counted_by attribute

2023-12-06 Thread Qing Zhao
Just an update on this issue.

In the end, I resolved this issue with the following solution.

For the source code (portion):

struct annotated {
  size_t foo;
  char array[] __attribute__((counted_by (foo)));
};

p2->array[8] = 0;

the C FE will generate the following (*.005t.original):

*(.ACCESS_WITH_SIZE (p2->array, &p2->foo, 1, 8, -1) + 8) = 0;

i.e., the return type of the call to .ACCESS_WITH_SIZE should be a pointer 
to char, char *. (Previously, the return type of the call was char [].)

This resolved the issue nicely. 
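
The shape of that solution can be imitated in ordinary C. This is only an 
illustration under assumptions: element_ptr is a hypothetical stand-in for the 
.ACCESS_WITH_SIZE call, alloc_buf is a made-up helper named after the one in 
the example, and the counted_by attribute is left out so the sketch compiles 
with any C99 compiler.

```c
#include <stddef.h>
#include <stdlib.h>

struct annotated
{
  size_t foo;
  char array[];   /* incomplete array type: no temporary of this type exists */
};

/* Stand-in for the call: it yields a pointer to the element type
   (char *), which is a complete type and can be held in a temporary.  */
static char *element_ptr (struct annotated *p)
{
  return p->array;
}

/* Same shape as the generated code: *(call + index) = value.  */
static void store_at (struct annotated *p, size_t index, char value)
{
  char *tmp = element_ptr (p);
  *(tmp + index) = value;
}

/* Hypothetical allocation helper; sets the count before any array use.  */
static struct annotated *alloc_buf (size_t count)
{
  struct annotated *p = malloc (sizeof *p + count * sizeof (char));
  if (p)
    p->foo = count;
  return p;
}
```

Because the stand-in returns char *, creating a temporary for its result is 
unproblematic, which is the property the pointer-typed call relies on.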

Let me know if you see any obvious issue with this solution. 

thanks.

Qing


> On Nov 30, 2023, at 11:07 AM, Qing Zhao  wrote:
> 
> Hi, 
> 
> 1. For the following source code (portion):
> 
> struct annotated {
>  size_t foo;
>  char b;
>  char array[] __attribute__((counted_by (foo)));
> };
> 
> static void noinline bar ()
> {
>  struct annotated *p2 = alloc_buf (10);
>  p2->array[8] = 0;
>  return;
> }
> 
> 2. I modified C FE to generate the following code for the routine “bar”:
> 
> ;; Function bar (null)
> ;; enabled by -tree-original
> {
>  struct annotated * p2 = alloc_buf (10);
> 
>struct annotated * p2 = alloc_buf (10);
>  .ACCESS_WITH_SIZE ((char *) &p2->array, &p2->foo, 1, 8, -1)[8] = 0;
>  return;
> }
> 
> The gimpliflication asserted at:/home/opc/Install/latest-d/bin/gcc -O2 
> -fdump-tree-all ttt_1.c
> ttt_1.c: In function ‘bar’:
> ttt_1.c:29:5: internal compiler error: in create_tmp_var, at 
> gimple-expr.cc:488
>   29 |   p2->array[8] = 0;
>  |   ~~^~~
> 
> 3. The reason for this assertion failure is:  (in gcc/gimplify.cc)
> 
> 16686 case CALL_EXPR:
> 16687   ret = gimplify_call_expr (expr_p, pre_p, fallback != fb_none);
> 16688 
> 16689   /* C99 code may assign to an array in a structure returned
> 16690  from a function, and this has undefined behavior only on
> 16691  execution, so create a temporary if an lvalue is
> 16692  required.  */
> 16693   if (fallback == fb_lvalue)
> 16694 {
> 16695   *expr_p = get_initialized_tmp_var (*expr_p, pre_p, 
> post_p, false);
> 16696   mark_addressable (*expr_p);
> 16697   ret = GS_OK;
> 16698 }
> 16699   break;
> 
> At Line 16695, when gimplifier tried to create a temporary value for the 
> .ACCESS_WITH_SIZE function as:
>   tmp = .ACCESS_WITH_SIZE ((char *) &p2->array, &p2->foo, 1, 8, -1);
> 
> It asserted since the TYPE of the function .ACCESS_WITH_SIZE is an 
> INCOMPLETE_TYPE (it’s the TYPE of p2->array, which is an incomplete type).
> 
> 4. I am stuck on how to resolve this issue properly:
> The first question is:
> 
> Where should  we generate
>  tmp = .ACCESS_WITH_SIZE ((char *) &p2->array, &p2->foo, 1, 8, -1)
> 
> In C FE or in middle-end gimplification? 
> 
> Thanks a lot for your help.
> 
> Qing
> 



HELP: one issue during the implementation for counted_by attribute

2023-11-30 Thread Qing Zhao
Hi, 

1. For the following source code (portion):

struct annotated {
  size_t foo;
  char b;
  char array[] __attribute__((counted_by (foo)));
};

static void noinline bar ()
{
  struct annotated *p2 = alloc_buf (10);
  p2->array[8] = 0;
  return;
}

2. I modified the C FE to generate the following code for the routine “bar”:

;; Function bar (null)
;; enabled by -tree-original
{
  struct annotated * p2 = alloc_buf (10);

struct annotated * p2 = alloc_buf (10);
  .ACCESS_WITH_SIZE ((char *) &p2->array, &p2->foo, 1, 8, -1)[8] = 0;
  return;
}

The gimplification asserted with:
/home/opc/Install/latest-d/bin/gcc -O2 -fdump-tree-all ttt_1.c
ttt_1.c: In function ‘bar’:
ttt_1.c:29:5: internal compiler error: in create_tmp_var, at gimple-expr.cc:488
   29 |   p2->array[8] = 0;
  |   ~~^~~

3. The reason for this assertion failure is:  (in gcc/gimplify.cc)

16686     case CALL_EXPR:
16687       ret = gimplify_call_expr (expr_p, pre_p, fallback != fb_none);
16688
16689       /* C99 code may assign to an array in a structure returned
16690          from a function, and this has undefined behavior only on
16691          execution, so create a temporary if an lvalue is
16692          required.  */
16693       if (fallback == fb_lvalue)
16694         {
16695           *expr_p = get_initialized_tmp_var (*expr_p, pre_p, post_p, false);
16696           mark_addressable (*expr_p);
16697           ret = GS_OK;
16698         }
16699       break;

At line 16695, the gimplifier tries to create a temporary for the result of 
the .ACCESS_WITH_SIZE call:
   tmp = .ACCESS_WITH_SIZE ((char *) &p2->array, &p2->foo, 1, 8, -1);

It asserts because the type of the .ACCESS_WITH_SIZE call is an incomplete 
type (it is the type of p2->array, which is incomplete).

4. I am stuck on how to resolve this issue properly.
The first question is:

Where should we generate
  tmp = .ACCESS_WITH_SIZE ((char *) &p2->array, &p2->foo, 1, 8, -1)

in the C FE, or in middle-end gimplification? 

Thanks a lot for your help.

Qing



RFC (V3) the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-09 Thread Qing Zhao
Hi,

I added the BPF-related issue and its solution to Appendix 4, "Known issues". 
No changes to other parts. 

Sending this V3 for the record.

Qing


Represent the missing dependence for the "counted_by" attribute and its 
consumers 

Qing Zhao

11/09/2023
==

The whole discussion is at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634844.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635397.html

1. The problem

There is a data dependency between the size assignment and the implicit use of 
the size information in __builtin_dynamic_object_size that is missing from 
the IL (lines 11 and 13 in the example below). Without this dependency, the 
compiler may reorder or otherwise transform the code incorrectly. 

  1 struct A
  2 {
  3  size_t size;
  4  char buf[] __attribute__((counted_by(size)));
  5 };
  6 
  7 size_t 
  8 foo (size_t sz)
  9 {
 10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 11  obj->size = sz;
 12  obj->buf[0] = 2;
 13  return __builtin_dynamic_object_size (obj->buf, 1);
 14 }
  
Please see a more complicated example in Appendix 1.

We need to represent such data dependency correctly in the IL. 

2. The solution:

2.1 Summary

* Add a new internal function ".ACCESS_WITH_SIZE" to carry the size information 
for every reference to a FAM field;
* In the C FE, replace every reference to a FAM field whose TYPE has the 
"counted_by" attribute with a call to the new internal function 
".ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or the array 
bound sanitizer, query the size information or ACCESS_MODE information from 
the new internal function;
* When the size information and the "ACCESS_MODE" information are no longer 
needed, possibly at the 2nd object size phase, replace the internal function 
with the actual reference to the FAM field; 
* Adjust the inlining heuristics, IPA alias analysis, and other SSA passes to 
mitigate the impact on the optimizer and code generation. 

2.2 The new internal function 

  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, 
ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)

which returns "REF_TO_OBJ", the same as its 1st argument;

1st argument "REF_TO_OBJ": the reference to the object;
2nd argument "REF_TO_SIZE": the reference to the size of the object;
3rd argument "CLASS_OF_SIZE": what the size referenced by REF_TO_SIZE represents:
   0: unknown;
   1: the number of elements of the object type;
   2: the number of bytes; 
4th argument "SIZE_OF_SIZE": the size in bytes of the object that REF_TO_SIZE 
points to;
5th argument "ACCESS_MODE": 
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write
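
The argument encoding above can be written down as a small plain-C model. This 
is a sketch only: access_with_size_model, the enum names and array_of are 
hypothetical and merely mirror the five arguments; the real .ACCESS_WITH_SIZE 
is a compiler-internal function that cannot be called from source code. The 
values passed in array_of follow the proposal's description for "counted_by" 
(3rd argument 1, 5th argument -1).

```c
#include <stddef.h>

/* 3rd argument: what the referenced size counts.  */
enum size_class
{
  SIZE_CLASS_UNKNOWN = 0,
  SIZE_CLASS_ELEMENTS = 1,
  SIZE_CLASS_BYTES = 2
};

/* 5th argument: access semantics.  */
enum access_mode
{
  ACCESS_UNKNOWN = -1,
  ACCESS_NONE = 0,
  ACCESS_READ_ONLY = 1,
  ACCESS_WRITE_ONLY = 2,
  ACCESS_READ_WRITE = 3
};

/* Model of the call: it returns its first argument unchanged; the other
   arguments only carry information for consumers of the size.  */
static void *access_with_size_model (void *ref_to_obj, const void *ref_to_size,
                                     enum size_class class_of_size,
                                     size_t size_of_size,
                                     enum access_mode mode)
{
  (void) ref_to_size;
  (void) class_of_size;
  (void) size_of_size;
  (void) mode;
  return ref_to_obj;
}

struct annotated
{
  size_t foo;
  char array[];
};

/* How a reference to p->array would be wrapped for "counted_by".  */
static char *array_of (struct annotated *p)
{
  return access_with_size_model (p->array, &p->foo, SIZE_CLASS_ELEMENTS,
                                 sizeof p->foo, ACCESS_UNKNOWN);
}
```

Since the model returns its first argument, array_of (p) is the same pointer 
as p->array; only the extra arguments differ from a plain reference.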

NOTES:
  A. This new internal function is intended for more general use by all three 
attributes, "access", "alloc_size", and the new "counted_by", to attach the 
"size" and "access_mode" information to the corresponding pointer (in order to 
resolve PR96503, etc.: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503);
  B. For "counted_by", the 3rd argument will be 1;
  C. For the "counted_by" and "alloc_size" attributes, the 5th argument will 
be -1; 
  D. In this writeup, we focus on the implementation details for the 
"counted_by" attribute. However, this function should be ready to be used by 
"access" and "alloc_size" without issue. 

2.3 A new semantic requirement in the user documentation of "counted_by"

For the following structure including a FAM with a counted_by attribute:

  struct A
  {
   size_t size;
   char buf[] __attribute__((counted_by(size)));
  };

for any object with such type:

  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));

The size field must be initialized before the first reference to the FAM 
field; otherwise, the behavior is undefined.
This additional requirement on the user guarantees that the first reference 
to the FAM knows the size of the FAM.  

Another thing that needs to be clarified is:
A later reference to the FAM field will use the latest value assigned to the 
size field before that reference. For example, 
 obj->size = val1;
 ref1 (obj->buf);
 obj->size = val2;
 ref2 (obj->buf);
In the above, "ref1" will use val1 and "ref2" will use val2. 
This clarification tells the user that the dynamic array feature is fully 
supported.
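
For illustration only: a minimal C sketch of the requirement above, in which 
the size field is set before the first reference to the FAM and is updated 
before later references. The COUNTED_BY macro and the helpers alloc_a and 
shrink_a are hypothetical names for this example. The macro expands to nothing 
on compilers that do not know the attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only where the compiler recognizes it.  */
#if defined (__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(member) __attribute__ ((counted_by (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct A
{
  size_t size;
  char buf[] COUNTED_BY (size);
};

/* Allocate an object with room for SZ elements.  The size field is
   assigned before obj->buf is referenced for the first time.  */
static struct A *
alloc_a (size_t sz)
{
  struct A *obj = malloc (sizeof (struct A) + sz * sizeof (char));
  if (obj == NULL)
    return NULL;
  obj->size = sz;       /* first: initialize the counter */
  if (sz > 0)
    obj->buf[0] = 0;    /* only then reference the FAM */
  return obj;
}

/* Lower the element count.  References to obj->buf made after this call
   are expected to use NEW_SZ, the latest value of the size field.  */
static void
shrink_a (struct A *obj, size_t new_sz)
{
  assert (obj != NULL && new_sz <= obj->size);
  obj->size = new_sz;
}
```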

We need to add the above additional requirement and clarification to the user 
documentation.
The complete user documentation is in App

Re: RFC (V2) the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-09 Thread Qing Zhao


> On Nov 9, 2023, at 11:50 AM, Jose Marchesi  wrote:
> 
>> 
>> On Thu, Nov 09, 2023 at 03:49:49PM +0000, Qing Zhao wrote:
>>> Is it reasonable to add one option to disable the “counted_by” attribute?
>>> (then no insertion of the new .ACCESS_WITH_SIZE into IL).  
>>> 
>>> The major reason is that some users might want to ignore all the 
>>> “counted_by” attributes added in the source code.
>>> We need to provide them a way to disable this feature.
>> 
>> -D'counted_by(x)='
>> and/or
>> -D'__counted_by__(x)='
>> ?
> 
> The insertion of .ACCESS_WITH_SIZE collides with the BPF CO-RE
> preserve_access_index implementation.
> 
> I don't think this will be a problem in practice (the BPF program can
> define counted_by to the empty string as Jakub suggests) but we ought to
> at least detect when a data structure featuring a counted_by FAM is
> accessed with access index preservation (either attribute or builtin)
> and either error out, or warn and try to accommodate by turning the
> .ACCESS_WITH_SIZE back to plain accesses.  We can do either with BPF
> specific backend code.

Yes, I agree that handling this in BPF backend code might be a better approach
 since this is really a BPF CO-RE specific issue.

For the counted_by implementation, I will keep the current design.

But I will add this BPF CO-RE issue to the proposal as a known issue, for 
the record.

Thanks a lot for raising this issue and the possible solutions.
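
For illustration only: a minimal C sketch of the macro approach quoted above. 
Defining counted_by as an empty function-like macro in the source has the same 
effect as passing -D'counted_by(x)=' on the command line, so the attribute 
list becomes empty and the FAM is declared without the annotation. The helper 
fam_offset is a hypothetical name for this example. The __counted_by__ 
spelling would need the second -D from the quoted suggestion.

```c
#include <assert.h>
#include <stddef.h>

/* Same effect as -D'counted_by(x)=': the attribute name expands to nothing.  */
#define counted_by(x)

struct A
{
  size_t size;
  /* After preprocessing this reads: char buf[] __attribute__ (());  */
  char buf[] __attribute__ ((counted_by (size)));
};

/* The layout does not depend on the attribute.  */
static_assert (offsetof (struct A, buf) == sizeof (size_t),
               "buf is expected to follow size directly");

/* Offset of the FAM within the struct.  */
static size_t
fam_offset (void)
{
  return offsetof (struct A, buf);
}
```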

Qing



Re: RFC (V2) the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-09 Thread Qing Zhao
Is it reasonable to add one option to disable the “counted_by” attribute?
(then no insertion of the new .ACCESS_WITH_SIZE into IL).  

The major reason is that some users might want to ignore all the “counted_by” 
attributes added in the source code.
We need to provide them a way to disable this feature.

thanks.

Qing

> On Nov 6, 2023, at 7:12 PM, Qing Zhao  wrote:
> 
> Hi,
> 
> Attached is the 2nd version of the proposal based on all the discussion so 
> far.
> 
> Let me know if you have more comments or suggestions.
> 
> Thanks a lot for all the help.
> 
> Qing
> ===
> Represent the missing dependence for the "counted_by" attribute and its 
> consumers 
> 
> Qing Zhao
> 
> 11/06/2023
> ==
> 
> The whole discussion is at:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634844.html
> 
> 1. The problem
> 
> There is a data dependency between the size assignment and the implicit use 
> of the size information in __builtin_dynamic_object_size that is missing 
> in the IL (line 11 and line 13 in the example below). This missing 
> information will result in incorrect code reordering and other incorrect 
> code transformations. 
> 
>  1 struct A
>  2 {
>  3  size_t size;
>  4  char buf[] __attribute__((counted_by(size)));
>  5 };
>  6 
>  7 size_t 
>  8 foo (size_t sz)
>  9 {
> 10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 11  obj->size = sz;
> 12  obj->buf[0] = 2;
> 13  return __builtin_dynamic_object_size (obj->buf, 1);
> 14 }
> 
> Please see a more complicated example in Appendix 1.
> 
> We need to represent such data dependency correctly in the IL. 
> 
> 2. The solution:
> 
> 2.1 Summary
> 
> * Add a new internal function ".ACCESS_WITH_SIZE" to carry the size 
> information for every reference to a FAM field;
> * In C FE, Replace every reference to a FAM field whose TYPE has the 
> "counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
> * In every consumer of the size information, for example, BDOS or array bound 
> sanitizer, query the size information or ACCESS_MODE information from the new 
> internal function;
> * When the size information and the "ACCESS_MODE" information are not used 
> anymore, possibly at the 2nd object size phase, replace the internal function 
> with the actual reference to the FAM field; 
> * Some adjustment to inlining heuristic, ipa alias analysis, and other SSA 
> passes to mitigate the impact to the optimizer and code generation. 
> 
> 2.2 The new internal function 
> 
>  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, 
> ACCESS_MODE)
> 
> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
> 
> which returns the "REF_TO_OBJ" same as the 1st argument;
> 
> 1st argument "REF_TO_OBJ": The reference to the object;
> 2nd argument "REF_TO_SIZE": The reference to the size of the object, 
> 3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE 
> represents 
>   0: unknown;
>   1: the number of the elements of the object type;
>   2: the number of bytes; 
> 4th argument "SIZE_OF_SIZE": the size in bytes of the object that 
> REF_TO_SIZE points to;
> 5th argument "ACCESS_MODE": 
>  -1: Unknown access semantics
>   0: none
>   1: read_only
>   2: write_only
>   3: read_write
> 
> NOTEs, 
>  A. This new internal function is intended for a more general use from all 
> the 3 attributes, "access", "alloc_size", and the new "counted_by", to encode 
> the "size" and "access_mode" information to the corresponding pointer. (in 
> order to resolve PR96503, etc. 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503);
>  B. For "counted_by" the 3rd argument will be 1;
>  C. For "counted_by" and "alloc_size" attributes, the 5th argument will be 
> -1;   
>  D. In this writeup, we focus on the implementation details for the 
> "counted_by" attribute. However, this function should be ready to be used by 
> "access" and "alloc_size" without issue. 
> 
> 2.3 A new semantic requirement in the user documentation of "counted_by"
> 
> For the following structure including a FAM with a counted_by attribute:
> 
>  struct A
>  {
>   size_t size;
>   char buf[] __attribute__((counted_by(size)));
>  };
> 
> for any object with such type:
> 
>  struct A *obj = __builtin_malloc (sizeof

RFC (V2) the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-06 Thread Qing Zhao
Hi,

Attached is the 2nd version of the proposal based on all the discussion so far.

Let me know if you have more comments or suggestions.

Thanks a lot for all the help.

Qing
===
Represent the missing dependence for the "counted_by" attribute and its 
consumers 

Qing Zhao

11/06/2023
==

The whole discussion is at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634844.html

1. The problem

There is a data dependency between the size assignment and the implicit use of 
the size information in __builtin_dynamic_object_size that is missing in the 
IL (line 11 and line 13 in the example below). This missing information will 
result in incorrect code reordering and other incorrect code transformations. 

  1 struct A
  2 {
  3  size_t size;
  4  char buf[] __attribute__((counted_by(size)));
  5 };
  6 
  7 size_t 
  8 foo (size_t sz)
  9 {
 10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 11  obj->size = sz;
 12  obj->buf[0] = 2;
 13  return __builtin_dynamic_object_size (obj->buf, 1);
 14 }
  
Please see a more complicated example in Appendix 1.

We need to represent such data dependency correctly in the IL. 

2. The solution:

2.1 Summary

* Add a new internal function ".ACCESS_WITH_SIZE" to carry the size information 
for every reference to a FAM field;
* In C FE, Replace every reference to a FAM field whose TYPE has the 
"counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or array bound 
sanitizer, query the size information or ACCESS_MODE information from the new 
internal function;
* When the size information and the "ACCESS_MODE" information are not used 
anymore, possibly at the 2nd object size phase, replace the internal function 
with the actual reference to the FAM field; 
* Some adjustment to inlining heuristic, ipa alias analysis, and other SSA 
passes to mitigate the impact to the optimizer and code generation. 

2.2 The new internal function 

  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, 
ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)

which returns the "REF_TO_OBJ" same as the 1st argument;

1st argument "REF_TO_OBJ": The reference to the object;
2nd argument "REF_TO_SIZE": The reference to the size of the object, 
3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE represents 
   0: unknown;
   1: the number of the elements of the object type;
   2: the number of bytes; 
4th argument "SIZE_OF_SIZE": the size in bytes of the object that REF_TO_SIZE 
points to;
5th argument "ACCESS_MODE": 
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write

NOTEs, 
  A. This new internal function is intended for a more general use from all the 
3 attributes, "access", "alloc_size", and the new "counted_by", to encode the 
"size" and "access_mode" information to the corresponding pointer. (in order to 
resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503);
  B. For "counted_by" the 3rd argument will be 1;
  C. For "counted_by" and "alloc_size" attributes, the 5th argument will be -1; 
  
  D. In this writeup, we focus on the implementation details for the 
"counted_by" attribute. However, this function should be ready to be used by 
"access" and "alloc_size" without issue. 

2.3 A new semantic requirement in the user documentation of "counted_by"

For the following structure including a FAM with a counted_by attribute:

  struct A
  {
   size_t size;
   char buf[] __attribute__((counted_by(size)));
  };

for any object with such type:

  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));

The size field must be initialized before the first reference to the FAM 
field; otherwise, the behavior is undefined.
This additional requirement on the user guarantees that the first reference 
to the FAM knows the size of the FAM.  

Another thing that needs to be clarified is:
A later reference to the FAM field will use the latest value assigned to the 
size field before that reference. For example, 
 obj->size = val1;
 ref1 (obj->buf);
 obj->size = val2;
 ref2 (obj->buf);
In the above, "ref1" will use val1 and "ref2" will use val2. 
This clarification tells the user that the dynamic array feature is fully 
supported.

We need to add the above additional requirement and clarification to the user 
documentation.
The complete user documentation is in Appendix 2. 

2.4 Replace the reference to a FAM field with the new functio

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao
Yes, after today’s discussion, I think we agreed on 

1. Passing the size field by reference to .ACCESS_WITH_SIZE, as Jakub suggested.
2. The compiler should then always be able to use the latest value of the size 
field for a reference to the FAM.

As a result, there is no need to add code to the source to re-obtain the 
pointer. 
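
For illustration only: a minimal C model of the difference between the two 
designs discussed in this thread. It is not how GCC implements either one. The 
struct and function names are hypothetical. One handle captures the size by 
value at the time the pointer is formed; the other keeps a reference to the 
size field and re-reads it on every query.

```c
#include <assert.h>
#include <stddef.h>

/* Design 1: the size is captured by value when the pointer is formed.  */
struct handle_by_value
{
  size_t offset;        /* element offset of the pointer into the array */
  size_t size;          /* snapshot of the counter */
};

/* Design 2: only the address of the counter is recorded.  */
struct handle_by_ref
{
  size_t offset;
  const size_t *size;   /* reference to the live counter */
};

/* Remaining elements according to the snapshot.  */
static size_t
remaining_by_value (const struct handle_by_value *h)
{
  return h->offset < h->size ? h->size - h->offset : 0;
}

/* Remaining elements according to the latest counter value.  */
static size_t
remaining_by_ref (const struct handle_by_ref *h)
{
  assert (h->size != NULL);
  size_t now = *h->size;        /* re-read at query time */
  return h->offset < now ? now - h->offset : 0;
}
```

With a counter of 5 and an offset of 2, both queries give 3. After the counter 
is changed to 3, the by-value query still gives 3, while the by-reference 
query gives 1.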

I will update the proposal one more time.

thanks.

Qing

> On Nov 2, 2023, at 8:28 PM, Bill Wendling  wrote:
> 
> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
>> 
>> Thanks a lot for raising these issues.
>> 
>> If I understand correctly,  the major question we need to answer is:
>> 
>> For the following example: (Jakub mentioned this  in an early message)
>> 
>>  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>>  2 struct S s;
>>  3 s.a = 5;
>>  4 char *p = &s.b[2];
>>  5 int i1 = __builtin_dynamic_object_size (p, 0);
>>  6 s.a = 3;
>>  7 int i2 = __builtin_dynamic_object_size (p, 0);
>> 
>> Should the 2nd __bdos call (line 7) get
>>     A. the latest value of s.a (line 6) for its size?
>> Or  B. the value when s.b was referenced (line 3, line 4)?
>> 
> I personally think it should be (A). The user is specifically
> indicating that the size has somehow changed, and the compiler should
> behave accordingly.
> 
>> A should be more convenient for the user to use the dynamic array feature.
>> With B, the user has to modify the source code (to add code to “re-obtain”
>> the pointer after the size was adjusted at line 6) as mentioned by Richard.
>> 
>> This depends on how we design the new internal function .ACCESS_WITH_SIZE
>> 
>> 1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed.
>> 
>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> 
>> 2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.
>> 
>> PTR = .ACCESS_WITH_SIZE(PTR, &SIZE, TYPEOFSIZE, ACCESS_MODE)
>> 
>> With 1, We can only provide B, the user needs to modify the source code to 
>> get the full feature of dynamic array;
>> With 2, We can provide  A, the user will get full support to the dynamic 
>> array without restrictions in the source code.
>> 
> My understanding of ACCESS_WITH_SIZE is that it's there to add an
> explicit reference to SIZE so that the optimizers won't reorder the
> code incorrectly. If that's the case, then it should act as if
> ACCESS_WITH_SIZE wasn't even there (i.e. it's just a pointer
> dereference into the FAM). We get that with (2) it appears. It would
> be a major headache to make the user go throughout their code base to
> ensure that SIZE was either unmodified, or if it was that extra code
> must be added to ensure the expected behavior.
> 
>> However, We have to pay additional cost for supporting A by using 2, which 
>> includes:
>> 
>> 1. .ACCESS_WITH_SIZE will become an escape point, which will further impact 
>> the IPA optimizations, more runtime overhead.
>>    Then .ACCESS_WITH_SIZE will not be CONST, right? But it will still be PURE?
>> 
>> 2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also 
>> impact some IPA optimizations, more runtime overhead.
>> 
>> I think the following are the factors that make the decision:
>> 
>> 1. How big is the performance impact?
>> 2. How important is the dynamic array feature? Is adding some user 
>> restrictions, as Richard mentioned, feasible to support this feature?
>> 
>> Maybe we can implement 1 first, if the full support to the dynamic array is 
>> needed, we can add 2 then?
>> Or, we can implement both, and compare the performance difference, then 
>> decide?
>> 
>> Qing
>> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao


> On Nov 2, 2023, at 8:13 PM, Bill Wendling  wrote:
> 
> On Thu, Nov 2, 2023 at 1:00 AM Richard Biener
>  wrote:
>> 
>> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
>>> 
>>> 
>>> 
>>>> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
>>>> 
>>>> On Tue, 31 Oct 2023, Qing Zhao wrote:
>>>> 
>>>>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>>>>> 
>>>>> For the following structure including a FAM with a counted_by attribute:
>>>>> 
>>>>> struct A
>>>>> {
>>>>>  size_t size;
>>>>>  char buf[] __attribute__((counted_by(size)));
>>>>> };
>>>>> 
>>>>> for any object with such type:
>>>>> 
>>>>> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>> 
>>>>> The setting to the size field should be done before the first reference
>>>>> to the FAM field.
>>>>> 
>>>>> Such requirement to the user will guarantee that the first reference to
>>>>> the FAM knows the size of the FAM.
>>>>> 
>>>>> We need to add this additional requirement to the user document.
>>>> 
>>>> Make sure the manual is very specific about exactly when size is
>>>> considered to be an accurate representation of the space available for buf
>>>> (given that, after malloc or realloc, it's going to be temporarily
>>>> inaccurate).  If the intent is that inaccurate size at such a time means
>>>> undefined behavior, say so explicitly.
>>> 
>>> Yes, good point. We need to define this clearly in the beginning.
>>> We need to explicitly say that
>>> 
>>> the size of the FAM is defined by the latest “counted_by” value, and it is 
>>> undefined behavior if the size field has not been set when the FAM is 
>>> referenced.
>>> 
>>> Is the above good enough?
>>> 
>>> 
>>>> 
>>>>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>>>>> 
>>>>> In C FE:
>>>>> 
>>>>> for every reference to a FAM, for example, "obj->buf" in the small 
>>>>> example,
>>>>> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>>>> if YES, replace the reference to "obj->buf" with a call to
>>>>> .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
>>>> 
>>>> This seems plausible - but you should also consider the case of static
>>>> initializers - remember the GNU extension for statically allocated objects
>>>> with flexible array members (unless you're not allowing it with
>>>> counted_by).
>>>> 
>>>> static struct A x = { sizeof "hello", "hello" };
>>>> static char *y = 
>>>> 
>>>> I'd expect that to be valid - and unless you say such a usage is invalid,
>>> 
>>> At this moment, I think that this should be valid.
>>> 
>>> I.e., the following:
>>> 
>>> struct A
>>> {
>>> size_t size;
>>> char buf[] __attribute__((counted_by(size)));
>>> };
>>> 
>>> static struct A x = {sizeof "hello", "hello"};
>>> 
>>> Should be valid, and x.size represents the number of elements of x.buf.
>>> Both x.size and x.buf are initialized statically.
>>> 
>>>> you should avoid the replacement in such a static initializer context when
>>>> the FAM reference is to an object with a constant address (if
>>>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
>>>> expression; if it works fine as a constant-address lvalue, then the
>>>> replacement would be OK).
>>> 
>>> Then if such usage for the “counted_by” is valid, we need to replace the FAM
>>> reference by a call to  .ACCESS_WITH_SIZE as well.
>>> Otherwise the “counted_by” relationship will be lost to the Middle end.
>>> 
>>> With the current definition of .ACCESS_WITH_SIZE
>>> 
>>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>>> 
>>> Isn’t the PTR (return value of the call) a LVALUE?
>> 
>> You probably want to specify that when a pointer to the array is taken the
>> pointer has to be to the first array eleme

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao


> On Nov 3, 2023, at 12:30 PM, Jakub Jelinek  wrote:
> 
> On Fri, Nov 03, 2023 at 04:20:57PM +0000, Qing Zhao wrote:
>> So, based on the discussion so far, we will define .ACCESS_WITH_SIZE as 
>> follows:
>> 
>> .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, ACCESS_MODE)
>> 
>> INTERNAL_FN (ACCESS_WITH_SIZE,  ECF_LEAF | ECF_NOTHROW, NULL)
>> 
>> which returns the “REF_TO_OBJ" same as the 1st argument;
>> 
>> 1st argument “REF_TO_OBJ": Reference to the object;
>> 2nd argument “REF_TO_SIZE”:  Reference to size of the object referenced by 
>> the 1st argument, 
>> if the object that “REF_TO_OBJ” refers to has a
>>   * real type, the SIZE that “REF_TO_SIZE” refers to is the number of 
>> elements of the type;
>>   * void type, the SIZE that “REF_TO_SIZE” refers to is the number of bytes; 
> 
> No, you can't do this.  Conversions between pointers are mostly useless in
> GIMPLE, so you can't make decisions based on TREE_TYPE (TREE_TYPE (fnarg))
> as it could have some random completely unrelated type.
> So, the multiplication factor needs to be encoded in the arguments rather
> than derived from REF_TO_OBJ's type, and similarly the size of what
> REF_TO_SIZE points to needs to be encoded somewhere.

Okay, I see, so 2 more arguments to the new function.

Qing
> 
>> 3rd argument "ACCESS_MODE": 
>> -1: Unknown access semantics
>>  0: none
>>  1: read_only
>>  2: write_only
>>  3: read_write
> 
>   Jakub
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao
So, based on the discussion so far, we will define .ACCESS_WITH_SIZE as 
follows:

 .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE,  ECF_LEAF | ECF_NOTHROW, NULL)

which returns the “REF_TO_OBJ" same as the 1st argument;

1st argument “REF_TO_OBJ": Reference to the object;
2nd argument “REF_TO_SIZE”:  Reference to size of the object referenced by the 
1st argument, 
 if the object that “REF_TO_OBJ” refers to has a
   * real type, the SIZE that “REF_TO_SIZE” refers to is the number of 
elements of the type;
   * void type, the SIZE that “REF_TO_SIZE” refers to is the number of bytes; 
3rd argument "ACCESS_MODE": 
 -1: Unknown access semantics
  0: none
  1: read_only
  2: write_only
  3: read_write

NOTEs, 
 A. This new internal function is intended for a more general use from all the 
3 attributes, "access", "alloc_size", and the new "counted_by", to encode the 
"size" and "access_mode" information to the corresponding pointer. (in order to 
resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
 B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1.  
 
 C. In this writeup, we focus on the implementation details for the "counted_by" 
attribute. However, this function should be ready to be used by "access" and 
"alloc_size" without issue. 

Although .ACCESS_WITH_SIZE is not PURE anymore, it only reads from the 2nd 
argument and does not modify anything in the pointed-to objects. So, we can 
adjust the IPA alias analysis phase to reflect these details 
(ref_maybe_used_by_call_p_1/call_may_clobber_ref_p_1).

One more note: only integer types are allowed for the SIZE, and in 
tree-object-size.cc all the SIZEs (in the attributes “access”, “alloc_size”, 
etc.) are converted to “sizetype”.  So, we don’t need to specify the type of 
the size for “REF_TO_SIZE”, since it is always an integer type and is always 
converted to “sizetype” internally. 

Let me know if you have any more comments or suggestions. 

Qing


> On Nov 3, 2023, at 2:32 AM, Martin Uecker  wrote:
> 
> 
> Am Freitag, dem 03.11.2023 um 07:22 +0100 schrieb Jakub Jelinek:
>> On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
>>> Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
>>>> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
>>>>> 
>>>>> Thanks a lot for raising these issues.
>>>>> 
>>>>> If I understand correctly,  the major question we need to answer is:
>>>>> 
>>>>> For the following example: (Jakub mentioned this  in an early message)
>>>>> 
>>>>>  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>>>>>  2 struct S s;
>>>>>  3 s.a = 5;
>>>>>  4 char *p = [2];
>>>>>  5 int i1 = __builtin_dynamic_object_size (p, 0);
>>>>>  6 s.a = 3;
>>>>>  7 int i2 = __builtin_dynamic_object_size (p, 0);
>>>>> 
>>>>> Should the 2nd __bdos call (line 7) get
>>>>>     A. the latest value of s.a (line 6) for its size?
>>>>> Or  B. the value when s.b was referenced (line 3, line 4)?
>>>>> 
>>>> I personally think it should be (A). The user is specifically
>>>> indicating that the size has somehow changed, and the compiler should
>>>> behave accordingly.
>>> 
>>> 
>>> One potential problem for A apart from the potential impact on
>>> optimization is that the information may get lost more
>>> easily. Consider:
>>> 
>>> char *p = &s.b[2];
>>> f();
>>> int i = __bdos(p, 0);
>>> 
>>> If the compiler can not see into 'f', the information is lost
>>> because f may have changed the size.
>> 
>> Why?  It doesn't really matter.  The options are
>> A. p is at &s.b[2] associated with &s.a and int type (or size of int
>>   or whatever); .ACCESS_WITH_SIZE can't be pure, but sure, for aliasing
>>   POV we can describe it with more detail that it doesn't modify anything
>>   in the pointed structure, just escapes the pointer; __bdos can stay
>>   leaf I believe; and when expanding __bdos later on, it would just
>>   dereference the associated pointer at that point (note, __bdos is
>>   pure, so it has vuse but not vdef and can load from memory); if
>>   f changes s.a, no problem, __bdos will load the changed value in there
> 
> Ah, right. Because of the reload it doesn't matter. 
> Thank you for the explanation!
> 
> Martin
> 
>> B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
>>   point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
>>   __bdos later will use s.a value from the &s.b[2] spot



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao


> On Nov 3, 2023, at 10:46 AM, Jakub Jelinek  wrote:
> 
> On Fri, Nov 03, 2023 at 02:32:04PM +0000, Qing Zhao wrote:
>>> Why?  It doesn't really matter.  The options are
>>> A. p is at &s.b[2] associated with &s.a and int type (or size of int
>>>  or whatever); .ACCESS_WITH_SIZE can't be pure,
>> 
>> .ACCESS_WITH_SIZE will only load the size from its address, without any 
>> write to memory.
>> It can still be PURE, right? (It will not be CONST anymore).
> 
> No, it can't be pure.  Because for the IL purposes, it needs to be treated
> as if it saves that address of the counter into some unnamed global variable
> somewhere.

Okay. I see.

>> 
>>> but sure, for aliasing
>>>  POV we can describe it with more detail that it doesn't modify anything
>>>  in the pointed structure, just escapes the pointer;
>> 
>> If we need to do this, where in the gcc code we need to add these details?
> 
> I think ref_maybe_used_by_call_p_1/call_may_clobber_ref_p_1, but Richi is
> expert here.

I just checked these routines; it looks like some other non-pure internal 
functions are handled there too.
For example, 
  case IFN_UBSAN_BOUNDS:
  case IFN_UBSAN_VPTR:
  case IFN_UBSAN_OBJECT_SIZE:
  case IFN_UBSAN_PTR:
  case IFN_ASAN_CHECK:

This looks like the correct place to handle the new .ACCESS_WITH_SIZE. 
> 
>>> __bdos can stay
>>>  leaf I believe;
>> 
>> That’s good!  (I thought that __bdos would now call .ACCESS_WITH_SIZE?)
> 
> No, it shouldn't call it obviously.  If tree-object-size.cc discovery tracks
> something to a pointer initialized by .ACCESS_WITH_SIZE call, then it should
> I believe recurse on the first argument of that call (say if one has
>  ptr_3 = malloc (sz_1);
>  ptr_2 = .ACCESS_WITH_SIZE (ptr_3, &ptr_3[4], ...);
> then supposedly __bdos later on should e.g. for 0/1 modes take minimum from
> ptr_3 (the size actually allocated)) and the counter.

Yes, this is the situation in my mind too. 
I thought this might eliminate the LEAF feature from __bdos. -:) if not, that’s 
good.
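
For illustration only: a minimal C sketch of the "take the minimum" rule 
described in the quoted text, where the result is bounded both by the size 
known from the allocation and by the size implied by the counter. The function 
names and the overflow guard are hypothetical and are not taken from 
tree-object-size.cc.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes.  */
static size_t
min_size (size_t x, size_t y)
{
  return x < y ? x : y;
}

/* ALLOC_BYTES: bytes known to be available from the allocation.
   COUNT:       current value of the counter field.
   ELEM_SIZE:   size of one array element in bytes.
   Returns the smaller of the allocated size and the counted size.  */
static size_t
bounded_object_size (size_t alloc_bytes, size_t count, size_t elem_size)
{
  assert (elem_size != 0);
  /* If COUNT * ELEM_SIZE would overflow, only the allocation bounds it.  */
  if (count > (size_t) -1 / elem_size)
    return alloc_bytes;
  return min_size (alloc_bytes, count * elem_size);
}
```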

Qing
> 
>   Jakub
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao


> On Nov 3, 2023, at 2:22 AM, Jakub Jelinek  wrote:
> 
> On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
>> Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
>>> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
>>>> 
>>>> Thanks a lot for raising these issues.
>>>> 
>>>> If I understand correctly,  the major question we need to answer is:
>>>> 
>>>> For the following example: (Jakub mentioned this  in an early message)
>>>> 
>>>>  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>>>>  2 struct S s;
>>>>  3 s.a = 5;
>>>>  4 char *p = &s.b[2];
>>>>  5 int i1 = __builtin_dynamic_object_size (p, 0);
>>>>  6 s.a = 3;
>>>>  7 int i2 = __builtin_dynamic_object_size (p, 0);
>>>> 
>>>> Should the 2nd __bdos call (line 7) get
>>>>     A. the latest value of s.a (line 6) for its size?
>>>> Or  B. the value when s.b was referenced (line 3, line 4)?
>>>> 
>>> I personally think it should be (A). The user is specifically
>>> indicating that the size has somehow changed, and the compiler should
>>> behave accordingly.
>> 
>> 
>> One potential problem for A apart from the potential impact on
>> optimization is that the information may get lost more
>> easily. Consider:
>> 
>> char *p = &s.b[2];
>> f();
>> int i = __bdos(p, 0);
>> 
>> If the compiler can not see into 'f', the information is lost
>> because f may have changed the size.
> 
> Why?  It doesn't really matter.  The options are
> A. p is at &s.b[2] associated with &s.a and int type (or size of int
>   or whatever); .ACCESS_WITH_SIZE can't be pure,

.ACCESS_WITH_SIZE will only load the size from its address, without any write 
to memory.
It can still be PURE, right? (It will not be CONST anymore).

> but sure, for aliasing
>   POV we can describe it with more detail that it doesn't modify anything
>   in the pointed structure, just escapes the pointer;

If we need to do this, where in the gcc code we need to add these details?

> __bdos can stay
>   leaf I believe;

That’s good!  (I thought that __bdos would now call .ACCESS_WITH_SIZE?)

Qing

> and when expanding __bdos later on, it would just
>   dereference the associated pointer at that point (note, __bdos is
>   pure, so it has vuse but not vdef and can load from memory); if
>   f changes s.a, no problem, __bdos will load the changed value in there
> B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
>   point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
>   __bdos later will use s.a value from the &s.b[2] spot
> 
>   Jakub
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao



> On Nov 2, 2023, at 8:09 AM, Jakub Jelinek  wrote:
> 
> On Thu, Nov 02, 2023 at 12:52:50PM +0100, Richard Biener wrote:
>>> What I meant is to emit
>>> tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
>>> p_5 = &tmp_4[2];
>>> i.e. don't associate the pointer with a value of the size, but with
>>> an address where to find the size (plus how large it is), basically escape
>>> pointer to the size at that point.  And __builtin_dynamic_object_size is 
>>> pure,
>>> so supposedly it can depend on what the escaped pointer points to.
>> 
>> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
>> escape point (quite bad IMHO)
> 
> That is why I've said we need to decide what cost we want to suffer because
> of that.
> 
>> and __builtin_dynamic_object_size being
>> non-const (that's probably not too bad).
> 
> It is already pure,leaf,nothrow (unlike __builtin_object_size which is 
> obviously
> const,leaf,nothrow).  Because under the hood, it can read memory when
> expanded.
> 
>>> We'd see that a particular pointer is size associated with &s.a address
>>> and would use that address cast to the type of the third argument (to
>>> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
>>> VN CSE it anyway if one has say
>>> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>>>  struct T { char c, d, e, f; char g __attribute__((counted_by (c))) 
>>> []; } t; };
>>> and
>>> .ACCESS_WITH_SIZE ([0], , (int *) 0);
>>> ...
>>> .ACCESS_WITH_SIZE ([0], , (int *) 0);
>>> ?
>> 
>> We'd probably CSE that - the usual issue of address-with-same-value.
>> 
>>> It would mean though that counted_by wouldn't be allowed to be a
>>> bit-field...
>> 
>> Yup.  We could also pass a pointer to the container though, that's good 
>> enough
>> for the escape, and pass the size by value in addition to that.
> 
> I was wondering about stuff like _BitInt.  But sure, counted_by is just an
> extension, we can just refuse counting by _BitInt in addition to counting by
> floating point, pointers, aggregates, bit-fields, or we could somehow encode
> all the needed type's properties numerically into an integral constant.
> Similarly for alias set (unless it uses 0 for reads).

counted_by currently is limited to INTEGER type. This should resolve this 
issue, right?

Qing
> 
>   Jakub
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao


> On Nov 2, 2023, at 7:52 AM, Richard Biener  wrote:
> 
> On Thu, Nov 2, 2023 at 11:40 AM Jakub Jelinek  wrote:
>> 
>> On Thu, Nov 02, 2023 at 11:18:09AM +0100, Richard Biener wrote:
>>>> Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one of
>>>> the arguments not the size value, but its address.  Then at __bdos time
>>>> we would dereference that pointer to get the size.
>>>> So,
>>>> struct S { int a; char b __attribute__((counted_by (a))) []; };
>>>> struct S s;
>>>> s.a = 5;
>>>> char *p = &s.b[2];
>>>> int i1 = __builtin_dynamic_object_size (p, 0);
>>>> s.a = 3;
>>>> int i2 = __builtin_dynamic_object_size (p, 0);
>>>> would then yield 3 and 1 rather than 3 and 3.
>>> 
>>> I fail to see how we can get the __builtin_dynamic_object_size call
>>> data dependent on s.a, thus avoid re-ordering or even DSE of the
>>> store.
>> 
>> If &s.b[2] is lowered as
>> sz_1 = s.a;
>> tmp_2 = .ACCESS_WITH_SIZE (&s.b[0], sz_1);
>> p_3 = &tmp_2[2];
>> then sure, there is no way, you get the size from that point.
>> tree-object-size.cc tracking then determines that in a particular
>> case the pointer is size associated with sz_1 and use that value
>> as the size (with the usual adjustments for pointer arithmetics and the
>> like).
>> 
>> What I meant is to emit
>> tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
>> p_5 = &tmp_4[2];
>> i.e. don't associate the pointer with a value of the size, but with
>> an address where to find the size (plus how large it is), basically escape
>> pointer to the size at that point.  And __builtin_dynamic_object_size is pure,
>> so supposedly it can depend on what the escaped pointer points to.
> 
> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
> escape point (quite bad IMHO) and __builtin_dynamic_object_size being
> non-const (that's probably not too bad).
> 
>> We'd see that a particular pointer is size associated with &s.a address
>> and would use that address cast to the type of the third argument (to
>> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
>> VN CSE it anyway if one has say
>> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>>  struct T { char c, d, e, f; char g __attribute__((counted_by (c))) 
>> []; } t; };
>> and
>> .ACCESS_WITH_SIZE ([0], , (int *) 0);
>> ...
>> .ACCESS_WITH_SIZE ([0], , (int *) 0);
>> ?
> 
> We'd probably CSE that - the usual issue of address-with-same-value.
> 
>> It would mean though that counted_by wouldn't be allowed to be a
>> bit-field...
> 
> Yup.  We could also pass a pointer to the container though, that's good enough
> for the escape, and pass the size by value in addition to that.
Could you explain a little bit more here? Then the .ACCESS_WITH_SIZE will become

PTR = .ACCESS_WITH_SIZE (PTR, ’s Container, SIZE, ACCESS_MODE)

??

> 
>>Jakub
>> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao
Thanks a lot for raising these issues. 

If I understand correctly,  the major question we need to answer is:

For the following example (Jakub mentioned this in an earlier message):

  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
  2 struct S s;
  3 s.a = 5;
  4 char *p = &s.b[2];
  5 int i1 = __builtin_dynamic_object_size (p, 0);
  6 s.a = 3;
  7 int i2 = __builtin_dynamic_object_size (p, 0);

Should the 2nd __bdos call (line 7) get
A. the latest value of s.a (line 6) for its size? 
Or B. the value of s.a when s.b was referenced (line 3, line 4)?

A should be more convenient for the user to use the dynamic array feature.
With B, the user has to modify the source code (to add code to “re-obtain” 
the pointer after the size was adjusted at line 6) as mentioned by Richard. 

This depends on how we design the new internal function .ACCESS_WITH_SIZE

1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed. 

PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.

PTR = .ACCESS_WITH_SIZE(PTR, &SIZE, TYPEOFSIZE, ACCESS_MODE)

With 1, we can only provide B; the user needs to modify the source code to get 
the full dynamic array feature.
With 2, we can provide A; the user gets full support for the dynamic array 
without restrictions in the source code. 
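
To make the difference between the two designs concrete, here is a minimal 
sketch in plain C that models them outside the compiler. It is only an 
illustration under assumptions: the names tracked_by_value, tracked_by_ref, and 
the take_* and bdos_* helpers are hypothetical, struct S uses a fixed-size array 
instead of a real FAM so that the object can be a local variable, and none of 
this is how GCC implements .ACCESS_WITH_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Model of struct S; b is a fixed array here only so the object can be a local. */
struct S { int a; char b[16]; };

/* Design 1: the count is copied when the pointer is formed. */
struct tracked_by_value { char *p; long count; long index; };

/* Design 2: only the address of the count is kept; it is read at query time. */
struct tracked_by_ref { char *p; const int *count; long index; };

static struct tracked_by_value take_by_value(struct S *s, long index)
{
    assert(index >= 0 && index <= s->a);
    struct tracked_by_value t = { &s->b[index], s->a, index };
    return t;
}

static struct tracked_by_ref take_by_ref(struct S *s, long index)
{
    assert(index >= 0 && index <= s->a);
    struct tracked_by_ref t = { &s->b[index], &s->a, index };
    return t;
}

/* Remaining elements as seen by each design (hypothetical stand-ins for __bdos). */
static long bdos_by_value(struct tracked_by_value t) { return t.count - t.index; }
static long bdos_by_ref(struct tracked_by_ref t) { return (long)*t.count - t.index; }
```

With s.a = 5 and the pointer formed at index 2, both helpers report 3. After 
s.a = 3, the by-value model still reports 3 (answer B), while the by-reference 
model reports 1 (answer A).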

However, we have to pay an additional cost to support A by using 2, which 
includes:

1. .ACCESS_WITH_SIZE will become an escape point, which will further impact the 
IPA optimizations and add runtime overhead. 
Then .ACCESS_WITH_SIZE will not be CONST, right? But it will still be PURE?

2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also 
impact some IPA optimizations and add runtime overhead. 

I think the following are the factors that make the decision:

1. How big is the performance impact?
2. How important is the dynamic array feature? Is adding some user restrictions, as 
Richard mentioned, feasible to support this feature?

Maybe we can implement 1 first and, if full support for the dynamic array is 
needed, add 2 later? 
Or, we can implement both, compare the performance difference, and then decide?

Qing




> On Nov 2, 2023, at 8:09 AM, Jakub Jelinek  wrote:
> 
> On Thu, Nov 02, 2023 at 12:52:50PM +0100, Richard Biener wrote:
>>> What I meant is to emit
>>> tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
>>> p_5 = &tmp_4[2];
>>> i.e. don't associate the pointer with a value of the size, but with
>>> an address where to find the size (plus how large it is), basically escape
>>> pointer to the size at that point.  And __builtin_dynamic_object_size is 
>>> pure,
>>> so supposedly it can depend on what the escaped pointer points to.
>> 
>> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
>> escape point (quite bad IMHO)
> 
> That is why I've said we need to decide what cost we want to suffer because
> of that.
> 
>> and __builtin_dynamic_object_size being
>> non-const (that's probably not too bad).
> 
> It is already pure,leaf,nothrow (unlike __builtin_object_size which is 
> obviously
> const,leaf,nothrow).  Because under the hood, it can read memory when
> expanded.
> 
>>> We'd see that a particular pointer is size associated with &s.a address
>>> and would use that address cast to the type of the third argument (to
>>> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
>>> VN CSE it anyway if one has say
>>> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>>>  struct T { char c, d, e, f; char g __attribute__((counted_by (c))) 
>>> []; } t; };
>>> and
>>> .ACCESS_WITH_SIZE ([0], , (int *) 0);
>>> ...
>>> .ACCESS_WITH_SIZE ([0], , (int *) 0);
>>> ?
>> 
>> We'd probably CSE that - the usual issue of address-with-same-value.
>> 
>>> It would mean though that counted_by wouldn't be allowed to be a
>>> bit-field...
>> 
>> Yup.  We could also pass a pointer to the container though, that's good 
>> enough
>> for the escape, and pass the size by value in addition to that.
> 
> I was wondering about stuff like _BitInt.  But sure, counted_by is just an
> extension, we can just refuse counting by _BitInt in addition to counting by
> floating point, pointers, aggregates, bit-fields, or we could somehow encode
> all the needed type's properties numerically into an integral constant.
> Similarly for alias set (unless it uses 0 for reads).
> 
>   Jakub
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao


> On Nov 2, 2023, at 9:54 AM, Richard Biener  wrote:
> 
> On Thu, Nov 2, 2023 at 2:50 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Nov 2, 2023, at 3:57 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
>>>>> 
>>>>> On Tue, 31 Oct 2023, Qing Zhao wrote:
>>>>> 
>>>>>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>>>>>> 
>>>>>> For the following structure including a FAM with a counted_by attribute:
>>>>>> 
>>>>>> struct A
>>>>>> {
>>>>>> size_t size;
>>>>>> char buf[] __attribute__((counted_by(size)));
>>>>>> };
>>>>>> 
>>>>>> for any object with such type:
>>>>>> 
>>>>>> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>>>> 
>>>>>> The setting to the size field should be done before the first reference
>>>>>> to the FAM field.
>>>>>> 
>>>>>> Such requirement to the user will guarantee that the first reference to
>>>>>> the FAM knows the size of the FAM.
>>>>>> 
>>>>>> We need to add this additional requirement to the user document.
>>>>> 
>>>>> Make sure the manual is very specific about exactly when size is
>>>>> considered to be an accurate representation of the space available for buf
>>>>> (given that, after malloc or realloc, it's going to be temporarily
>>>>> inaccurate).  If the intent is that inaccurate size at such a time means
>>>>> undefined behavior, say so explicitly.
>>>> 
>>>> Yes, good point. We need to define this clearly in the beginning.
>>>> We need to explicit say that
>>>> 
>>>> the size of the FAM is defined by the latest “counted_by” value. And it’s 
>>>> an undefined behavior when the size field is not defined when the FAM is 
>>>> referenced.
>>>> 
>>>> Is the above good enough?
>>>> 
>>>> 
>>>>> 
>>>>>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>>>>>> 
>>>>>> In C FE:
>>>>>> 
>>>>>> for every reference to a FAM, for example, "obj->buf" in the small 
>>>>>> example,
>>>>>> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>>>>> if YES, replace the reference to "obj->buf" with a call to
>>>>>>.ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
>>>>> 
>>>>> This seems plausible - but you should also consider the case of static
>>>>> initializers - remember the GNU extension for statically allocated objects
>>>>> with flexible array members (unless you're not allowing it with
>>>>> counted_by).
>>>>> 
>>>>> static struct A x = { sizeof "hello", "hello" };
>>>>> static char *y = 
>>>>> 
>>>>> I'd expect that to be valid - and unless you say such a usage is invalid,
>>>> 
>>>> At this moment, I think that this should be valid.
>>>> 
>>>> I,e, the following:
>>>> 
>>>> struct A
>>>> {
>>>> size_t size;
>>>> char buf[] __attribute__((counted_by(size)));
>>>> };
>>>> 
>>>> static struct A x = {sizeof "hello", "hello"};
>>>> 
>>>> Should be valid, and x.size represents the number of elements of x.buf.
>>>> Both x.size and x.buf are initialized statically.
>>>> 
>>>>> you should avoid the replacement in such a static initializer context when
>>>>> the FAM reference is to an object with a constant address (if
>>>>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
>>>>> expression; if it works fine as a constant-address lvalue, then the
>>>>> replacement would be OK).
>>>> 
>>>> Then if such usage for the “counted_by” is valid, we need to replace the 
>>>> FAM
>>>> reference by a call to  .ACC

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao


> On Nov 2, 2023, at 3:57 AM, Richard Biener  wrote:
> 
> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
>>> 
>>> On Tue, 31 Oct 2023, Qing Zhao wrote:
>>> 
>>>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>>>> 
>>>> For the following structure including a FAM with a counted_by attribute:
>>>> 
>>>> struct A
>>>> {
>>>>  size_t size;
>>>>  char buf[] __attribute__((counted_by(size)));
>>>> };
>>>> 
>>>> for any object with such type:
>>>> 
>>>> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>> 
>>>> The setting to the size field should be done before the first reference
>>>> to the FAM field.
>>>> 
>>>> Such requirement to the user will guarantee that the first reference to
>>>> the FAM knows the size of the FAM.
>>>> 
>>>> We need to add this additional requirement to the user document.
>>> 
>>> Make sure the manual is very specific about exactly when size is
>>> considered to be an accurate representation of the space available for buf
>>> (given that, after malloc or realloc, it's going to be temporarily
>>> inaccurate).  If the intent is that inaccurate size at such a time means
>>> undefined behavior, say so explicitly.
>> 
>> Yes, good point. We need to define this clearly in the beginning.
>> We need to explicit say that
>> 
>> the size of the FAM is defined by the latest “counted_by” value. And it’s an 
>> undefined behavior when the size field is not defined when the FAM is 
>> referenced.
>> 
>> Is the above good enough?
>> 
>> 
>>> 
>>>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>>>> 
>>>> In C FE:
>>>> 
>>>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>>> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>>> if YES, replace the reference to "obj->buf" with a call to
>>>> .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
>>> 
>>> This seems plausible - but you should also consider the case of static
>>> initializers - remember the GNU extension for statically allocated objects
>>> with flexible array members (unless you're not allowing it with
>>> counted_by).
>>> 
>>> static struct A x = { sizeof "hello", "hello" };
>>> static char *y = 
>>> 
>>> I'd expect that to be valid - and unless you say such a usage is invalid,
>> 
>> At this moment, I think that this should be valid.
>> 
>> I,e, the following:
>> 
>> struct A
>> {
>> size_t size;
>> char buf[] __attribute__((counted_by(size)));
>> };
>> 
>> static struct A x = {sizeof "hello", "hello"};
>> 
>> Should be valid, and x.size represents the number of elements of x.buf.
>> Both x.size and x.buf are initialized statically.
>> 
>>> you should avoid the replacement in such a static initializer context when
>>> the FAM reference is to an object with a constant address (if
>>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
>>> expression; if it works fine as a constant-address lvalue, then the
>>> replacement would be OK).
>> 
>> Then if such usage for the “counted_by” is valid, we need to replace the FAM
>> reference by a call to  .ACCESS_WITH_SIZE as well.
>> Otherwise the “counted_by” relationship will be lost to the Middle end.
>> 
>> With the current definition of .ACCESS_WITH_SIZE
>> 
>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> 
>> Isn’t the PTR (return value of the call) a LVALUE?
> 
> You probably want to specify that when a pointer to the array is taken the
> pointer has to be to the first array element (or do we want to mangle the
> 'size' accordingly for the instrumentation?).

Yes. Will add this into the user documentation.

>  You also want to specify that
> the 'size' associated with such pointer is assumed to be unchanging and
> after changing the size such pointer has to be re-obtained.

What do you mean by “re-obtained”? 

>  Plus that
> changes to the allocated object/size have to be performed through an
> lvalue where the containing type and thus the 'counted_by' attribute is
> visible.

Through an lvalue with the containing type?

Yes, will add this too. 


>  That is,
> 
> size_t *s = 
> *s = 1;
> 
> is invoking undefined behavior,

right.

> likewise modifying 'buf' (makes it a bit
> awkward since for example that wouldn't support using posix_memalign
> for allocation, though aligned_alloc would be fine).
Is there a small example for the undefined behavior for this?

Qing
> 
> Richard.
> 
>> Qing
>>> 
>>> --
>>> Joseph S. Myers
>>> jos...@codesourcery.com
>> 



Help: which routine in C FE I should look at for the reference to a FAM field?

2023-11-01 Thread Qing Zhao
Joseph and Martin,

For the task of replacing every reference to a FAM field with a call to 
.ACCESS_WITH_SIZE, 
where in the C FE should I look?

Thanks a lot for the help.


Qing

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-01 Thread Qing Zhao


> On Nov 1, 2023, at 11:00 AM, Martin Uecker  wrote:
> 
> Am Mittwoch, dem 01.11.2023 um 14:47 + schrieb Qing Zhao:
>> 
>>> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
>>> 
>>> On Tue, 31 Oct 2023, Qing Zhao wrote:
>>> 
>>>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>>>> 
>>>> For the following structure including a FAM with a counted_by attribute:
>>>> 
>>>> struct A
>>>> {
>>>>  size_t size;
>>>>  char buf[] __attribute__((counted_by(size)));
>>>> };
>>>> 
>>>> for any object with such type:
>>>> 
>>>> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>>> 
>>>> The setting to the size field should be done before the first reference 
>>>> to the FAM field.
>>>> 
>>>> Such requirement to the user will guarantee that the first reference to 
>>>> the FAM knows the size of the FAM.
>>>> 
>>>> We need to add this additional requirement to the user document.
>>> 
>>> Make sure the manual is very specific about exactly when size is 
>>> considered to be an accurate representation of the space available for buf 
>>> (given that, after malloc or realloc, it's going to be temporarily 
>>> inaccurate).  If the intent is that inaccurate size at such a time means 
>>> undefined behavior, say so explicitly.
>> 
>> Yes, good point. We need to define this clearly in the beginning. 
>> We need to explicit say that 
>> 
>> the size of the FAM is defined by the latest “counted_by” value. And it’s an 
>> undefined behavior when the size field is not defined when the FAM is 
>> referenced.
> 
> It is defined by the latest "counted_by" value before x.buf
> is referenced, but not the latest before x.buf is dereferenced.

Then:

The size of the FAM is defined by the latest “counted_by” value before the FAM 
is referenced. 
It is undefined behavior when the “counted_by” value is not initialized 
before the FAM is referenced. 
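
As a small illustration of this rule, the sketch below sets the count before the 
FAM is first referenced. The helper name alloc_a and the COUNTED_BY macro are 
hypothetical; the macro expands to the attribute only when the compiler reports 
support for it, so the code should also build with compilers that lack 
counted_by.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifndef __has_attribute
# define __has_attribute(x) 0
#endif
#if __has_attribute(counted_by)
# define COUNTED_BY(m) __attribute__((counted_by(m)))
#else
# define COUNTED_BY(m)
#endif

struct A
{
  size_t size;
  char buf[] COUNTED_BY(size);
};

/* Allocate a struct A with room for sz elements in buf.
   Returns NULL if the allocation fails.  */
static struct A *
alloc_a (size_t sz)
{
  struct A *obj = malloc (sizeof (struct A) + sz * sizeof (char));
  if (obj == NULL)
    return NULL;
  obj->size = sz;               /* Set the count first ...  */
  memset (obj->buf, 0, sz);     /* ... then reference the FAM.  */
  return obj;
}
```

Swapping the two statements after the NULL check would reference obj->buf while 
obj->size is still uninitialized, which is the case the wording above declares 
undefined.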

> 
>> 
>> Is the above good enough?
>> 
>> 
>>> 
>>>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>>>> 
>>>> In C FE:
>>>> 
>>>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>>> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>>> if YES, replace the reference to "obj->buf" with a call to
>>>> .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 
>>> 
>>> This seems plausible - but you should also consider the case of static 
>>> initializers - remember the GNU extension for statically allocated objects 
>>> with flexible array members (unless you're not allowing it with 
>>> counted_by).
>>> 
>>> static struct A x = { sizeof "hello", "hello" };
>>> static char *y = 
>>> 
>>> I'd expect that to be valid - and unless you say such a usage is invalid, 
>> 
>> At this moment, I think that this should be valid.
>> 
>> I,e, the following:
>> 
>> struct A
>> {
>> size_t size;
>> char buf[] __attribute__((counted_by(size)));
>> };
>> 
>> static struct A x = {sizeof "hello", "hello"};
>> 
>> Should be valid, and x.size represents the number of elements of x.buf. 
>> Both x.size and x.buf are initialized statically. 
> 
> Joseph is talking about the compile-time initialization of y.

Okay, -:) 
so, this is the point where x.buf is referenced,
and I think that replacing this reference with a call to .ACCESS_WITH_SIZE is 
still needed.
Otherwise, the “counted_by” relationship will NOT be seen by the middle-end 
anymore.


> 
>> 
>>> you should avoid the replacement in such a static initializer context when 
>>> the FAM reference is to an object with a constant address (if 
>>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
>>> expression; if it works fine as a constant-address lvalue, then the 
>>> replacement would be OK).
>> 
>> Then if such usage for the “counted_by” is valid, we need to replace the FAM 
>> reference by a call to  .ACCESS_WITH_SIZE as well.
>> Otherwise the “counted_by” relationship will be lost to the Middle end. 
>> 
>> With the current definition of .ACCESS_WITH_SIZE
>> 
>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> 
>> Isn’t the PTR (return value of the call) a LVALUE? 
> 
> The question is whether we get an address constant
> that can be used for compile-time initialization.

Oh, I see.

So, now, PTR is already a constant in the FE, and the replacement will be

.ACCESS_WITH_SIZE( CONSTANT_ADDRESS, SIZE, ACCESS_MODE)

This looks awkward….
Should we allow this?

If not allowed, then the “counted_by” attribute will not work for the static 
initialization. 

> 
> I think it would be good to collect a list of test
> cases and to include this example.

Yes, I will put this into the test case list.

Qing
> 
> Martin
> 
>> 
>> Qing
>>> 
>>> -- 
>>> Joseph S. Myers
>>> jos...@codesourcery.com



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-01 Thread Qing Zhao


> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
> 
> On Tue, 31 Oct 2023, Qing Zhao wrote:
> 
>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>> 
>> For the following structure including a FAM with a counted_by attribute:
>> 
>>  struct A
>>  {
>>   size_t size;
>>   char buf[] __attribute__((counted_by(size)));
>>  };
>> 
>> for any object with such type:
>> 
>>  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> 
>> The setting to the size field should be done before the first reference 
>> to the FAM field.
>> 
>> Such requirement to the user will guarantee that the first reference to 
>> the FAM knows the size of the FAM.
>> 
>> We need to add this additional requirement to the user document.
> 
> Make sure the manual is very specific about exactly when size is 
> considered to be an accurate representation of the space available for buf 
> (given that, after malloc or realloc, it's going to be temporarily 
> inaccurate).  If the intent is that inaccurate size at such a time means 
> undefined behavior, say so explicitly.

Yes, good point. We need to define this clearly from the beginning. 
We need to explicitly say that 

the size of the FAM is defined by the latest “counted_by” value, and it is 
undefined behavior when the size field is not defined when the FAM is 
referenced.

Is the above good enough?


> 
>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>> 
>> In C FE:
>> 
>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>  if YES, replace the reference to "obj->buf" with a call to
>>  .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 
> 
> This seems plausible - but you should also consider the case of static 
> initializers - remember the GNU extension for statically allocated objects 
> with flexible array members (unless you're not allowing it with 
> counted_by).
> 
> static struct A x = { sizeof "hello", "hello" };
> static char *y = 
> 
> I'd expect that to be valid - and unless you say such a usage is invalid, 

At this moment, I think that this should be valid.

I.e., the following:

struct A
{
 size_t size;
 char buf[] __attribute__((counted_by(size)));
};

static struct A x = {sizeof "hello", "hello"};

Should be valid, and x.size represents the number of elements of x.buf. 
Both x.size and x.buf are initialized statically. 

> you should avoid the replacement in such a static initializer context when 
> the FAM reference is to an object with a constant address (if 
> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
> expression; if it works fine as a constant-address lvalue, then the 
> replacement would be OK).

Then if such usage for the “counted_by” is valid, we need to replace the FAM 
reference by a call to  .ACCESS_WITH_SIZE as well.
Otherwise the “counted_by” relationship will be lost to the Middle end. 

With the current definition of .ACCESS_WITH_SIZE

PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

Isn’t the PTR (return value of the call) a LVALUE? 

Qing
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-10-31 Thread Qing Zhao


> On Oct 31, 2023, at 1:35 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-10-31 12:26, Qing Zhao wrote:
>> Hi,
>> I wrote a summary based on our extensive discussion, hopefully this can be 
>> served as an informal proposal.
>> Please take a look at it and let me know any comment or suggestion.
>> There are some (???) in the section 3.2 and 3.6, those are my questions 
>> seeking for help.  -:)
>> Thanks again for all the help.
>> Qing.
>> 
>> Represent the missing dependence for the "counted_by" attribute and its 
>> consumers
>> Qing Zhao
>> 10/30/2023
>> ==
>> The whole discussion is at:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html
>> 1. The problem
>> There is a data dependency between the size assignment and the implicit use 
>> of the size information in the __builtin_dynamic_object_size that is missing 
>> in the IL (line 11 and line 13 in the below example). Such information 
>> missing will result incorrect code reordering and other code transformations.
>>   1 struct A
>>   2 {
>>   3  size_t size;
>>   4  char buf[] __attribute__((counted_by(size)));
>>   5 };
>>   6
>>   7 size_t
>>   8 foo (size_t sz)
>>   9 {
>>  10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>  11  obj->size = sz;
>>  12  obj->buf[0] = 2;
>>  13  return __builtin_dynamic_object_size (obj->buf, 1);
>>  14 }
>>   Please see a more complicate example in the Appendex 1.
>> We need to represent such data dependency correctly in the IL.
>> 2. The solution:
>> 2.1 Summary
>> * Add a new internal function "ACCESS_WITH_SIZE" to carry the size 
>> information for every FAM field access;
>> * In C FE, Replace every FAM field access whose TYPE has the "counted_by" 
>> attribute with the new internal function "ACCESS_WITH_SIZE";
>> * In every consumer of the size information, for example, BDOS or array 
>> bound sanitizer, query the size information or ACCESS_MODE information from 
>> the new internal function;
>> * When the size information and the "ACCESS_MODE" information are not used 
>> anymore, possibly at the 2nd object size phase, replace the internal 
>> function with the actual FAM field access;
>> * Some adjustment to inlining heuristic and some SSA passes to mitigate the 
>> impact to the optimizer and code generation.
>> 2.2 The new internal function
>>   .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>> which returns the "PTR" same as the 1st argument;
>> 1st argument "PTR": Pointer to the object;
>> 2nd argument "SIZE": The size of the pointed object,
>>   if the pointee of the "PTR" has a
>> * real type, it's the number of the elements of the type;
>> * void type, it's the number of bytes;
>> 3rd argument "ACCESS_MODE":
>>   -1: Unknown access semantics
>>0: none
>>1: read_only
>>2: write_only
>>3: read_write
>> NOTEs,
>>   A. This new internal function is intended for a more general use from all 
>> the 3 attributes, "access", "alloc_size", and the new "counted_by", to 
>> encode the "size" and "access_mode" information to the corresponding 
>> pointer. (in order to resolve PR96503, etc. 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
>>   B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be 
>> -1.
>>   C. In this wrieup, we focus on the implementation details for the 
>> "counted_by" attribute. However, this function should be ready to be used by 
>> "access" and "alloc_size" without issue.
>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>> For the following structure including a FAM with a counted_by attribute:
>>   struct A
>>   {
>>size_t size;
>>char buf[] __attribute__((counted_by(size)));
>>   };
>> for any object with such type:
>>   struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> The setting to the size field should be done before the first reference to 
>> the FAM field.
> 
> A more flexible specification could be stating that validation for a 
> reference to the FAM field will use the latest value assigned to the size

RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-10-31 Thread Qing Zhao
Hi, 

I wrote a summary based on our extensive discussion; hopefully this can 
serve as an informal proposal. 

Please take a look at it and let me know of any comments or suggestions.

There are some (???) marks in sections 3.2 and 3.6; those are questions where I 
am seeking help.  -:)

Thanks again for all the help.

Qing.


Represent the missing dependence for the "counted_by" attribute and its 
consumers 

Qing Zhao

10/30/2023
==

The whole discussion is at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html

1. The problem

The data dependency between the size assignment and the implicit use of the 
size information in __builtin_dynamic_object_size (line 11 and line 13 in the 
example below) is missing from the IL. Without it, the compiler may reorder or 
otherwise transform the code incorrectly. 

  1 struct A
  2 {
  3  size_t size;
  4  char buf[] __attribute__((counted_by(size)));
  5 };
  6 
  7 size_t 
  8 foo (size_t sz)
  9 {
 10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 11  obj->size = sz;
 12  obj->buf[0] = 2;
 13  return __builtin_dynamic_object_size (obj->buf, 1);
 14 }
  
Please see a more complicated example in Appendix 1.

We need to represent such data dependency correctly in the IL. 

2. The solution:

2.1 Summary

* Add a new internal function "ACCESS_WITH_SIZE" to carry the size information 
for every FAM field access;
* In the C FE, replace every FAM field access whose TYPE has the "counted_by" 
attribute with the new internal function "ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or the array bound 
sanitizer, query the size information or ACCESS_MODE information from the new 
internal function;
* When the size information and the "ACCESS_MODE" information are no longer 
used, possibly at the 2nd object size phase, replace the internal function 
with the actual FAM field access; 
* Make some adjustments to the inlining heuristics and some SSA passes to 
mitigate the impact on the optimizer and code generation. 

2.2 The new internal function 

  .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

which returns the same "PTR" as its 1st argument;

1st argument "PTR": Pointer to the object;
2nd argument "SIZE": The size of the pointed-to object; 
  if the pointee of the "PTR" has a
* real type, it's the number of elements of that type;
* void type, it's the number of bytes; 
3rd argument "ACCESS_MODE": 
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write
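
For readers who prefer code, the interface described above can be modeled in 
ordinary C as below. This is only a sketch: the enum and the function 
access_with_size are hypothetical stand-ins, and the real .ACCESS_WITH_SIZE is 
an internal function in the compiler's IL that cannot be called from user code.

```c
#include <stddef.h>

/* Values of the 3rd argument, as listed above.  */
enum access_mode
{
  ACCESS_UNKNOWN = -1,
  ACCESS_NONE = 0,
  ACCESS_READ_ONLY = 1,
  ACCESS_WRITE_ONLY = 2,
  ACCESS_READ_WRITE = 3
};

/* Returns PTR unchanged; SIZE and MODE only ride along for consumers.  */
static inline void *
access_with_size (void *ptr, size_t size, enum access_mode mode)
{
  (void) size;
  (void) mode;
  return ptr;
}
```

The point of the model is that the call is an identity on the pointer: it 
changes nothing at run time and only ties the size and access mode to that 
pointer for later consumers.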

NOTEs, 
  A. This new internal function is intended for a more general use from all the 
3 attributes, "access", "alloc_size", and the new "counted_by", to encode the 
"size" and "access_mode" information to the corresponding pointer. (in order to 
resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
  B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1. 
  
  C. In this writeup, we focus on the implementation details for the 
"counted_by" attribute. However, this function should be ready to be used by 
"access" and "alloc_size" without issue. 

2.3 A new semantic requirement in the user documentation of "counted_by"

For the following structure including a FAM with a counted_by attribute:

  struct A
  {
   size_t size;
   char buf[] __attribute__((counted_by(size)));
  };

for any object with such type:

  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));

The size field must be set before the first reference to the FAM field.

This requirement on the user guarantees that the first reference to the FAM 
already knows the size of the FAM.  

We need to add this additional requirement to the user document.
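
A minimal sketch of the ordering this requirement asks for. The COUNTED_BY 
macro and alloc_A are names made up for this illustration (they are not part 
of the patch); the macro expands to nothing on compilers that do not know the 
attribute, so the sketch builds either way:

```c
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only where the compiler knows it; otherwise expand to
   nothing so the sketch still builds.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(m) __attribute__((counted_by(m)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(m)
#endif

struct A
{
  size_t size;
  char buf[] COUNTED_BY (size);
};

/* Allocate an object with room for SZ elements.  The size field is set
   before anything touches obj->buf.  */
static struct A *
alloc_A (size_t sz)
{
  struct A *obj = malloc (sizeof (struct A) + sz * sizeof (char));
  if (obj == NULL)
    return NULL;
  obj->size = sz;      /* 1. set the counter first ...  */
  if (sz > 0)
    obj->buf[0] = 2;   /* 2. ... then make the first reference to the FAM.  */
  return obj;
}
```

Swapping steps 1 and 2 is exactly what the new documentation text would 
forbid.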

2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE

In C FE:

for every reference to a FAM, for example, "obj->buf" in the small example,
  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
  if YES, replace the reference to "obj->buf" with a call to
  .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 
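
As a rough user-level model only (the real .ACCESS_WITH_SIZE is a compiler 
internal function and cannot be called from C), the function from 2.2 and the 
replacement from 2.4 can be pictured as below. The names access_with_size, 
enum access_mode and fam_ref are hypothetical and exist only in this sketch:

```c
#include <stddef.h>

/* Hypothetical model of the ACCESS_MODE argument values listed in 2.2.  */
enum access_mode
{
  ACCESS_UNKNOWN = -1,   /* unknown access semantics */
  ACCESS_NONE = 0,
  ACCESS_READ_ONLY = 1,
  ACCESS_WRITE_ONLY = 2,
  ACCESS_READ_WRITE = 3
};

/* Returns PTR unchanged; SIZE and MODE are only carried along so that a
   consumer can look them up later.  */
static inline void *
access_with_size (void *ptr, size_t size, enum access_mode mode)
{
  (void) size;
  (void) mode;
  return ptr;
}

struct A
{
  size_t size;
  char buf[];
};

/* What a reference to obj->buf conceptually becomes when the field has
   the counted_by attribute: the third argument is -1.  */
static inline char *
fam_ref (struct A *obj)
{
  return access_with_size (obj->buf, obj->size, ACCESS_UNKNOWN);
}
```

The point of the model is that the value of the expression does not change; 
only the size and access mode are attached to it.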

2.5 Query the size info 

There are multiple consumers of the size info (and ACCESS_MODE info):

  * __builtin_dynamic_object_size;
  * array bound sanitizer;

in these consumers, get the size info from the 2nd argument of the call to
ACCESS_WITH_SIZE (PTR, SIZE, -1)
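
To illustrate what "get the size info from the 2nd argument" means for the 
two consumers, here is a small sketch. It is not how the sanitizer or the 
object size pass is implemented; struct access_info, index_in_bounds and 
object_size_bytes are assumed names used only here:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one ACCESS_WITH_SIZE call as a consumer sees it.  */
struct access_info
{
  void *ptr;     /* 1st argument: the pointer being accessed */
  size_t size;   /* 2nd argument: number of elements (bytes for void *) */
  int mode;      /* 3rd argument: -1 for counted_by */
};

/* Array bound sanitizer style check: is INDEX a valid element index?  */
static bool
index_in_bounds (const struct access_info *acc, size_t index)
{
  return index < acc->size;
}

/* Dynamic object size style query: bytes available starting at PTR,
   given the element size of the pointee.  */
static size_t
object_size_bytes (const struct access_info *acc, size_t elem_size)
{
  return acc->size * elem_size;
}
```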

2.6 Eliminate the internal function when not useful anymore

After the last consumer of the size information in the ACCESS_WITH_SIZE, We 
should replace the internal call with it

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-27 Thread Qing Zhao
Okay, thanks for the explanation.
We will keep this in mind.

Qing

> On Oct 27, 2023, at 1:19 PM, Kees Cook  wrote:
> 
> On Fri, Oct 27, 2023 at 03:10:22PM +0000, Qing Zhao wrote:
>> Since  the dynamic array support is quite important to the kernel (is this 
>> true, Kees? ),
>> We might need to include such support into our design in the beginning. 
> 
> tl;dr: We don't need "dynamic array support" in the 1st version of 
> __counted_by
> 
> I'm not sure it's as strong as "quite important", but it is a code
> pattern that exists. The vast majority of FAM usage is run-time fixed,
> in the sense that the allocation matches the usage. Only sometimes do we
> over-allocate and then slowly fill it up like I've shown.
> 
> So really my thoughts on this are to bring light to the usage pattern
> in the hopes that we don't make it an impossible thing to do. And if
> it's a limitation of the initial version of __counted_by, the kernel can
> still use it: it will just need to use __counted_by strictly for
> allocation sizes, not "usage" size:
> 
> struct foo {
>   int allocated;
>   int used;
>   int array[] __counted_by(allocated); // would be nice to use "used"
> };
> 
>   struct foo *p;
> 
>   p = alloc(sizeof(*p) + sizeof(*p->array) * max_items);
>   p->allocated = max_items;
>   p->used = 0;
> 
>   while (data_available())
>       p->array[p->used++] = next_datum();
> 
> With this, we'll still catch p->array accesses beyond "allocated",
> but other code in the kernel won't catch "invalid data" accesses for
> p->array beyond "used". (i.e. we still have memory corruption protection,
> just not logic error protection.)
> 
> We can deal with aliasing in the future if we want to expand to catching
> logic errors.
> 
> I should note that we don't get logic error protection from things like
> ARM's Memory Tagging Extension either -- it only tracks allocation size
> (and is very expensive to change as the "used" part of an allocation
> grows), so this isn't an unreasonable condition for __counted_by to
> require as well.
> 
> -- 
> Kees Cook
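
A small sketch of the allocated-vs-used split described above, with the 
"used" check done by hand since counted_by(allocated) would only cover the 
capacity. The helper names foo_alloc, foo_push and foo_index_is_valid are 
made up for illustration, and max_items is assumed to be non-negative:

```c
#include <stddef.h>
#include <stdlib.h>

struct foo
{
  int allocated;   /* capacity; the field __counted_by would name */
  int used;        /* number of valid elements so far */
  int array[];
};

static struct foo *
foo_alloc (int max_items)
{
  struct foo *p = malloc (sizeof (*p)
                          + sizeof (*p->array) * (size_t) max_items);
  if (p == NULL)
    return NULL;
  p->allocated = max_items;
  p->used = 0;
  return p;
}

/* Append one item; returns 0 on success, -1 if the array is full.  */
static int
foo_push (struct foo *p, int datum)
{
  if (p->used >= p->allocated)
    return -1;
  p->array[p->used++] = datum;
  return 0;
}

/* The "logic error" check that a bound on allocated alone does not give.  */
static int
foo_index_is_valid (const struct foo *p, int i)
{
  return i >= 0 && i < p->used;
}
```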



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-27 Thread Qing Zhao
About where we should insert the new __builtin_with_access_and_size:

> On Oct 26, 2023, at 2:54 PM, Qing Zhao  wrote:
> 
> 
> 
>> On Oct 26, 2023, at 10:05 AM, Richard Biener  
>> wrote:
>> 
>> 
>> 
>>> Am 26.10.2023 um 12:14 schrieb Martin Uecker :
>>> 
>>> Am Donnerstag, dem 26.10.2023 um 11:20 +0200 schrieb Martin Uecker:
>>>>> Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
>>>>> On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker  wrote:
>>>>>> 
>>>>>> Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
>>>>>>> 
>>>>>>>> Am 25.10.2023 um 12:47 schrieb Martin Uecker :
>>>>>>>> 
>>>>>>>> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>>>>>>>>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>>>>>>>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>>>>>>>>>> 
>>>>>>>>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker :
>>>>>>>>>>>> 
>>>>>>>>>>>> Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
>>>>>>>>>>>>> Hi, Sid,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Really appreciate for your example and detailed explanation. Very 
>>>>>>>>>>>>> helpful.
>>>>>>>>>>>>> I think that this example is an excellent example to show 
>>>>>>>>>>>>> (almost) all the issues we need to consider.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I slightly modified this example to make it to be compilable and 
>>>>>>>>>>>>> run-able, as following:
>>>>>>>>>>>>> (but I still cannot make the incorrect reordering or DSE 
>>>>>>>>>>>>> happening, anyway, the potential reordering possibility is there…)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1 #include 
>>>>>>>>>>>>> 2 struct A
>>>>>>>>>>>>> 3 {
>>>>>>>>>>>>> 4  size_t size;
>>>>>>>>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>>>>>>>>> 6 };
>>>>>>>>>>>>> 7
>>>>>>>>>>>>> 8 static size_t
>>>>>>>>>>>>> 9 get_size_from (void *ptr)
>>>>>>>>>>>>> 10 {
>>>>>>>>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>>>>>>>>> 12 }
>>>>>>>>>>>>> 13
>>>>>>>>>>>>> 14 void
>>>>>>>>>>>>> 15 foo (size_t sz)
>>>>>>>>>>>>> 16 {
>>>>>>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
>>>>>>>>>>>>> sizeof(char));
>>>>>>>>>>>>> 18  obj->size = sz;
>>>>>>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>>>>>>> 20  __builtin_printf ("%zu\n", get_size_from (obj->buf));
>>>>>>>>>>>>> 21  return;
>>>>>>>>>>>>> 22 }
>>>>>>>>>>>>> 23
>>>>>>>>>>>>> 24 int main ()
>>>>>>>>>>>>> 25 {
>>>>>>>>>>>>> 26  foo (20);
>>>>>>>>>>>>> 27  return 0;
>>>>>>>>>>>>> 28 }
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> When it’s set I suppose.  Turn
>>>>>>>>>>> 
>>>>>>>>>>> X.l = n;
>>>>>>>>>>> 
>>>>>>>>>>> Into

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-27 Thread Qing Zhao


> On Oct 27, 2023, at 10:53 AM, Martin Uecker  wrote:
> 
> Am Freitag, dem 27.10.2023 um 14:32 + schrieb Qing Zhao:
>> 
>>> On Oct 27, 2023, at 3:21 AM, Martin Uecker  wrote:
>>> 
>>> Am Donnerstag, dem 26.10.2023 um 19:57 + schrieb Qing Zhao:
>>>> I guess that what Kees wanted, the "fill the array without knowing the actual 
>>>> final size" code pattern, as follows:
>>>> 
>>>>>>  struct foo *f;
>>>>>>  char *p;
>>>>>>  int i;
>>>>>> 
>>>>>>  f = alloc(maximum_possible);
>>>>>>  f->count = 0;
>>>>>>  p = f->buf;
>>>>>> 
>>>>>>  for (i = 0; data_is_available() && i < maximum_possible; i++) {
>>>>>>  f->count ++;
>>>>>>  p[i] = next_data_item();
>>>>>>  }
>>>> 
>>>> is actually a dynamic array, or more accurately, a bounded-size dynamic 
>>>> array (but not a dynamically allocated array as we have discussed so far):
>>>> 
>>>> https://en.wikipedia.org/wiki/Dynamic_array
>>>> 
>>>> Such a dynamic array is also called a growable array or resizable array; 
>>>> its size can change during its lifetime. 
>>>> 
>>>> VLAs and FAMs, I believe, are both dynamically allocated arrays, i.e., 
>>>> even though the size is not known at compilation time, the size
>>>> is fixed once the array is allocated. 
>>>> 
>>>> I am not sure whether C supports such dynamic arrays. Or whether it’s 
>>>> easy to provide dynamic array support in C?
>>> 
>>> It is possible to support dynamic arrays in C even with
>>> good checking, but not safely using the pattern above
>>> where you derive a pointer which you later use independently.
>>> 
>>> While we could track the connection to the original struct,
>>> the necessary synchronization between the counter and the
>>> access to the buffer is difficult.  I do not see how this
>>> could be supported with reasonable effort and cost.
>>> 
>>> 
>>> But with this restriction in mind, we can do a lot in C.
>>> For example, see my experimental (!) container library
>>> which has vector type.
>>> https://github.com/uecker/noplate/blob/main/test.c
>>> You can get an array view for the vector (which then
>>> also can decay to a pointer), so it interoperates nicely
>>> with C but you can get good bounds checking.
>>> 
>>> 
>>> But once you derive a pointer and pass it on, it gets
>>> difficult.  But if you want safety, you simply have to 
>>> avoid this in code. 
>> 
>> So, for the following modified code: (without the additional pointer “p”)
>> 
>> struct foo
>> {
>> size_t count;
>> char buf[] __attribute__((counted_by(count)));
>> };
>> 
>> struct foo *f;
>> int i;  
>> 
>> f = alloc(maximum_possible);
>> f->count = 0;
>> 
>> for (i = 0; data_is_available() && i < maximum_possible; i++) {
>>  f->count ++;  
>>  f->buf[i] = next_data_item();
>> }   
>> 
>> The support for dynamic array should be possible? 
> 
> With the design we discussed this should work, because with
> __builtin_with_access (or whatever) it reads:
> 
> f = alloc(maximum_possible);
> f->count = 0;
> 
> for (i = 0; data_is_available() && i < maximum_possible; i++) {
>  f->count ++;  
>  __builtin_with_access(f->buf, f->count)[i] = next_data_item();
> }   
> 

Yes, with the data flow, f->count should get the latest value of f->count. 
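
A sketch of that data flow, under the assumption that the check reloads the 
counter on every access. checked_elem is a hypothetical stand-in for 
"__builtin_with_access(f->buf, f->count)[i]" and returns NULL instead of 
trapping; foo_alloc and fill are likewise names made up for this example:

```c
#include <stddef.h>
#include <stdlib.h>

struct foo
{
  size_t count;
  char buf[];
};

/* Allocate room for MAX elements; the counter starts at zero.  */
static struct foo *
foo_alloc (size_t max)
{
  struct foo *f = malloc (sizeof (struct foo) + max * sizeof (char));
  if (f != NULL)
    f->count = 0;
  return f;
}

/* The counter is read at the moment of the access, then the index is
   checked against that fresh value.  */
static char *
checked_elem (struct foo *f, size_t i)
{
  size_t count_now = f->count;   /* fresh load on every access */
  return i < count_now ? &f->buf[i] : NULL;
}

/* Copy up to MAX of the N items in SRC; returns the number of accesses
   that failed the bounds check.  */
static size_t
fill (struct foo *f, const char *src, size_t n, size_t max)
{
  size_t failures = 0;
  f->count = 0;
  for (size_t i = 0; i < n && i < max; i++)
    {
      f->count++;                       /* grow the logical size first */
      char *slot = checked_elem (f, i); /* sees the updated counter */
      if (slot != NULL)
        *slot = src[i];
      else
        failures++;
    }
  return failures;
}
```

Because the increment comes before the access in each iteration, every store 
in the loop passes the check.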
>> 
>> 
>>> 
>>> What we could potentially do is add restrictions so 
>>> that the access to buf always has to go via x->buf 
>>> or you get at least a warning.
>> 
>> Are the following two restrictions to the user enough:
>> 
>> 1. The access to buf should always go via x->buf, 
>>no assignment to another independent pointer 
>>and access buf through this new pointer.
> 
> Yes, maybe. One could also try to be smarter.
> 
> For example, one could warn only when &f->buf is
> assigned to another pointer and one of the
> following conditions is fulfilled:
> 
> - the pointer escapes from the local context 
> 
> - there is a store to f->counter in the
> local context that does not dominate &f->buf.
> 
> Then Kees' example would work too in most cases.

I guess that we might need to come up with a list of concrete restrictions on 
the user and put these restrictions in the user documentation.

Since  the dynamic array support is quite important to the kernel (is this 
true, Kees? ),
We might need to include such support into our design in the beginning. 

> 
> But I would probably wait until we have some
> initial experience with this feature.

You mean after we have an initial implementation of the “builtin_with_size”?
Yes, at this moment, I think that the “builtin_with_size” approach is the best 
one.
Just some details need more thinking before the real implementation.  -:)

Qing
> 
> Martin
> 
>> 2.  User need to keep the synchronization between
>>  the counter and the access to the buffer all the time.



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-27 Thread Qing Zhao


> On Oct 27, 2023, at 3:21 AM, Martin Uecker  wrote:
> 
> Am Donnerstag, dem 26.10.2023 um 19:57 + schrieb Qing Zhao:
>> I guess that what Kees wanted, the "fill the array without knowing the actual 
>> final size" code pattern, as follows:
>> 
>>>>struct foo *f;
>>>>char *p;
>>>>int i;
>>>> 
>>>>f = alloc(maximum_possible);
>>>>f->count = 0;
>>>>p = f->buf;
>>>> 
>>>>for (i = 0; data_is_available() && i < maximum_possible; i++) {
>>>>f->count ++;
>>>>p[i] = next_data_item();
>>>>}
>> 
>> is actually a dynamic array, or more accurately, a bounded-size dynamic array 
>> (but not a dynamically allocated array as we have discussed so far):
>> 
>> https://en.wikipedia.org/wiki/Dynamic_array
>> 
>> Such a dynamic array is also called a growable array or resizable array; its 
>> size can change during its lifetime. 
>> 
>> VLAs and FAMs, I believe, are both dynamically allocated arrays, i.e., 
>> even though the size is not known at compilation time, the size
>> is fixed once the array is allocated. 
>> 
>> I am not sure whether C supports such dynamic arrays. Or whether it’s 
>> easy to provide dynamic array support in C?
> 
> It is possible to support dynamic arrays in C even with
> good checking, but not safely using the pattern above
> where you derive a pointer which you later use independently.
> 
> While we could track the connection to the original struct,
> the necessary synchronization between the counter and the
> access to the buffer is difficult.  I do not see how this
> could be supported with reasonable effort and cost.
> 
> 
> But with this restriction in mind, we can do a lot in C.
> For example, see my experimental (!) container library
> which has vector type.
> https://github.com/uecker/noplate/blob/main/test.c
> You can get an array view for the vector (which then
> also can decay to a pointer), so it interoperates nicely
> with C but you can get good bounds checking.
> 
> 
> But once you derive a pointer and pass it on, it gets
> difficult.  But if you want safety, you simply have to 
> avoid this in code. 

So, for the following modified code: (without the additional pointer “p”)

struct foo
{
 size_t count;
 char buf[] __attribute__((counted_by(count)));
};

struct foo *f;
int i;  

f = alloc(maximum_possible);
f->count = 0;

for (i = 0; data_is_available() && i < maximum_possible; i++) {
  f->count ++;  
  f->buf[i] = next_data_item();
}   

The support for dynamic array should be possible? 


> 
> What we could potentially do is add restrictions so 
> that the access to buf always has to go via x->buf 
> or you get at least a warning.

Are the following two restrictions on the user enough?

1. The access to buf should always go via x->buf: 
no assignment of it to another independent pointer 
and no access to buf through that new pointer.
2. The user needs to keep the counter and the accesses to 
the buffer synchronized at all times.
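
To show why restriction 1 matters, here is a small model in which a derived 
pointer keeps whatever bound was known when it was derived. This is only an 
illustration of the concern, not of any implementation; struct alias, 
derive_alias, alias_ok and direct_ok are hypothetical names, and the buffer 
is a fixed array so the sketch needs no allocation:

```c
#include <stddef.h>

struct foo
{
  size_t count;
  char buf[16];   /* fixed storage, stands in for the FAM */
};

/* Hypothetical "fat" alias: the bound is captured when the pointer is
   derived and travels with it from then on.  */
struct alias
{
  char *p;
  size_t bound;
};

static struct alias
derive_alias (struct foo *f)
{
  struct alias a = { f->buf, f->count };
  return a;
}

/* Check through the alias: uses the bound captured at derivation time.  */
static int
alias_ok (const struct alias *a, size_t i)
{
  return i < a->bound;
}

/* Check through the struct: uses the current value of the counter.  */
static int
direct_ok (const struct foo *f, size_t i)
{
  return i < f->count;
}
```

If count is 7 when the alias is derived and is later raised to 10, index 8 
is rejected through the alias but accepted through f directly, so the two 
views disagree once the counter changes.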


Qing
> 
> Martin
> 
> 
> 
> 
>> 
>> Qing
>> 
>> 
>>> On Oct 26, 2023, at 12:45 PM, Martin Uecker  wrote:
>>> 
>>> Am Donnerstag, dem 26.10.2023 um 09:13 -0700 schrieb Kees Cook:
>>>> On Thu, Oct 26, 2023 at 10:15:10AM +0200, Martin Uecker wrote:
>>>>> but not this:
>>>>> 
>>> 
>>> x->count = 11;
>>>>> char *p = &x->buf;
>>>>> x->count = 1;
>>>>> p[10] = 1; // !
>>>> 
>>>> This seems fine to me -- it's how I'd expect it to work: "10" is beyond
>>>> "1".
>>> 
>>> Note that the store would be allowed.
>>> 
>>>> 
>>>>> (because the pointer is passed around the
>>>>> store to the counter)
>>>>> 
>>>>> and also here the second store is then irrelevant
>>>>> for the access:
>>>>> 
>>>>> x->count = 10;
>>>>> char* p = &x->buf;
>>>>> ...
>>>>> x->count = 1; // somewhere else
>>>>> 
>>>>> p[9] = 1; // ok, because count mattered when buf was accessed.
>>>> 
>>>> This is less great, but I can understand why it happens. "p" loses the
>>>> association with "x". It'd be nice if "p" had a way to retain that it
>>>> was just an alias for 

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-26 Thread Qing Zhao
I guess that what Kees wanted, the "fill the array without knowing the actual 
final size" code pattern, as follows:

>>  struct foo *f;
>>  char *p;
>>  int i;
>> 
>>  f = alloc(maximum_possible);
>>  f->count = 0;
>>  p = f->buf;
>> 
>>  for (i = 0; data_is_available() && i < maximum_possible; i++) {
>>  f->count ++;
>>  p[i] = next_data_item();
>>  }

is actually a dynamic array, or more accurately, a bounded-size dynamic array 
(but not a dynamically allocated array as we have discussed so far):

https://en.wikipedia.org/wiki/Dynamic_array

Such a dynamic array is also called a growable array or resizable array; its 
size can change during its lifetime. 

VLAs and FAMs, I believe, are both dynamically allocated arrays, i.e., even 
though the size is not known at compilation time, the size
is fixed once the array is allocated. 

I am not sure whether C supports such dynamic arrays. Or whether it’s easy 
to provide dynamic array support in C?

Qing


> On Oct 26, 2023, at 12:45 PM, Martin Uecker  wrote:
> 
> Am Donnerstag, dem 26.10.2023 um 09:13 -0700 schrieb Kees Cook:
>> On Thu, Oct 26, 2023 at 10:15:10AM +0200, Martin Uecker wrote:
>>> but not this:
>>> 
> 
> x->count = 11;
>>> char *p = &x->buf;
>>> x->count = 1;
>>> p[10] = 1; // !
>> 
>> This seems fine to me -- it's how I'd expect it to work: "10" is beyond
>> "1".
> 
> Note that the store would be allowed.
> 
>> 
>>> (because the pointer is passed around the
>>> store to the counter)
>>> 
>>> and also here the second store is then irrelevant
>>> for the access:
>>> 
>>> x->count = 10;
>>> char* p = &x->buf;
>>> ...
>>> x->count = 1; // somewhere else
>>> 
>>> p[9] = 1; // ok, because count mattered when buf was accessed.
>> 
>> This is less great, but I can understand why it happens. "p" loses the
>> association with "x". It'd be nice if "p" had a way to retain that it
>> was just an alias for x->buf, so future p access would check count.
> 
> The problem is not to discover that p is an alias to x->buf, 
> but that it seems difficult to make sure that stores to 
> x->count are not reordered relative to the final access to
> p[i] you want to check, so that you then get the right value.
> 
>> 
>> But this appears to be an existing limitation in other areas where an
>> assignment will cause the loss of object association. (I've run into
>> this before.) It's just more surprising in the above example because in
>> the past the loss of association would cause __bdos() to revert back to
>> "SIZE_MAX" results ("I don't know the size") rather than an "outdated"
>> size, which may get us into unexpected places...
>> 
>>> IMHO this makes sense also from the user side and
>>> are the desirable semantics we discussed before.
>>> 
>>> But can you take a look at this?
>>> 
>>> 
>>> This should simulate it fairly well:
>>> https://godbolt.org/z/xq89aM7Gr
>>> 
>>> (the call to the noinline function would go away,
>>> but not necessarily its impact on optimization)
>> 
>> Yeah, this example should be a very rare situation: a leaf function is
>> changing the characteristics of the struct but returning a buffer within
>> it to the caller. The more likely glitch would be from:
>> 
>> int main()
>> {
>>  struct foo *f = foo_alloc(7);
>>  char *p = FAM_ACCESS(f, size, buf);
>> 
>>  printf("%ld\n", __builtin_dynamic_object_size(p, 0));
>>  test1(f); // or just "f->count = 10;" no function call needed
>>  printf("%ld\n", __builtin_dynamic_object_size(p, 0));
>> 
>>  return 0;
>> }
>> 
>> which reports:
>> 7
>> 7
>> 
>> instead of:
>> 7
>> 10
>> 
>> This kind of "get an alias" situation is pretty common in the kernel
>> as a way to have a convenient "handle" to the array. In the case of a
>> "fill the array without knowing the actual final size" code pattern,
>> things would immediately break:
>> 
>>  struct foo *f;
>>  char *p;
>>  int i;
>> 
>>  f = alloc(maximum_possible);
>>  f->count = 0;
>>  p = f->buf;
>> 
>>  for (i = 0; data_is_available() && i < maximum_possible; i++) {
>>  f->count ++;
>>  p[i] = next_data_item();
>>  }
>> 
>> Now perhaps the problem here is that "count" cannot be used for a count
>> of "logically valid members in the array" but must always be a count of
>> "allocated member space in the array", which I guess is tolerable, but
>> isn't ideal -- I'd like to catch logic bugs in addition to allocation
>> bugs, but the latter is certainly much more important to catch.
> 
> Maybe we could have a warning when f->buf is not directly
> accessed.
> 
> Martin
> 
>> 
> 



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-26 Thread Qing Zhao


> On Oct 26, 2023, at 1:05 PM, Martin Uecker  wrote:
> 
> Am Donnerstag, dem 26.10.2023 um 16:41 + schrieb Qing Zhao:
>> 
>>> On Oct 26, 2023, at 5:20 AM, Martin Uecker  wrote:
>>> 
>>> Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
>>>> On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker  wrote:
>>>>> 
>>>>> Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
>>>>>> 
>>>>>>> Am 25.10.2023 um 12:47 schrieb Martin Uecker :
>>>>>>> 
>>>>>>> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>>>>>>>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>>>>>>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>>>>>>>>> 
>>>>>>>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker :
>>>>>>>>>>> 
>>>>>>>>>>> Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
>>>>>>>>>>>> Hi, Sid,
>>>>>>>>>>>> 
>>>>>>>>>>>> Really appreciate for your example and detailed explanation. Very 
>>>>>>>>>>>> helpful.
>>>>>>>>>>>> I think that this example is an excellent example to show (almost) 
>>>>>>>>>>>> all the issues we need to consider.
>>>>>>>>>>>> 
>>>>>>>>>>>> I slightly modified this example to make it to be compilable and 
>>>>>>>>>>>> run-able, as following:
>>>>>>>>>>>> (but I still cannot make the incorrect reordering or DSE 
>>>>>>>>>>>> happening, anyway, the potential reordering possibility is there…)
>>>>>>>>>>>> 
>>>>>>>>>>>> 1 #include 
>>>>>>>>>>>> 2 struct A
>>>>>>>>>>>> 3 {
>>>>>>>>>>>> 4  size_t size;
>>>>>>>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>>>>>>>> 6 };
>>>>>>>>>>>> 7
>>>>>>>>>>>> 8 static size_t
>>>>>>>>>>>> 9 get_size_from (void *ptr)
>>>>>>>>>>>> 10 {
>>>>>>>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>>>>>>>> 12 }
>>>>>>>>>>>> 13
>>>>>>>>>>>> 14 void
>>>>>>>>>>>> 15 foo (size_t sz)
>>>>>>>>>>>> 16 {
>>>>>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
>>>>>>>>>>>> sizeof(char));
>>>>>>>>>>>> 18  obj->size = sz;
>>>>>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>>>>>> 20  __builtin_printf ("%zu\n", get_size_from (obj->buf));
>>>>>>>>>>>> 21  return;
>>>>>>>>>>>> 22 }
>>>>>>>>>>>> 23
>>>>>>>>>>>> 24 int main ()
>>>>>>>>>>>> 25 {
>>>>>>>>>>>> 26  foo (20);
>>>>>>>>>>>> 27  return 0;
>>>>>>>>>>>> 28 }
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>> When it’s set I suppose.  Turn
>>>>>>>>>> 
>>>>>>>>>> X.l = n;
>>>>>>>>>> 
>>>>>>>>>> Into
>>>>>>>>>> 
>>>>>>>>>> X.l = __builtin_with_size (x.buf, n);
>>>>>>>>> 
>>>>>>>>> It would turn
>>>>>>>>> 
>>>>>>>>> some_variable = (&) x.buf
>>>>>>>>> 
>>>>>>>>> into
>>>>>>>>> 
>>>>>>>>> some_variable = _

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-26 Thread Qing Zhao


> On Oct 26, 2023, at 10:05 AM, Richard Biener  
> wrote:
> 
> 
> 
>> Am 26.10.2023 um 12:14 schrieb Martin Uecker :
>> 
>> Am Donnerstag, dem 26.10.2023 um 11:20 +0200 schrieb Martin Uecker:
>>>> Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
>>>> On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker  wrote:
>>>>> 
>>>>> Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
>>>>>> 
>>>>>>> Am 25.10.2023 um 12:47 schrieb Martin Uecker :
>>>>>>> 
>>>>>>> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>>>>>>>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>>>>>>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>>>>>>>>> 
>>>>>>>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker :
>>>>>>>>>>> 
>>>>>>>>>>> Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
>>>>>>>>>>>> Hi, Sid,
>>>>>>>>>>>> 
>>>>>>>>>>>> Really appreciate for your example and detailed explanation. Very 
>>>>>>>>>>>> helpful.
>>>>>>>>>>>> I think that this example is an excellent example to show (almost) 
>>>>>>>>>>>> all the issues we need to consider.
>>>>>>>>>>>> 
>>>>>>>>>>>> I slightly modified this example to make it to be compilable and 
>>>>>>>>>>>> run-able, as following:
>>>>>>>>>>>> (but I still cannot make the incorrect reordering or DSE 
>>>>>>>>>>>> happening, anyway, the potential reordering possibility is there…)
>>>>>>>>>>>> 
>>>>>>>>>>>> 1 #include 
>>>>>>>>>>>> 2 struct A
>>>>>>>>>>>> 3 {
>>>>>>>>>>>> 4  size_t size;
>>>>>>>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>>>>>>>> 6 };
>>>>>>>>>>>> 7
>>>>>>>>>>>> 8 static size_t
>>>>>>>>>>>> 9 get_size_from (void *ptr)
>>>>>>>>>>>> 10 {
>>>>>>>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>>>>>>>> 12 }
>>>>>>>>>>>> 13
>>>>>>>>>>>> 14 void
>>>>>>>>>>>> 15 foo (size_t sz)
>>>>>>>>>>>> 16 {
>>>>>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
>>>>>>>>>>>> sizeof(char));
>>>>>>>>>>>> 18  obj->size = sz;
>>>>>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>>>>>> 20  __builtin_printf ("%zu\n", get_size_from (obj->buf));
>>>>>>>>>>>> 21  return;
>>>>>>>>>>>> 22 }
>>>>>>>>>>>> 23
>>>>>>>>>>>> 24 int main ()
>>>>>>>>>>>> 25 {
>>>>>>>>>>>> 26  foo (20);
>>>>>>>>>>>> 27  return 0;
>>>>>>>>>>>> 28 }
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>> When it’s set I suppose.  Turn
>>>>>>>>>> 
>>>>>>>>>> X.l = n;
>>>>>>>>>> 
>>>>>>>>>> Into
>>>>>>>>>> 
>>>>>>>>>> X.l = __builtin_with_size (x.buf, n);
>>>>>>>>> 
>>>>>>>>> It would turn
>>>>>>>>> 
>>>>>>>>> some_variable = (&) x.buf
>>>>>>>>> 
>>>>>>>>> into
>>>>>>>>> 
>>>>>>>>> 

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-26 Thread Qing Zhao


> On Oct 26, 2023, at 5:20 AM, Martin Uecker  wrote:
> 
> Am Donnerstag, dem 26.10.2023 um 10:45 +0200 schrieb Richard Biener:
>> On Wed, Oct 25, 2023 at 8:16 PM Martin Uecker  wrote:
>>> 
>>> Am Mittwoch, dem 25.10.2023 um 13:13 +0200 schrieb Richard Biener:
>>>> 
>>>>> Am 25.10.2023 um 12:47 schrieb Martin Uecker :
>>>>> 
>>>>> Am Mittwoch, dem 25.10.2023 um 06:25 -0400 schrieb Siddhesh Poyarekar:
>>>>>>> On 2023-10-25 04:16, Martin Uecker wrote:
>>>>>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>>>>>>> 
>>>>>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker :
>>>>>>>>> 
>>>>>>>>> Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao:
>>>>>>>>>> Hi, Sid,
>>>>>>>>>> 
>>>>>>>>>> Really appreciate for your example and detailed explanation. Very 
>>>>>>>>>> helpful.
>>>>>>>>>> I think that this example is an excellent example to show (almost) 
>>>>>>>>>> all the issues we need to consider.
>>>>>>>>>> 
>>>>>>>>>> I slightly modified this example to make it to be compilable and 
>>>>>>>>>> run-able, as following:
>>>>>>>>>> (but I still cannot make the incorrect reordering or DSE happening, 
>>>>>>>>>> anyway, the potential reordering possibility is there…)
>>>>>>>>>> 
>>>>>>>>>> 1 #include 
>>>>>>>>>> 2 struct A
>>>>>>>>>> 3 {
>>>>>>>>>> 4  size_t size;
>>>>>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>>>>>> 6 };
>>>>>>>>>> 7
>>>>>>>>>> 8 static size_t
>>>>>>>>>> 9 get_size_from (void *ptr)
>>>>>>>>>> 10 {
>>>>>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>>>>>> 12 }
>>>>>>>>>> 13
>>>>>>>>>> 14 void
>>>>>>>>>> 15 foo (size_t sz)
>>>>>>>>>> 16 {
>>>>>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
>>>>>>>>>> sizeof(char));
>>>>>>>>>> 18  obj->size = sz;
>>>>>>>>>> 19  obj->buf[0] = 2;
>>>>>>>>>> 20  __builtin_printf ("%zu\n", get_size_from (obj->buf));
>>>>>>>>>> 21  return;
>>>>>>>>>> 22 }
>>>>>>>>>> 23
>>>>>>>>>> 24 int main ()
>>>>>>>>>> 25 {
>>>>>>>>>> 26  foo (20);
>>>>>>>>>> 27  return 0;
>>>>>>>>>> 28 }
>>>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>>> When it’s set I suppose.  Turn
>>>>>>>> 
>>>>>>>> X.l = n;
>>>>>>>> 
>>>>>>>> Into
>>>>>>>> 
>>>>>>>> X.l = __builtin_with_size (x.buf, n);
>>>>>>> 
>>>>>>> It would turn
>>>>>>> 
>>>>>>> some_variable = (&) x.buf
>>>>>>> 
>>>>>>> into
>>>>>>> 
>>>>>>> some_variable = __builtin_with_size ( (&) x.buf, x.len)
>>>>>>> 
>>>>>>> 
>>>>>>> So the later access to x.buf and not the initialization
>>>>>>> of a member of the struct (which is too early).
>>>>>>> 
>>>>>> 
>>>>>> Hmm, so with Qing's example above, are you suggesting the transformation
>>>>>> be to foo like so:
>>>>>> 
>>>>>> 14 void
>>>>>> 15 foo (size_t sz)
>>>>>> 16 {
>>>>>> 16.5  void * _1;
>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
>>>

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-26 Thread Qing Zhao


> On Oct 26, 2023, at 4:56 AM, Richard Biener  
> wrote:
> 
> On Thu, Oct 26, 2023 at 7:22 AM Jakub Jelinek  wrote:
>> 
>> On Wed, Oct 25, 2023 at 07:03:43PM +, Qing Zhao wrote:
>>> For the code generation impact:
>>> 
>>> turning the original  x.buf
>>> to a builtin function call
>>> __builtin_with_access_and_size(x.buf, x.L, -1)
>>> 
>>> might inhibit some optimizations from happening before the builtin is
>>> evaluated into object size info (phase  .objsz1).  I guess there might be
>>> some performance impact.
>>> 
>>> However, if we mark this builtin as PURE, NOTHROW, etc, then the negative
>>> performance impact will be reduced to minimum?
>> 
>> You can't drop it during objsz1 pass though, otherwise __bdos wouldn't
>> be able to figure out the dynamic sizes in case of normal (non-early)
>> inlining - caller takes address of a counted_by array, passes it down
>> to callee which is only inlined late and uses __bdos, or callee takes address
>> and returns it and caller uses __bdos, etc. - so it would need to be objsz2.
>> 
>> And while the builtin (or if it is an internal detail rather than user
>> accessible builtin an internal function) could be even const/nothrow/leaf if
>> the arguments contain the loads from the structure 2 fields, I'm afraid it
>> will still have huge code generation impact, prevent tons of pre-IPA
>> optimizations.  And it will need some work to handle it properly during
>> inlining heuristics, because in GIMPLE the COMPONENT_REF loads aren't gimple
>> values, so it wouldn't be just the builtin/internal-fn call to be ignored,
>> but also the count load from memory.
> 
> I think we want to track the value, not the "memory" in the builtin call,
> so GIMPLE would be
> 
> _1 = x.L;
> .. = __builtin_with_access_and_size (&x.buf, _1, -1);

Before adding the __builtin_with_access_and_size, the code is:



After inserting the built-in, it becomes:

_1 = x.L;
__builtin_with_access_and_size (&x.buf, _1, -1).


So the # of total instructions, the # of LOADs, and the # of calls will all 
increase.
This will definitely have an impact on the inlining decision.
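
Written out in C rather than GIMPLE, the difference looks roughly like the 
sketch below. with_access_and_size is a hypothetical stand-in for the 
internal function, and ref_before/ref_after are names made up to contrast 
the two forms:

```c
#include <stddef.h>

struct X
{
  size_t L;
  char buf[8];
};

/* Hypothetical stand-in for the internal function: returns PTR.  */
static char *
with_access_and_size (char *ptr, size_t size, int mode)
{
  (void) size;
  (void) mode;
  return ptr;
}

/* Before: a plain address computation, no load from memory.  */
static char *
ref_before (struct X *x)
{
  return x->buf;
}

/* After: one extra load of x->L plus one call.  */
static char *
ref_after (struct X *x)
{
  size_t count_1 = x->L;
  return with_access_and_size (x->buf, count_1, -1);
}
```

Both forms yield the same pointer; the second just carries more statements 
for the inliner's size estimate to count until the call is removed.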

> 
> also please make sure to use an internal function for
> __builtin_with_access_and_size,
> I don't think we want to expose this to users - it's an implementation detail.

Okay, will define it as an internal function (add it to internal-fn.def). -:)

Qing
> 
> Richard.
> 
>> 
>>Jakub
>> 



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-26 Thread Qing Zhao


> On Oct 26, 2023, at 1:21 AM, Jakub Jelinek  wrote:
> 
> On Wed, Oct 25, 2023 at 07:03:43PM +0000, Qing Zhao wrote:
>> For the code generation impact:
>> 
>> turning the original  x.buf 
>> to a builtin function call
>> __builtin_with_access_and_size(x.buf, x.L, -1)
>> 
>> might inhibit some optimizations from happening before the builtin is
>> evaluated into object size info (phase  .objsz1).  I guess there might be
>> some performance impact.
>> 
>> However, if we mark this builtin as PURE, NOTHROW, etc, then the negative
>> performance impact will be reduced to minimum?
> 
> You can't drop it during objsz1 pass though, otherwise __bdos wouldn't
> be able to figure out the dynamic sizes in case of normal (non-early)
> inlining - caller takes address of a counted_by array, passes it down
> to callee which is only inlined late and uses __bdos, or callee takes address
> and returns it and caller uses __bdos, etc. - so it would need to be objsz2.

I guess I didn’t explain this very clearly before. Let me try again:

My understanding is that there is an “early_objsz” phase followed by a later “objsz1” 
phase for -O[1|2|3]. 
For -Og, there is “early_objsz” followed by a later “objsz2”. 

So the “objsz1” I mentioned (for -O[1|2|3]) should be the same as 
the “objsz2” you mentioned above?  -:)
It’s the second objsz phase. 

By the second objsz phase, I believe all the inlining (both early 
inlining and IPA inlining) has already been applied?
> 
> And while the builtin (or if it is an internal detail rather than user
> accessible builtin an internal function)

Okay, will use an “internal function” instead of a “builtin function”. 

> could be even const/nothrow/leaf if
> the arguments contain the loads from the structure 2 fields, I'm afraid it
> will still have huge code generation impact, prevent tons of pre-IPA
> optimizations.  And it will need some work to handle it properly during
> inlining heuristics, because in GIMPLE the COMPONENT_REF loads aren't gimple
> values, so it wouldn't be just the builtin/internal-fn call to be ignored,
> but also the count load from memory.

Are you worried that the additional LOADs will change the inlining 
decision, since the inlining heuristic depends on the # of loads from memory? 

In addition to the # of loads, the # of instructions and the # of calls in 
the function might increase too. Will these affect the inlining decision? 

Besides the inlining decision, is there any impact on other IPA optimizations? 

Thanks.

Qing


> 
>   Jakub
> 



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Qing Zhao


> On Oct 25, 2023, at 6:06 PM, Kees Cook  wrote:
> 
> On Wed, Oct 25, 2023 at 01:27:29PM +0000, Qing Zhao wrote:
>> A.  Add an additional argument, the size parameter,  to __bdos, 
>> A.1, during FE;
>> A.2, during gimplification phase;
> 
> I just wanted to clarify that this is all just an "internal" detail,
> yes?

YES!

> i.e. the __bdos() used in C code is unchanged?

There should be no change to the user interface. 

> 
> For example, the Linux kernel can still use __bdos() without knowing
> the count member ahead of time (otherwise it kind of defeats the purpose).
I don’t quite understand this; could you clarify? 

(Anyway, the bottom line is that there is no change to the user interface; we are only 
discussing the internal implementation inside GCC.) -:)
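
To make the "no change to the user interface" point concrete, here is a minimal sketch of user code that queries the size of a counted_by array without ever naming the count member at the query site. The struct name, the helper names, and the COUNTED_BY macro are hypothetical; the feature probes are there because the attribute and the builtin are not available in every compiler, in which case the query simply reports "unknown" as (size_t)-1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Feature probes: fall back gracefully on compilers that lack them. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

#if __has_attribute(counted_by)
# define COUNTED_BY(member) __attribute__((counted_by(member)))
#else
# define COUNTED_BY(member)   /* attribute unavailable: no annotation */
#endif

/* Hypothetical example type, not taken from the patch series. */
struct packet {
    size_t count;                  /* number of elements in buf */
    int buf[] COUNTED_BY(count);   /* C99 flexible array member */
};

/* Hypothetical helper: allocate a packet with room for n elements. */
static struct packet *packet_new(size_t n)
{
    struct packet *p = malloc(sizeof(*p) + n * sizeof(p->buf[0]));
    if (p)
        p->count = n;              /* set the count before any use of buf */
    return p;
}

/* The query names only the array; the count member stays implicit. */
static size_t packet_buf_size(struct packet *p)
{
    assert(p != NULL);
#if __has_builtin(__builtin_dynamic_object_size)
    return __builtin_dynamic_object_size(p->buf, 1);
#else
    return (size_t)-1;             /* builtin unavailable: size unknown */
#endif
}
```

Depending on the compiler and optimization level, packet_buf_size() may return a byte count covering the array or (size_t)-1 when the size cannot be determined, so callers should treat (size_t)-1 as "no information".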

Qing
> 
> -- 
> Kees Cook



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Qing Zhao


> On Oct 25, 2023, at 11:38 AM, Richard Biener  
> wrote:
> 
> 
> 
>> Am 25.10.2023 um 16:50 schrieb Siddhesh Poyarekar :
>> 
>> On 2023-10-25 09:27, Qing Zhao wrote:
>>>>> On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar  
>>>>> wrote:
>>>> 
>>>> On 2023-10-24 18:51, Qing Zhao wrote:
>>>>> Thanks for the proposal!
>>>>> So what you suggested is:
>>>>> For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the 
>>>>> FE, then the call to the _bdos (x.buf, 1) will
>>>>> Become:
>>>>>   _bdos(__builtin_with_size(x.buf, x.L), 1)?
>>>>> Then the implicit use of x.L in _bdos(x.buf, 1) will become explicit?
>>>> 
>>>> Oops, I think Martin and I fell off-list in a subthread.  I clarified that 
>>>> my comment was that any such annotation at object reference is probably 
>>>> too late and hence not the right place for it; basically it has the same 
>>>> problems as the option A in your comment.  A better place to reinforce 
>>>> such a relationship would be the allocation+initialization site instead.
>>> I think Martin’s proposal might work; it’s different from option A:
>>> A.  Add an additional argument, the size parameter,  to __bdos,
>>> A.1, during FE;
>>> A.2, during gimplification phase;
>>> Option A targets the __bdos call and tries to encode the implicit use in the 
>>> call; this will not work when the real object has not been instantiated at 
>>> the call site.
>>> However, Martin’s proposal targets the FAM array itself; it will enhance 
>>> the FAM access naturally with the size information. Such a FAM access 
>>> with size info will be propagated to the __bdos site later through inlining, 
>>> etc., and then tree-object-size can use the size information at that point. 
>>> At the same time, the implicit use of the size is recorded correctly.
>>> So, I think that this proposal is natural and reasonable.
>> 
>> Ack, we discussed this later in the thread and I agree[1].  Richard still 
>> has concerns[2] that I think may be addressed by putting __builtin_with_size 
>> at the point where the reference to x.buf escapes, but I'm not very sure 
>> about that.
>> 
>> Oh, and Martin suggested using __builtin_with_size more generally[3] in 
>> bugzilla to address attribute inlining issues and we have high level 
>> consensus for a __builtin_with_access instead, which associates access type 
>> in addition to size with the target object.  For the purposes of counted_by, 
>> access type could simply be -1.
> 
> Btw, I’d like to see some hard numbers on the amount of extra false positives 
> this will cause as well as the effect on generated code before putting this in 
> mainline and effectively needing to support it forever. 

What do you mean by the “extra false positives”? 

For the code generation impact:

Turning the original x.buf 
into a builtin function call
__builtin_with_access_and_size(x.buf, x.L, -1)

might inhibit some optimizations from happening before the builtin is evaluated 
into object size info (phase .objsz1).  I guess there might be some 
performance impact. 

However, if we mark this builtin as PURE, NOTHROW, etc., then the negative 
performance impact will be reduced to a minimum? 
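
As a source-level analogy only (the real transformation would happen inside GCC, not in user code), the rewrite described above can be sketched like this. The function with_access_and_size is a hypothetical stand-in for the proposed builtin that just returns its pointer argument; the struct and member names follow the x.buf / x.L naming used in this thread. The attributes shown are ordinary GCC function attributes, used here to illustrate the "no side effects" marking being discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct x_t {
    size_t L;      /* element count, named after x.L in the thread */
    char buf[];    /* flexible array member, x.buf in the thread */
};

/* Hypothetical stand-in for the proposed builtin: it returns the pointer
   unchanged and only "carries" the size and access arguments along.
   const: the result depends only on the argument values, not on memory. */
__attribute__((const, nothrow))
static char *with_access_and_size(char *ptr, size_t size, int access)
{
    (void)size;
    (void)access;
    return ptr;
}

/* Before: a plain reference to the array. */
static char *buf_plain(struct x_t *x)
{
    return x->buf;
}

/* After: one extra load (x->L) and one extra call wrap the same reference. */
static char *buf_wrapped(struct x_t *x)
{
    size_t count = x->L;
    return with_access_and_size(x->buf, count, -1);
}

/* Small self-check: both forms yield the same address. */
static int wrapped_matches_plain(void)
{
    struct x_t *x = malloc(sizeof *x + 4);
    assert(x != NULL);
    x->L = 4;
    int same = buf_wrapped(x) == buf_plain(x);
    free(x);
    return same;
}
```

The extra load of x->L and the extra call in buf_wrapped() are the additional instructions that could influence optimizations running before the call is resolved.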

Qing

> 
> Richard 
> 
>> Thanks,
>> Sid
>> 
>> 
>> [1] 
>> https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#m4f3cafa489493180e258fd62aca0196a5f244039
>> 
>> [2] 
>> https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#mcf226f891621db8b640deaedd8942bb8519010f3
>> 
>> [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503#c6



Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-25 Thread Qing Zhao


> On Oct 25, 2023, at 10:50 AM, Siddhesh Poyarekar  wrote:
> 
> On 2023-10-25 09:27, Qing Zhao wrote:
>>> On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar  wrote:
>>> 
>>> On 2023-10-24 18:51, Qing Zhao wrote:
>>>> Thanks for the proposal!
>>>> So what you suggested is:
>>>> For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the 
>>>> FE, then the call to the _bdos (x.buf, 1) will
>>>> Become:
>>>>_bdos(__builtin_with_size(x.buf, x.L), 1)?
>>>> Then the implicit use of x.L in _bdos(x.buf, 1) will become explicit?
>>> 
>>> Oops, I think Martin and I fell off-list in a subthread.  I clarified that 
>>> my comment was that any such annotation at object reference is probably too 
>>> late and hence not the right place for it; basically it has the same 
>>> problems as the option A in your comment.  A better place to reinforce such 
>>> a relationship would be the allocation+initialization site instead.
>> I think Martin’s proposal might work; it’s different from option A:
>> A.  Add an additional argument, the size parameter,  to __bdos,
>>  A.1, during FE;
>>  A.2, during gimplification phase;
>> Option A targets the __bdos call and tries to encode the implicit use in the 
>> call; this will not work when the real object has not been instantiated at 
>> the call site.
>> However, Martin’s proposal targets the FAM array itself; it will enhance 
>> the FAM access naturally with the size information. Such a FAM access with 
>> size info will be propagated to the __bdos site later through inlining, etc., 
>> and then tree-object-size can use the size information at that point. At the 
>> same time, the implicit use of the size is recorded correctly.
>> So, I think that this proposal is natural and reasonable.
> 
> Ack, we discussed this later in the thread and I agree[1].  Richard still has 
> concerns[2] that I think may be addressed by putting __builtin_with_size at 
> the point where the reference to x.buf escapes, but I'm not very sure about 
> that.
> 
> Oh, and Martin suggested using __builtin_with_size more generally[3] in 
> bugzilla to address attribute inlining issues and we have high level 
> consensus for a __builtin_with_access instead, which associates access type 
> in addition to size with the target object.  For the purposes of counted_by, 
> access type could simply be -1.

Yes, I read all the discussions in the comments of PR96503 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503), and I do agree that this 
is a good idea. 

I prefer to name the new builtin
__builtin_with_access_and_size
instead of 
__builtin_with_access

All the attributes, “alloc_size”, “access”, and the new “counted_by” for FAM, 
could be converted to this builtin consistently, and even later 
extensions, for example a “counted_by” attribute for general pointers, could use 
the same builtin. 

SOMETYPE *ptr = __builtin_with_access_and_size (SOMETYPE *ptr, size_t size, int access)

In the above, 

1. SOMETYPE is the type of the pointee of “ptr”; it can be a real type 
or void.

2. “size”

If SOMETYPE is a real type, “size” is the number of elements of that 
type;
if SOMETYPE is void, “size” is the number of bytes.   

3. “access”

-1: Unknown access semantics
0: none
1: read_only
2: write_only
3: read_write

For the “counted_by” and “alloc_size” attributes, “access” will be -1. 
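
The size and access conventions above can be modeled in plain C as a rough sketch. Everything here is hypothetical illustration: the enum, the struct, and the helper are not part of GCC, and an element size of 0 is just this sketch's way of modeling the "SOMETYPE is void" case.

```c
#include <assert.h>
#include <stddef.h>

/* Access codes as listed in the proposal above. */
enum access_mode {
    ACCESS_UNKNOWN    = -1,
    ACCESS_NONE       = 0,
    ACCESS_READ_ONLY  = 1,
    ACCESS_WRITE_ONLY = 2,
    ACCESS_READ_WRITE = 3
};

/* Hypothetical record of what one annotated access would carry. */
struct size_annotation {
    size_t size;              /* elements, or bytes when elem_size == 0 */
    size_t elem_size;         /* sizeof (SOMETYPE); 0 models void */
    enum access_mode access;
};

/* Convert the annotated size to bytes following the two rules above.
   Overflow of the multiplication is not handled in this sketch. */
static size_t annotation_bytes(const struct size_annotation *a)
{
    assert(a != NULL);
    if (a->elem_size == 0)
        return a->size;               /* void pointee: size is in bytes */
    return a->size * a->elem_size;    /* typed pointee: size is in elements */
}
```

For example, a counted_by array of 10 ints would be modeled as { 10, sizeof(int), ACCESS_UNKNOWN }, and annotation_bytes() would report 10 * sizeof(int) bytes for it.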

Qing
> 
> Thanks,
> Sid
> 
> 
> [1] 
> https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#m4f3cafa489493180e258fd62aca0196a5f244039
> 
> [2] 
> https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#mcf226f891621db8b640deaedd8942bb8519010f3
> 
> [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503#c6


