Re: [PATCH v2 3/3] Add testing cases for flexible array members in unions and alone in structures.
On 2024-04-25 10:06, Qing Zhao wrote: gcc/testsuite/ChangeLog: * c-c++-common/fam-in-union-alone-in-struct-1.c: New testcase. * c-c++-common/fam-in-union-alone-in-struct-2.c: New testcase. * c-c++-common/fam-in-union-alone-in-struct-3.c: New testcase. --- Sorry I should have commented sooner; could you please also add tests for __bos/__bdos for such unions and structs? Thanks, Sid .../fam-in-union-alone-in-struct-1.c | 52 +++ .../fam-in-union-alone-in-struct-2.c | 51 ++ .../fam-in-union-alone-in-struct-3.c | 36 + 3 files changed, 139 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-1.c create mode 100644 gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c create mode 100644 gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-3.c diff --git a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-1.c b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-1.c new file mode 100644 index ..7d4721aa95ac --- /dev/null +++ b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-1.c @@ -0,0 +1,52 @@ +/* testing the correct usage of flexible array members in unions + and alone in structures. */ +/* { dg-do run} */ +/* { dg-options "-Wpedantic" } */ + +union with_fam_1 { + int a; + int b[]; /* { dg-warning "flexible array member in union is a GCC extension" } */ +}; + +union with_fam_2 { + char a; + int b[]; /* { dg-warning "flexible array member in union is a GCC extension" } */ +}; + +union with_fam_3 { + char a[]; /* { dg-warning "flexible array member in union is a GCC extension" } */ + /* { dg-warning "in an otherwise empty" "" { target c++ } .-1 } */ + int b[]; /* { dg-warning "flexible array member in union is a GCC extension" } */ +}; + +struct only_fam { + int b[]; + /* { dg-warning "in a struct with no named members" "" { target c } .-1 } */ + /* { dg-warning "in an otherwise empty" "" { target c++ } .-2 } */ + /* { dg-warning "forbids flexible array member" "" { target c++ } .-3 } */ +}; + +struct only_fam_2 { + unsigned int : 2; + unsigned int : 3; + int b[]; + /* { dg-warning "in a struct with no named members" "" { target c } .-1 } */ + /* { dg-warning "in an otherwise empty" "" { target c++ } .-2 } */ + /* { dg-warning "forbids flexible array member" "" { target c++ } .-3 } */ +}; + +int main () +{ + if (sizeof (union with_fam_1) != sizeof (int)) +__builtin_abort (); + if (sizeof (union with_fam_2) != __alignof__ (int)) +__builtin_abort (); + if (sizeof (union with_fam_3) != 0) +__builtin_abort (); + if (sizeof (struct only_fam) != 0) +__builtin_abort (); + if (sizeof (struct only_fam_2) != sizeof (int)) +__builtin_abort (); + return 0; +} + diff --git a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c new file mode 100644 index ..3743f9e7dac5 --- /dev/null +++ b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c @@ -0,0 +1,51 @@ +/* testing the correct usage of flexible array members in unions + and alone in structures: initialization */ +/* { dg-do run} */ +/* { dg-options "-O2" } */ + +union with_fam_1 { + int a; + int b[]; +} with_fam_1_v = {.b = {1, 2, 3, 4}}; + +union with_fam_2 { + int a; + char b[]; +} with_fam_2_v = {.a = 0x1f2f3f4f}; + +union with_fam_3 { + char a[]; + int b[]; +} with_fam_3_v = {.b = {0x1f2f3f4f, 0x5f6f7f7f}}; + +struct only_fam { + int b[]; +} only_fam_v = {{7, 11}}; + +struct only_fam_2 { + unsigned int : 2; + unsigned int : 3; + int b[]; +} only_fam_2_v = {{7, 11}}; + +int main () +{ + if (with_fam_1_v.b[3] != 4 + || with_fam_1_v.b[0] != 1) +__builtin_abort (); + if (with_fam_2_v.b[3] != 0x1f + || with_fam_2_v.b[0] != 0x4f) +__builtin_abort (); + if (with_fam_3_v.a[0] != 0x4f + || with_fam_3_v.a[7] != 0x5f) +__builtin_abort (); + if (only_fam_v.b[0] != 7 + || only_fam_v.b[1] != 11) +__builtin_abort (); + if (only_fam_2_v.b[0] != 7 + || only_fam_2_v.b[1] != 11) +__builtin_abort (); + + return 0; +} + diff --git a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-3.c b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-3.c new file mode 100644 index ..dd36fa01306d --- /dev/null +++ b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-3.c @@ -0,0 +1,36 @@ +/* testing the correct usage of flexible array members in unions + and alone in structures. */ +/* { dg-do compile } */ +/* { dg-options "-pedantic-errors" } */ + +union with_fam_1 { + int a; + int b[]; /* { dg-error "flexible array member in union is a GCC extension" } */ +}; + +union with_fam_2 { + char a; + int b[]; /* { dg-error "flexible array member in union is a GCC extension" } */ +}; + +union with_fam_3 { + char a[]; /* { dg-error "flexible array member in union is a GCC
Re: [PATCH v8 5/5] Add the 6th argument to .ACCESS_WITH_SIZE
On 2024-03-29 12:07, Qing Zhao wrote: to carry the TYPE of the flexible array. Such information is needed during tree-object-size.cc. We cannot use the result type or the type of the 1st argument of the routine .ACCESS_WITH_SIZE to decide the element type of the original array due to possible type casting in the source code. gcc/c/ChangeLog: * c-typeck.cc (build_access_with_size_for_counted_by): Add the 6th argument to .ACCESS_WITH_SIZE. gcc/ChangeLog: * tree-object-size.cc (access_with_size_object_size): Use the type of the 6th argument for the type of the element. gcc/testsuite/ChangeLog: * gcc.dg/flex-array-counted-by-6.c: New test. This version looks fine to me for stage 1, but I'm not a maintainer so you'll need an ack from one to commit. Thanks, Sid --- gcc/c/c-typeck.cc | 11 +++-- gcc/internal-fn.cc| 2 + .../gcc.dg/flex-array-counted-by-6.c | 46 +++ gcc/tree-object-size.cc | 16 --- 4 files changed, 66 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-6.c diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc index f7b0e08459b0..05948f76039e 100644 --- a/gcc/c/c-typeck.cc +++ b/gcc/c/c-typeck.cc @@ -2608,7 +2608,8 @@ build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type) to: - (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1)) + (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1, + (TYPE_OF_ARRAY *)0)) NOTE: The return type of this function is the POINTER type pointing to the original flexible array type. @@ -2620,6 +2621,9 @@ build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type) The 4th argument of the call is a constant 0 with the TYPE of the object pointed by COUNTED_BY_REF. + The 6th argument of the call is a constant 0 with the pointer TYPE + to the original flexible array type. + */ static tree build_access_with_size_for_counted_by (location_t loc, tree ref, @@ -2632,12 +2636,13 @@ build_access_with_size_for_counted_by (location_t loc, tree ref, tree call = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE, - result_type, 5, + result_type, 6, array_to_pointer_conversion (loc, ref), counted_by_ref, build_int_cst (integer_type_node, 1), build_int_cst (counted_by_type, 0), - build_int_cst (integer_type_node, -1)); + build_int_cst (integer_type_node, -1), + build_int_cst (result_type, 0)); /* Wrap the call with an INDIRECT_REF with the flexible array type. */ call = build1 (INDIRECT_REF, TREE_TYPE (ref), call); SET_EXPR_LOCATION (call, loc); diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index e744080ee670..34e4a4aea534 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -3411,6 +3411,8 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt) 1: read_only 2: write_only 3: read_write + 6th argument: A constant 0 with the pointer TYPE to the original flexible + array type. Both the return type and the type of the first argument of this function have been converted from the incomplete array type to diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c new file mode 100644 index ..65fa01443d95 --- /dev/null +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c @@ -0,0 +1,46 @@ +/* Test the attribute counted_by and its usage in + * __builtin_dynamic_object_size: when the type of the flexible array member + * is casting to another type. */ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include "builtin-object-size-common.h" + +typedef unsigned short u16; + +struct info { + u16 data_len; + char data[] __attribute__((counted_by(data_len))); +}; + +struct foo { + int a; + int b; +}; + +static __attribute__((__noinline__)) +struct info *setup () +{ + struct info *p; + size_t bytes = 3 * sizeof(struct foo); + + p = (struct info *)malloc (sizeof (struct info) + bytes); + p->data_len = bytes; + + return p; +} + +static void +__attribute__((__noinline__)) report (struct info *p) +{ + struct foo *bar = (struct foo *)p->data; + EXPECT(__builtin_dynamic_object_size((char *)(bar + 1), 1), 16); + EXPECT(__builtin_dynamic_object_size((char *)(bar + 2), 1), 8); +} + +int main(int argc, char *argv[]) +{ + struct info *p = setup(); + report(p); + return 0; +} diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc index 8de264d1dee2..4c1fa9b555fa 100644 ---
Re: [PATCH v8 4/5] Use the .ACCESS_WITH_SIZE in bound sanitizer.
On 2024-03-29 12:07, Qing Zhao wrote: gcc/c-family/ChangeLog: * c-ubsan.cc (get_bound_from_access_with_size): New function. (ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE. gcc/testsuite/ChangeLog: * gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test. * gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test. * gcc.dg/ubsan/flex-array-counted-by-bounds-4.c: New test. * gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test. --- This version looks fine to me for stage 1, but I'm not a maintainer so you'll need an ack from one to commit. Thanks, Sid gcc/c-family/c-ubsan.cc | 42 + .../ubsan/flex-array-counted-by-bounds-2.c| 45 ++ .../ubsan/flex-array-counted-by-bounds-3.c| 34 ++ .../ubsan/flex-array-counted-by-bounds-4.c| 34 ++ .../ubsan/flex-array-counted-by-bounds.c | 46 +++ 5 files changed, 201 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-3.c create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-4.c create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc index 940982819ddf..7cd3c6aa5b88 100644 --- a/gcc/c-family/c-ubsan.cc +++ b/gcc/c-family/c-ubsan.cc @@ -376,6 +376,40 @@ ubsan_instrument_return (location_t loc) return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data)); } +/* Get the tree that represented the number of counted_by, i.e, the maximum + number of the elements of the object that the call to .ACCESS_WITH_SIZE + points to, this number will be the bound of the corresponding array. */ +static tree +get_bound_from_access_with_size (tree call) +{ + if (!is_access_with_size_p (call)) +return NULL_TREE; + + tree ref_to_size = CALL_EXPR_ARG (call, 1); + unsigned int class_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2)); + tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3)); + tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size), + build_int_cst (ptr_type_node, 0)); + /* If size is negative value, treat it as zero. */ + if (!TYPE_UNSIGNED (type)) + { +tree cond = fold_build2 (LT_EXPR, boolean_type_node, +unshare_expr (size), build_zero_cst (type)); +size = fold_build3 (COND_EXPR, type, cond, + build_zero_cst (type), size); + } + + /* Only when class_of_size is 1, i.e, the number of the elements of + the object type, return the size. */ + if (class_of_size != 1) +return NULL_TREE; + else +size = fold_convert (sizetype, size); + + return size; +} + + /* Instrument array bounds for ARRAY_REFs. We create special builtin, that gets expanded in the sanopt pass, and make an array dimension of it. ARRAY is the array, *INDEX is an index to the array. @@ -401,6 +435,14 @@ ubsan_instrument_bounds (location_t loc, tree array, tree *index, && COMPLETE_TYPE_P (type) && integer_zerop (TYPE_SIZE (type))) bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1); + else if (INDIRECT_REF_P (array) + && is_access_with_size_p ((TREE_OPERAND (array, 0 + { + bound = get_bound_from_access_with_size ((TREE_OPERAND (array, 0))); + bound = fold_build2 (MINUS_EXPR, TREE_TYPE (bound), + bound, + build_int_cst (TREE_TYPE (bound), 1)); + } else return NULL_TREE; } diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c new file mode 100644 index ..b503320628d2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c @@ -0,0 +1,45 @@ +/* Test the attribute counted_by and its usage in + bounds sanitizer combined with VLA. */ +/* { dg-do run } */ +/* { dg-options "-fsanitize=bounds" } */ +/* { dg-output "index 11 out of bounds for type 'int \\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */ +/* { dg-output "\[^\n\r]*index 20 out of bounds for type 'int \\\[\\\*\\\]\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */ +/* { dg-output "\[^\n\r]*index 11 out of bounds for type 'int \\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */ +/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'int \\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */ + + +#include + +void __attribute__((__noinline__)) setup_and_test_vla (int n, int m) +{ + struct foo { + int n; + int p[][n] __attribute__((counted_by(n))); + } *f; + + f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n])); + f->n = m; + f->p[m][n-1]=1; + return; +} + +void
Re: [PATCH v8 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.
On 2024-03-29 12:07, Qing Zhao wrote: gcc/ChangeLog: * tree-object-size.cc (access_with_size_object_size): New function. (call_object_size): Call the new function. gcc/testsuite/ChangeLog: * gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT. * gcc.dg/flex-array-counted-by-3.c: New test. * gcc.dg/flex-array-counted-by-4.c: New test. * gcc.dg/flex-array-counted-by-5.c: New test. This version looks fine to me for stage 1, but I'm not a maintainer so you'll need an ack from one to commit. Thanks, Sid --- .../gcc.dg/builtin-object-size-common.h | 11 ++ .../gcc.dg/flex-array-counted-by-3.c | 63 +++ .../gcc.dg/flex-array-counted-by-4.c | 178 ++ .../gcc.dg/flex-array-counted-by-5.c | 48 + gcc/tree-object-size.cc | 60 ++ 5 files changed, 360 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h b/gcc/testsuite/gcc.dg/builtin-object-size-common.h index 66ff7cdd953a..b677067c6e6b 100644 --- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h +++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h @@ -30,3 +30,14 @@ unsigned nfails = 0; __builtin_abort (); \ return 0; \ } while (0) + +#define EXPECT(p, _v) do { \ + size_t v = _v; \ + if (p == v)\ +__builtin_printf ("ok: %s == %zd\n", #p, p); \ + else \ +{\ + __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \ + FAIL (); \ +}\ +} while (0); diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c new file mode 100644 index ..78f50230e891 --- /dev/null +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c @@ -0,0 +1,63 @@ +/* Test the attribute counted_by and its usage in + * __builtin_dynamic_object_size. */ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include "builtin-object-size-common.h" + +struct flex { + int b; + int c[]; +} *array_flex; + +struct annotated { + int b; + int c[] __attribute__ ((counted_by (b))); +} *array_annotated; + +struct nested_annotated { + struct { +union { + int b; + float f; +}; +int n; + }; + int c[] __attribute__ ((counted_by (b))); +} *array_nested_annotated; + +void __attribute__((__noinline__)) setup (int normal_count, int attr_count) +{ + array_flex += (struct flex *)malloc (sizeof (struct flex) ++ normal_count * sizeof (int)); + array_flex->b = normal_count; + + array_annotated += (struct annotated *)malloc (sizeof (struct annotated) + + attr_count * sizeof (int)); + array_annotated->b = attr_count; + + array_nested_annotated += (struct nested_annotated *)malloc (sizeof (struct nested_annotated) ++ attr_count * sizeof (int)); + array_nested_annotated->b = attr_count; + + return; +} + +void __attribute__((__noinline__)) test () +{ +EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1); +EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1), + array_annotated->b * sizeof (int)); +EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1), + array_nested_annotated->b * sizeof (int)); +} + +int main(int argc, char *argv[]) +{ + setup (10,10); + test (); + DONE (); +} diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c new file mode 100644 index ..20103d58ef51 --- /dev/null +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c @@ -0,0 +1,178 @@ +/* Test the attribute counted_by and its usage in +__builtin_dynamic_object_size: what's the correct behavior when the +allocation size mismatched with the value of counted_by attribute? +We should always use the latest value that is hold by the counted_by +field. */ +/* { dg-do run } */ +/* { dg-options "-O -fstrict-flex-arrays=3" } */ + +#include "builtin-object-size-common.h" + +struct annotated { + size_t foo; + char others; + char array[] __attribute__((counted_by (foo))); +}; + +#define noinline
Re: [PATCH v6 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.
On 2024-03-18 12:28, Qing Zhao wrote: This should probably bail out if object_size_type & OST_DYNAMIC == 0. Okay. Will add this. When add this into access_with_size_object_size, I found some old bugs in early_object_sizes_execute_one, and fixed them as well. Would you be able to isolate this fix and post them separately? I reckon they would be relevant for gcc 14 too. Thanks, Sid
Re: [PATCH v6 4/5] Use the .ACCESS_WITH_SIZE in bound sanitizer.
On 2024-02-16 14:47, Qing Zhao wrote: gcc/c-family/ChangeLog: * c-ubsan.cc (get_bound_from_access_with_size): New function. (ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE. gcc/testsuite/ChangeLog: * gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test. * gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test. * gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test. --- gcc/c-family/c-ubsan.cc | 42 + .../ubsan/flex-array-counted-by-bounds-2.c| 45 ++ .../ubsan/flex-array-counted-by-bounds-3.c| 34 ++ .../ubsan/flex-array-counted-by-bounds.c | 46 +++ 4 files changed, 167 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-3.c create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc index 940982819ddf..164b29845b3a 100644 --- a/gcc/c-family/c-ubsan.cc +++ b/gcc/c-family/c-ubsan.cc @@ -376,6 +376,40 @@ ubsan_instrument_return (location_t loc) return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data)); } +/* Get the tree that represented the number of counted_by, i.e, the maximum + number of the elements of the object that the call to .ACCESS_WITH_SIZE + points to, this number will be the bound of the corresponding array. */ +static tree +get_bound_from_access_with_size (tree call) +{ + if (!is_access_with_size_p (call)) +return NULL_TREE; + + tree ref_to_size = CALL_EXPR_ARG (call, 1); + unsigned int type_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2)); Again for consistency, this should probably be class_of_size. + tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3)); + tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size), + build_int_cst (ptr_type_node, 0)); + /* If size is negative value, treat it as zero. */ + if (!TYPE_UNSIGNED (type)) + { +tree cond = fold_build2 (LT_EXPR, boolean_type_node, +unshare_expr (size), build_zero_cst (type)); +size = fold_build3 (COND_EXPR, type, cond, + build_zero_cst (type), size); + } + + /* Only when type_of_size is 1,i.e, the number of the elements of + the object type, return the size. */ + if (type_of_size != 1) +return NULL_TREE; + else +size = fold_convert (sizetype, size); + + return size; +} + + /* Instrument array bounds for ARRAY_REFs. We create special builtin, that gets expanded in the sanopt pass, and make an array dimension of it. ARRAY is the array, *INDEX is an index to the array. @@ -401,6 +435,14 @@ ubsan_instrument_bounds (location_t loc, tree array, tree *index, && COMPLETE_TYPE_P (type) && integer_zerop (TYPE_SIZE (type))) bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1); + else if (INDIRECT_REF_P (array) + && is_access_with_size_p ((TREE_OPERAND (array, 0 + { + bound = get_bound_from_access_with_size ((TREE_OPERAND (array, 0))); + bound = fold_build2 (MINUS_EXPR, TREE_TYPE (bound), + bound, + build_int_cst (TREE_TYPE (bound), 1)); + } This will wrap if bound == 0, maybe that needs to be special-cased. And maybe also add a test for it below. else return NULL_TREE; } diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c new file mode 100644 index ..148934975ee5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c @@ -0,0 +1,45 @@ +/* test the attribute counted_by and its usage in + bounds sanitizer combined with VLA. */ +/* { dg-do run } */ +/* { dg-options "-fsanitize=bounds" } */ +/* { dg-output "index 11 out of bounds for type 'int \\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */ +/* { dg-output "\[^\n\r]*index 20 out of bounds for type 'int \\\[\\\*\\\]\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */ +/* { dg-output "\[^\n\r]*index 11 out of bounds for type 'int \\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */ +/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'int \\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */ + + +#include + +void __attribute__((__noinline__)) setup_and_test_vla (int n, int m) +{ + struct foo { + int n; + int p[][n] __attribute__((counted_by(n))); + } *f; + + f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n])); + f->n = m; + f->p[m][n-1]=1; + return; +} + +void __attribute__((__noinline__)) setup_and_test_vla_1 (int n1, int n2, int m) +{ + struct foo { +int n; +int p[][n2][n1] __attribute__((counted_by(n))); + } *f; + + f
Re: [PATCH v6 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.
On 2024-02-16 14:47, Qing Zhao wrote: gcc/ChangeLog: * tree-object-size.cc (access_with_size_object_size): New function. (call_object_size): Call the new function. gcc/testsuite/ChangeLog: * gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT. * gcc.dg/flex-array-counted-by-3.c: New test. * gcc.dg/flex-array-counted-by-4.c: New test. * gcc.dg/flex-array-counted-by-5.c: New test. --- .../gcc.dg/builtin-object-size-common.h | 11 ++ .../gcc.dg/flex-array-counted-by-3.c | 63 +++ .../gcc.dg/flex-array-counted-by-4.c | 178 ++ .../gcc.dg/flex-array-counted-by-5.c | 48 + gcc/tree-object-size.cc | 59 ++ 5 files changed, 359 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h b/gcc/testsuite/gcc.dg/builtin-object-size-common.h index 66ff7cdd953a..b677067c6e6b 100644 --- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h +++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h @@ -30,3 +30,14 @@ unsigned nfails = 0; __builtin_abort (); \ return 0; \ } while (0) + +#define EXPECT(p, _v) do { \ + size_t v = _v; \ + if (p == v)\ +__builtin_printf ("ok: %s == %zd\n", #p, p); \ + else \ +{\ + __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \ + FAIL (); \ +}\ +} while (0); diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c new file mode 100644 index ..0066c32ca808 --- /dev/null +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c @@ -0,0 +1,63 @@ +/* test the attribute counted_by and its usage in + * __builtin_dynamic_object_size. */ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include "builtin-object-size-common.h" + +struct flex { + int b; + int c[]; +} *array_flex; + +struct annotated { + int b; + int c[] __attribute__ ((counted_by (b))); +} *array_annotated; + +struct nested_annotated { + struct { +union { + int b; + float f; +}; +int n; + }; + int c[] __attribute__ ((counted_by (b))); +} *array_nested_annotated; + +void __attribute__((__noinline__)) setup (int normal_count, int attr_count) +{ + array_flex += (struct flex *)malloc (sizeof (struct flex) ++ normal_count * sizeof (int)); + array_flex->b = normal_count; + + array_annotated += (struct annotated *)malloc (sizeof (struct annotated) + + attr_count * sizeof (int)); + array_annotated->b = attr_count; + + array_nested_annotated += (struct nested_annotated *)malloc (sizeof (struct nested_annotated) ++ attr_count * sizeof (int)); + array_nested_annotated->b = attr_count; + + return; +} + +void __attribute__((__noinline__)) test () +{ +EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1); +EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1), + array_annotated->b * sizeof (int)); +EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1), + array_nested_annotated->b * sizeof (int)); +} + +int main(int argc, char *argv[]) +{ + setup (10,10); + test (); + DONE (); +} diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c new file mode 100644 index ..3ce7f3545549 --- /dev/null +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c @@ -0,0 +1,178 @@ +/* test the attribute counted_by and its usage in +__builtin_dynamic_object_size: what's the correct behavior when the +allocation size mismatched with the value of counted_by attribute? +we should always use the latest value that is hold by the counted_by +field. */ +/* { dg-do run } */ +/* { dg-options "-O -fstrict-flex-arrays=3" } */ + +#include "builtin-object-size-common.h" + +struct annotated { + size_t foo; + char others; + char array[] __attribute__((counted_by (foo))); +}; + +#define noinline __attribute__((__noinline__)) +#define SIZE_BUMP 10 +#define MAX(a, b) ((a) > (b) ? (a) : (b)) + +/* In general, Due to type casting, the
Re: [PATCH v6 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.
On 2024-02-16 14:47, Qing Zhao wrote: Including the following changes: * The definition of the new internal function .ACCESS_WITH_SIZE in internal-fn.def. * C FE converts every reference to a FAM with a "counted_by" attribute to a call to the internal function .ACCESS_WITH_SIZE. (build_component_ref in c_typeck.cc) This includes the case when the object is statically allocated and initialized. In order to make this working, the routines initializer_constant_valid_p_1 and output_constant in varasm.cc are updated to handle calls to .ACCESS_WITH_SIZE. (initializer_constant_valid_p_1 and output_constant in varasm.c) However, for the reference inside "offsetof", the "counted_by" attribute is ignored since it's not useful at all. (c_parser_postfix_expression in c/c-parser.cc) In addtion to "offsetof", for the reference inside operator "typeof" and "alignof", we ignore counted_by attribute too. When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE, replace the call with its first argument. * Convert every call to .ACCESS_WITH_SIZE to its first argument. (expand_ACCESS_WITH_SIZE in internal-fn.cc) * Adjust alias analysis to exclude the new internal from clobbering anything. (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in tree-ssa-alias.cc) * Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE when it's LHS is eliminated as dead code. (eliminate_unnecessary_stmts in tree-ssa-dce.cc) * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and get the reference from the call to .ACCESS_WITH_SIZE. (is_access_with_size_p and get_ref_from_access_with_size in tree.cc) gcc/c/ChangeLog: * c-parser.cc (c_parser_postfix_expression): Ignore the counted-by attribute when build_component_ref inside offsetof operator. * c-tree.h (build_component_ref): Add one more parameter. * c-typeck.cc (build_counted_by_ref): New function. (build_access_with_size_for_counted_by): New function. (build_component_ref): Check the counted-by attribute and build call to .ACCESS_WITH_SIZE. (build_unary_op): When building ADDR_EXPR for .ACCESS_WITH_SIZE, use its first argument. (lvalue_p): Accept call to .ACCESS_WITH_SIZE. gcc/ChangeLog: * internal-fn.cc (expand_ACCESS_WITH_SIZE): New function. * internal-fn.def (ACCESS_WITH_SIZE): New internal function. * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case IFN_ACCESS_WITH_SIZE. (call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE. * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call to .ACCESS_WITH_SIZE when its LHS is dead. * tree.cc (process_call_operands): Adjust side effect for function .ACCESS_WITH_SIZE. (is_access_with_size_p): New function. (get_ref_from_access_with_size): New function. * tree.h (is_access_with_size_p): New prototype. (get_ref_from_access_with_size): New prototype. * varasm.cc (initializer_constant_valid_p_1): Handle call to .ACCESS_WITH_SIZE. (output_constant): Handle call to .ACCESS_WITH_SIZE. gcc/testsuite/ChangeLog: * gcc.dg/flex-array-counted-by-2.c: New test. --- gcc/c/c-parser.cc | 10 +- gcc/c/c-tree.h| 2 +- gcc/c/c-typeck.cc | 128 +- gcc/internal-fn.cc| 36 + gcc/internal-fn.def | 4 + .../gcc.dg/flex-array-counted-by-2.c | 112 +++ gcc/tree-ssa-alias.cc | 2 + gcc/tree-ssa-dce.cc | 5 +- gcc/tree.cc | 25 +++- gcc/tree.h| 8 ++ gcc/varasm.cc | 10 ++ 11 files changed, 332 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index c31349dae2ff..a6ed5ac43bb1 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser) if (c_parser_next_token_is (parser, CPP_NAME)) { c_token *comp_tok = c_parser_peek_token (parser); + /* Ignore the counted_by attribute for reference inside + offsetof since the information is not useful at all. */ offsetof_ref = build_component_ref (loc, offsetof_ref, comp_tok->value, -comp_tok->location, UNKNOWN_LOCATION); +comp_tok->location, UNKNOWN_LOCATION, +false); c_parser_consume_token (parser);
Re: [PATCH v6 1/5] Provide counted_by attribute to flexible array member field (PR108896)
On 2024-02-16 14:47, Qing Zhao wrote: 'counted_by (COUNT)' The 'counted_by' attribute may be attached to the C99 flexible array member of a structure. It indicates that the number of the elements of the array is given by the field named "COUNT" in the same structure as the flexible array member. GCC uses this information to improve the results of the array bound sanitizer and the '__builtin_dynamic_object_size'. For instance, the following code: struct P { size_t count; char other; char array[] __attribute__ ((counted_by (count))); } *p; specifies that the 'array' is a flexible array member whose number of elements is given by the field 'count' in the same structure. The field that represents the number of the elements should have an integer type. Otherwise, the compiler will report a warning and ignore the attribute. When the field that represents the number of the elements is assigned a negative integer value, the compiler will treat the value as zero. An explicit 'counted_by' annotation defines a relationship between two objects, 'p->array' and 'p->count', and there are the following requirementthat on the relationship between this pair: * 'p->count' should be initialized before the first reference to 'p->array'; * 'p->array' has _at least_ 'p->count' number of elements available all the time. This relationship must hold even after any of these related objects are updated during the program. It's the user's responsibility to make sure the above requirements to be kept all the time. Otherwise the compiler will report warnings, at the same time, the results of the array bound sanitizer and the '__builtin_dynamic_object_size' is undefined. One important feature of the attribute is, a reference to the flexible array member field will use the latest value assigned to the field that represents the number of the elements before that reference. For example, p->count = val1; p->array[20] = 0; // ref1 to p->array p->count = val2; p->array[30] = 0; // ref2 to p->array in the above, 'ref1' will use 'val1' as the number of the elements in 'p->array', and 'ref2' will use 'val2' as the number of elements in 'p->array'. I can't approve of course, but here's a review of the code that should hopefully make it easier for the C frontend maintainers. gcc/c-family/ChangeLog: PR C/108896 * c-attribs.cc (handle_counted_by_attribute): New function. (attribute_takes_identifier_p): Add counted_by attribute to the list. * c-common.cc (c_flexible_array_member_type_p): ...To this. * c-common.h (c_flexible_array_member_type_p): New prototype. gcc/c/ChangeLog: PR C/108896 * c-decl.cc (flexible_array_member_type_p): Renamed and moved to... (add_flexible_array_elts_to_size): Use renamed function. (is_flexible_array_member_p): Use renamed function. (verify_counted_by_attribute): New function. (finish_struct): Use renamed function and verify counted_by attribute. * c-tree.h (lookup_field): New prototype. * c-typeck.cc (lookup_field): Expose as extern function. gcc/ChangeLog: PR C/108896 * doc/extend.texi: Document attribute counted_by. gcc/testsuite/ChangeLog: PR C/108896 * gcc.dg/flex-array-counted-by.c: New test. --- gcc/c-family/c-attribs.cc| 54 - gcc/c-family/c-common.cc | 13 +++ gcc/c-family/c-common.h | 1 + gcc/c/c-decl.cc | 85 gcc/c/c-tree.h | 1 + gcc/c/c-typeck.cc| 3 +- gcc/doc/extend.texi | 64 +++ gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 + 8 files changed, 241 insertions(+), 20 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc index 40a0cf90295d..4395c0656b14 100644 --- a/gcc/c-family/c-attribs.cc +++ b/gcc/c-family/c-attribs.cc @@ -105,6 +105,8 @@ static tree handle_warn_if_not_aligned_attribute (tree *, tree, tree, int, bool *); static tree handle_strict_flex_array_attribute (tree *, tree, tree, int, bool *); +static tree handle_counted_by_attribute (tree *, tree, tree, + int, bool *); static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ; static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ;
Re: [RFC] GCC Security policy
On 2024-02-12 10:00, Richard Biener wrote: GCC driver support is then to the extent identifying the inputs and the outputs. We already have -MM to generate a list of non-system dependencies, so gcc should be able to pass on inputs to the tool, which could then map those files (and the system headers directory) into the sandbox before invocation. The output file could perhaps be enforced as having to be a new one, i.e. fail if the target is already found. I'm not sure a generic utility can achieve this unless the outputs need to be retrieved from somewhere else (not "usual" place when invoking un-sandboxed). Even the driver doesn't necessarily know all files read/written. So I suppose better defining of the actual goal is in order. gcc -sandboxed -O2 -c t.ii -fdump-tree-all what should this do? IMO invoked tools (gas, cc1plus) should be restricted to access the input files. Ideally the dumps would appear where they appear when not sandboxed but clearly overwriting existing files would be problematic, writing new files not so much, but only to the standard (or specified) auxiliary output file paths. Couldn't we get away with not having to handle dump files? They don't seem to be sensitive targets. Thanks, Sid
Re: [RFC] GCC Security policy
On 2024-02-12 08:16, Martin Jambor wrote: This probably ties in somewhat with an idea David Malcolm had riffed on with me earlier, of caching files for diagnostics. If we could unify file accesses somehow, we could make this happen, i.e. open/read files as root and then do all execution as non-root. Sandboxing will have similar requirements, i.e. map in input files and an output file handle upfront and then unshare() into a sandbox to do the actual compilation. This will make sure that at least the processing of inputs does not affect the system on which the compilation is being run. Right. As we often just download some (sometimes large) pre-processed source from Bugzilla and then happily run GCC on it on our computers, this feature might be actually useful for us (still, we'd probably need a more concrete description of what we want, would e.g. using "-wrapper gdb,--args" work in such a sandbox?). I agree that for some even semi-complex builds, a more general sandboxing solution is probably better. Joseph seems to be leaning towards nudging people to a general sandboxing solution too. The question then is whether this takes the shape of a utility in, e.g. contrib that builds such a sandbox or simply a wiki page. Thanks, Sid
Re: [RFC] GCC Security policy
On 2024-02-09 15:06, Joseph Myers wrote: Ideally dependencies would be properly set up so that everything is built in the original build, and ideally there would be no need to relink at install time (I'm not sure of the exact circumstances in which it might be needed, or on what OSes to e.g. encode the right library paths in final installed executables). In practice I think it's common for some building to take place at install time. There is a more general principle here of composability: it's not helpful for being able to write scripts or makefiles combining invocations of different utilities and have them behave predictably if some of those utilities start making judgements about whether it's a good idea to run them in a particular environment rather than just doing their job independent of irrelevant aspects of the environment. The semantics of invoking "gcc" have nothing to do with whether it's run as root; it should never need to look up what user it's running as at all. (And it's probably also a bad idea for lots of separate utilities to gain their own ways to run in a restricted environment, for similar reasons; rather than teaching "gcc" a way to create a restricted environment itself, ensure there are easy-to-use more general utilities for running arbitrary programs on untrusted input in a contained environment.) I see your point. The way you put it, there's no GCC project here at all then. Sid
Re: [RFC] GCC Security policy
On 2024-02-09 12:14, Joseph Myers wrote: On Fri, 9 Feb 2024, Siddhesh Poyarekar wrote: For privilege management we could add a --allow-root driver flag that allows gcc to run as root. Without the flag one could either outright refuse to run or drop privileges and run. Dropping privileges will be a bit tricky to implement because it would need a user to drop privileges to and then there would be the question of how to manage file access to read the compiler input and write out the compiler output. If there's no such user, gcc could refuse to run as root by default. I wonder though if from a security posture perspective it makes sense to simply discourage running as root all the time and not bother trying to make it work with dropped privileges and all that. Of course it would mean that this would be less of a "project"; it'll be a simple enough patch to refuse to run until --allow-root is specified. I think disallowing running as root would be a big problem in practice - the typical problem case is when people build software as non-root and run "make install" as root, and for some reason "make install" wants to (re)build or (re)link something. Isn't that a problematic practice though? Or maybe have those invocations be separated out as CC_ROOT? Thanks, Sid
Re: [RFC] GCC Security policy
On 2024-02-09 10:38, Martin Jambor wrote: If anyone is interested in scoping this and then mentoring this as a Google Summer of Code project this year then now is the right time to speak up! I can help with mentoring and reviews, although I'll need someone to assist with actual approvals. There are two distinct sets of ideas to explore, one is privilege management and the other sandboxing. For privilege management we could add a --allow-root driver flag that allows gcc to run as root. Without the flag one could either outright refuse to run or drop privileges and run. Dropping privileges will be a bit tricky to implement because it would need a user to drop privileges to and then there would be the question of how to manage file access to read the compiler input and write out the compiler output. If there's no such user, gcc could refuse to run as root by default. I wonder though if from a security posture perspective it makes sense to simply discourage running as root all the time and not bother trying to make it work with dropped privileges and all that. Of course it would mean that this would be less of a "project"; it'll be a simple enough patch to refuse to run until --allow-root is specified. This probably ties in somewhat with an idea David Malcolm had riffed on with me earlier, of caching files for diagnostics. If we could unify file accesses somehow, we could make this happen, i.e. open/read files as root and then do all execution as non-root. Sandboxing will have similar requirements, i.e. map in input files and an output file handle upfront and then unshare() into a sandbox to do the actual compilation. This will make sure that at least the processing of inputs does not affect the system on which the compilation is being run. Sid
Re: [PATCH] SECURITY.txt: Drop "exploitable" in reference to hardening issues
On 2023-12-18 09:35, Siddhesh Poyarekar wrote: The "exploitable vulnerability" may lead to a misunderstanding that missed hardening issues are considered vulnerabilities, just that they're not exploitable. This is not true, since while hardening bugs may be security-relevant, the absence of hardening does not make a program any more vulnerable to exploits than without. Drop the "exploitable" word to make it clear that missed hardening is not considered a vulnerability. Ping, may I commit this if there are no objections? Thanks, Sid diff --git a/SECURITY.txt b/SECURITY.txt index b3e2bbfda90..126603d4c22 100644 --- a/SECURITY.txt +++ b/SECURITY.txt @@ -155,10 +155,10 @@ Security features implemented in GCC GCC implements a number of security features that reduce the impact of security issues in applications, such as -fstack-protector, -fstack-clash-protection, _FORTIFY_SOURCE and so on. A failure of - these features to function perfectly in all situations is not an - exploitable vulnerability in itself since it does not affect the - correctness of programs. Further, they're dependent on heuristics - and may not always have full coverage for protection. + these features to function perfectly in all situations is not a + vulnerability in itself since it does not affect the correctness of + programs. Further, they're dependent on heuristics and may not + always have full coverage for protection. Similarly, GCC may transform code in a way that the correctness of the expressed algorithm is preserved, but supplementary properties
Re: [PATCH] tree-object-size: Clean up unknown propagation
On 2023-12-20 00:23, Jeff Law wrote: On 12/19/23 10:21, Siddhesh Poyarekar wrote: Narrow down scope of the unknowns bitmap so that it is only accessible within the reexamination process. This also removes any role of unknown propagation from object_sizes_set, thus simplifying that code path a bit. gcc/ChangeLog: * tree-object-size.cc (object_size_info): Remove UNKNOWNS. Drop all references to it. (object_sizes_set): Move unknowns propagation code to... (gimplify_size_expressions): ... here. Also free reexamine bitmap. (propagate_unknowns): New parameter UNKNOWNS. Update callers. Signed-off-by: Siddhesh Poyarekar --- This is a follow-up cleanup to pr#113012, but not required to fix that bug. Bootstrapped on x86_64 and with config=ubsan. OK, assuming it also went through a regression test or does so before committing. Thanks, yes, I also compared test results during the x86_64 bootstrap, likewise for i686 after I had posted this. Sid
Re: [PATCH] tree-object-size: Always set computed bit for bdos [PR113012]
On 2023-12-19 17:57, Jakub Jelinek wrote: On Mon, Dec 18, 2023 at 11:42:52AM -0500, Siddhesh Poyarekar wrote: It is always safe to set the computed bit for dynamic object sizes at the end of collect_object_sizes_for because even in case of a dependency loop encountered in nested calls, we have an SSA temporary to actually finish the object size expression. The reexamine pass for dynamic object sizes is only for propagation of unknowns and gimplification of the size expressions, not for loop resolution as in the case of static object sizes. gcc/ChangeLog: PR tree-optimization/113012 * gcc.dg/ubsan/pr113012.c: New test case. gcc/testsuite/ChangeLog: PR tree-optimization/113012 * tree-object-size.cc (compute_builtin_object_size): Expand comment for dynamic object sizes. (collect_object_sizes_for): Always set COMPUTED bitmap for dynamic object sizes. You have the gcc/ChangeLog and gcc/testsuite/ChangeLog hunks swapped, I think this wouldn't get through pre-commit hook. Oops, fixed. --- a/gcc/tree-object-size.cc +++ b/gcc/tree-object-size.cc @@ -1185,10 +1185,12 @@ compute_builtin_object_size (tree ptr, int object_size_type, osi.tos = NULL; } - /* First pass: walk UD chains, compute object sizes that -can be computed. osi.reexamine bitmap at the end will -contain what variables were found in dependency cycles -and therefore need to be reexamined. */ + /* First pass: walk UD chains, compute object sizes that can be computed. +osi.reexamine bitmap at the end will contain what variables that need This wording is weird. Perhaps ... will contain versions of SSA_NAMEs that need to be reexamined. ? Because varno seems to be SSA_NAME_VERSION. Ack, it's unrelated to my change, but fixed since I touched the comment block. @@ -1823,11 +1825,16 @@ collect_object_sizes_for (struct object_size_info *osi, tree var) gcc_unreachable (); } - if (! reexamine || object_sizes_unknown_p (object_size_type, varno)) + /* Dynamic sizes use placeholder temps to return an answer, so it is always + * safe to set COMPUTED for them. */ We don't use this style of comments, please replace the * at the start of second line with a space. Oops, fixed. + if ((object_size_type & OST_DYNAMIC) + || !reexamine || object_sizes_unknown_p (object_size_type, varno)) { bitmap_set_bit (computed[object_size_type], varno); if (!(object_size_type & OST_DYNAMIC)) bitmap_clear_bit (osi->reexamine, varno); + else if (reexamine) + bitmap_set_bit (osi->reexamine, varno); } else { Otherwise LGTM, but please wait at least a few weeks before backporting to 13. OK, I'll push with fixes to trunk in a bit and then push to 13 next year. Thanks, Sid
[PATCH] tree-object-size: Clean up unknown propagation
Narrow down scope of the unknowns bitmap so that it is only accessible within the reexamination process. This also removes any role of unknown propagation from object_sizes_set, thus simplifying that code path a bit. gcc/ChangeLog: * tree-object-size.cc (object_size_info): Remove UNKNOWNS. Drop all references to it. (object_sizes_set): Move unknowns propagation code to... (gimplify_size_expressions): ... here. Also free reexamine bitmap. (propagate_unknowns): New parameter UNKNOWNS. Update callers. Signed-off-by: Siddhesh Poyarekar --- This is a follow-up cleanup to pr#113012, but not required to fix that bug. Bootstrapped on x86_64 and with config=ubsan. gcc/tree-object-size.cc | 65 + 1 file changed, 34 insertions(+), 31 deletions(-) diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc index 434b2fc0bf5..08a3b7f5d94 100644 --- a/gcc/tree-object-size.cc +++ b/gcc/tree-object-size.cc @@ -43,7 +43,7 @@ struct object_size_info int object_size_type; unsigned char pass; bool changed; - bitmap visited, reexamine, unknowns; + bitmap visited, reexamine; unsigned int *depths; unsigned int *stack, *tos; }; @@ -264,19 +264,8 @@ object_sizes_set (struct object_size_info *osi, unsigned varno, tree val, { if (bitmap_bit_p (osi->reexamine, varno)) { - if (size_unknown_p (val, object_size_type)) - { - oldval = object_sizes_get (osi, varno); - old_wholeval = object_sizes_get (osi, varno, true); - bitmap_set_bit (osi->unknowns, SSA_NAME_VERSION (oldval)); - bitmap_set_bit (osi->unknowns, SSA_NAME_VERSION (old_wholeval)); - bitmap_clear_bit (osi->reexamine, varno); - } - else - { - val = bundle_sizes (oldval, val); - wholeval = bundle_sizes (old_wholeval, wholeval); - } + val = bundle_sizes (oldval, val); + wholeval = bundle_sizes (old_wholeval, wholeval); } else { @@ -958,25 +947,26 @@ emit_phi_nodes (gimple *stmt, tree size, tree wholesize) size_unknown, as noted in UNKNOWNS. */ static tree -propagate_unknowns (object_size_info *osi, tree expr) +propagate_unknowns (object_size_info *osi, tree expr, bitmap unknowns) { int object_size_type = osi->object_size_type; switch (TREE_CODE (expr)) { case SSA_NAME: - if (bitmap_bit_p (osi->unknowns, SSA_NAME_VERSION (expr))) + if (bitmap_bit_p (unknowns, SSA_NAME_VERSION (expr))) return size_unknown (object_size_type); return expr; case MIN_EXPR: case MAX_EXPR: { - tree res = propagate_unknowns (osi, TREE_OPERAND (expr, 0)); + tree res = propagate_unknowns (osi, TREE_OPERAND (expr, 0), +unknowns); if (size_unknown_p (res, object_size_type)) return res; - res = propagate_unknowns (osi, TREE_OPERAND (expr, 1)); + res = propagate_unknowns (osi, TREE_OPERAND (expr, 1), unknowns); if (size_unknown_p (res, object_size_type)) return res; @@ -984,7 +974,8 @@ propagate_unknowns (object_size_info *osi, tree expr) } case MODIFY_EXPR: { - tree res = propagate_unknowns (osi, TREE_OPERAND (expr, 1)); + tree res = propagate_unknowns (osi, TREE_OPERAND (expr, 1), +unknowns); if (size_unknown_p (res, object_size_type)) return res; return expr; @@ -992,7 +983,8 @@ propagate_unknowns (object_size_info *osi, tree expr) case TREE_VEC: for (int i = 0; i < TREE_VEC_LENGTH (expr); i++) { - tree res = propagate_unknowns (osi, TREE_VEC_ELT (expr, i)); + tree res = propagate_unknowns (osi, TREE_VEC_ELT (expr, i), +unknowns); if (size_unknown_p (res, object_size_type)) return res; } @@ -1000,7 +992,8 @@ propagate_unknowns (object_size_info *osi, tree expr) case PLUS_EXPR: case MINUS_EXPR: { - tree res = propagate_unknowns (osi, TREE_OPERAND (expr, 0)); + tree res = propagate_unknowns (osi, TREE_OPERAND (expr, 0), +unknowns); if (size_unknown_p (res, object_size_type)) return res; @@ -1025,6 +1018,7 @@ gimplify_size_expressions (object_size_info *osi) /* Step 1: Propagate unknowns into expressions. */ bitmap reexamine = BITMAP_ALLOC (NULL); bitmap_copy (reexamine, osi->reexamine); + bitmap unknowns = BITMAP_ALLOC (NULL); do { changed = false; @@ -1032,14 +1026,23 @@ gimplify_size_expressions (object_size_info *osi) { object_size cur = object_sizes_get_raw (osi, i); - if (size_unknown_p (propagate
[PATCH] tree-object-size: Always set computed bit for bdos [PR113012]
It is always safe to set the computed bit for dynamic object sizes at the end of collect_object_sizes_for because even in case of a dependency loop encountered in nested calls, we have an SSA temporary to actually finish the object size expression. The reexamine pass for dynamic object sizes is only for propagation of unknowns and gimplification of the size expressions, not for loop resolution as in the case of static object sizes. gcc/ChangeLog: PR tree-optimization/113012 * gcc.dg/ubsan/pr113012.c: New test case. gcc/testsuite/ChangeLog: PR tree-optimization/113012 * tree-object-size.cc (compute_builtin_object_size): Expand comment for dynamic object sizes. (collect_object_sizes_for): Always set COMPUTED bitmap for dynamic object sizes. Signed-off-by: Siddhesh Poyarekar --- Testing: - Bootstrapped x86_64 and config=ubsan - Tested i686 OK for trunk and backport to gcc 13 branch? gcc/testsuite/gcc.dg/ubsan/pr113012.c | 17 + gcc/tree-object-size.cc | 17 - 2 files changed, 29 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/ubsan/pr113012.c diff --git a/gcc/testsuite/gcc.dg/ubsan/pr113012.c b/gcc/testsuite/gcc.dg/ubsan/pr113012.c new file mode 100644 index 000..4fc38cd1171 --- /dev/null +++ b/gcc/testsuite/gcc.dg/ubsan/pr113012.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-fsanitize=undefined" } */ + +int * +foo (int x, int y, int z, int w) +{ + int *p = __builtin_malloc (z * sizeof (int)); + int *q = p - 1; + while (--x > 0) +{ + if (w + 1 > y) + q = p - 1; + ++*q; + ++q; +} + return p; +} diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc index 28f27adf9ca..434b2fc0bf5 100644 --- a/gcc/tree-object-size.cc +++ b/gcc/tree-object-size.cc @@ -1185,10 +1185,12 @@ compute_builtin_object_size (tree ptr, int object_size_type, osi.tos = NULL; } - /* First pass: walk UD chains, compute object sizes that -can be computed. osi.reexamine bitmap at the end will -contain what variables were found in dependency cycles -and therefore need to be reexamined. */ + /* First pass: walk UD chains, compute object sizes that can be computed. +osi.reexamine bitmap at the end will contain what variables that need +to be reexamined. For both static and dynamic size computation, +reexamination is for propagation across dependency loops. The dynamic +case has the additional use case where the computed expression needs +to be gimplified. */ osi.pass = 0; osi.changed = false; collect_object_sizes_for (, ptr); @@ -1823,11 +1825,16 @@ collect_object_sizes_for (struct object_size_info *osi, tree var) gcc_unreachable (); } - if (! reexamine || object_sizes_unknown_p (object_size_type, varno)) + /* Dynamic sizes use placeholder temps to return an answer, so it is always + * safe to set COMPUTED for them. */ + if ((object_size_type & OST_DYNAMIC) + || !reexamine || object_sizes_unknown_p (object_size_type, varno)) { bitmap_set_bit (computed[object_size_type], varno); if (!(object_size_type & OST_DYNAMIC)) bitmap_clear_bit (osi->reexamine, varno); + else if (reexamine) + bitmap_set_bit (osi->reexamine, varno); } else { -- 2.43.0
[PATCH] SECURITY.txt: Drop "exploitable" in reference to hardening issues
The "exploitable vulnerability" may lead to a misunderstanding that missed hardening issues are considered vulnerabilities, just that they're not exploitable. This is not true, since while hardening bugs may be security-relevant, the absence of hardening does not make a program any more vulnerable to exploits than without. Drop the "exploitable" word to make it clear that missed hardening is not considered a vulnerability. diff --git a/SECURITY.txt b/SECURITY.txt index b3e2bbfda90..126603d4c22 100644 --- a/SECURITY.txt +++ b/SECURITY.txt @@ -155,10 +155,10 @@ Security features implemented in GCC GCC implements a number of security features that reduce the impact of security issues in applications, such as -fstack-protector, -fstack-clash-protection, _FORTIFY_SOURCE and so on. A failure of -these features to function perfectly in all situations is not an -exploitable vulnerability in itself since it does not affect the -correctness of programs. Further, they're dependent on heuristics -and may not always have full coverage for protection. +these features to function perfectly in all situations is not a +vulnerability in itself since it does not affect the correctness of +programs. Further, they're dependent on heuristics and may not +always have full coverage for protection. Similarly, GCC may transform code in a way that the correctness of the expressed algorithm is preserved, but supplementary properties
Re: [gcc15] nested functions in C
On 2023-12-07 10:42, Eric Botcazou wrote: I think from a language standpoint, the general idea that nested functions are just any functions inside functions (which is how the C nested functions essentially behave) is too broad and they should be restricted to minimal implementations that, e.g. don't have side-effects or if they do, there's explicit syntactic sugar to make it clearer. That sounds totally arbitrary though. Algol-derived languages have had nested subprograms for ages, e.g. Pascal or Ada, and they can be very useful. I'll admit that it is a subjective preference and is probably not in the spirit of traditional C. Sid
Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]
On 2023-12-06 09:41, Jakub Jelinek wrote: On Wed, Dec 06, 2023 at 09:30:32AM -0500, Siddhesh Poyarekar wrote: We have the -Wmemset-transposed-args warning, couldn't we have a similar one for calloc, and perhaps do it solely in the case where one uses sizeof of the type used in the cast pointer? So warn for (struct S *) calloc (sizeof (struct S), 1) or (struct S *) calloc (sizeof (struct S), n) but not for (struct S *) calloc (4, 15) or (struct S *) calloc (sizeof (struct T), 1) or similar? Of course check for compatible types of TYPE_MAIN_VARIANTs. +1, this could be an analyzer warning, since in practice it is just a code cleanliness issue. We don't do such things in the analyzer, nor it is possible, by the time analyzer sees the IL all the sizeofs etc. are folded. Analyzer is for expensive to compute warnings, code style warnings are normally implemented in the FEs. Thanks, understood. A separate FE warning is fine as well. Thanks, Sid
Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]
On 2023-12-06 09:21, Jakub Jelinek wrote: On Wed, Dec 06, 2023 at 02:34:10PM +0100, Martin Uecker wrote: Further I think "size less than or equal to the size requested" is quite ambiguous in the calloc case, isn't the size requested in the calloc case actually nmemb * size rather than just size? This is unclear but it can be understood this way. This was also Joseph's point. I am happy to submit a patch that changes the code so that the swapped arguments to calloc do not cause a warning anymore. That would be my preference because then the allocation size is correct and it is purely a style warning. It doesn't follow how the warning is described: "Warn about calls to allocation functions decorated with attribute @code{alloc_size} that specify insufficient size for the target type of the pointer the result is assigned to" when the size is certainly sufficient. But wonder what others think about it. +1, from a libc perspective, the transposed arguments don't make a difference, a typical allocator will produce sufficiently sized allocation for the calloc call. BTW, shouldn't the warning be for C++ as well? Sure, I know, people use operator new more often, but still, the allocators are used in there as well. We have the -Wmemset-transposed-args warning, couldn't we have a similar one for calloc, and perhaps do it solely in the case where one uses sizeof of the type used in the cast pointer? So warn for (struct S *) calloc (sizeof (struct S), 1) or (struct S *) calloc (sizeof (struct S), n) but not for (struct S *) calloc (4, 15) or (struct S *) calloc (sizeof (struct T), 1) or similar? Of course check for compatible types of TYPE_MAIN_VARIANTs. +1, this could be an analyzer warning, since in practice it is just a code cleanliness issue. Thanks, Sid
Re: [gcc15] nested functions in C
On 2023-12-04 16:31, Martin Uecker wrote: If (assuming from them being called lambdas) they are primarily for small functions without side-effects then it's already a significantly stronger specification than what we have right now with C nested functions. That would end up enforcing what you demonstrate as the good way to use nested functions. The proposal we have seen for C23 (which was not accepted into C23 mostly due to timing and lack of implementation experience) were similar to C++'s lambdas and did not have any such restriction. Oh well :/ If nested functions are eventually going to make it into the C standard then effort is probably better spent in porting the C nested functions to use descriptors instead of executable stacks or heaps. I submitted a patch for this a long time ago which was based on the code for Ada that uses a bit in the pointer to differentiate between conventional pointers and descriptors. I would now prefer an approach that uses a qualifier on the function type to indicate that the static chain has to be set. A pointer to such a qualified function would a descriptor that consists of the address and the value for the static chain. This would be useful for many things. Ack, this probably becomes a gcc15 thing then, given that stage 1 has closed. Are you planning to revive your work and repost? Thanks, Sid
Re: [gcc15] nested functions in C
On 2023-12-04 13:51, Jakub Jelinek wrote: Why? The syntax doesn't seem to be something unexpected, and as C doesn't have lambdas, one can use the nested functions instead. The only problem is if you need to pass function pointers somewhere else (and target doesn't have function descriptors or something similar), if it is only done to make code more readable compared to say use of macros, I think the nested functions are better, one doesn't have to worry about multiple evaluations of argument side-effects etc. And if everything is inlined and SRA optimized, there is no extra cost. The problem of passing it as a function pointer to other functions is common with C++, only lambdas which don't capture anything actually can be convertible to function pointer, for anything else you need a template and instantiate it for a particular lambda (which is something you can't do in C). I think from a language standpoint, the general idea that nested functions are just any functions inside functions (which is how the C nested functions essentially behave) is too broad and they should be restricted to minimal implementations that, e.g. don't have side-effects or if they do, there's explicit syntactic sugar to make it clearer. If (like Martin stated earlier), nested functions are in fact going to enter the C standard in future then maybe this discussion is moot and we probably are better off implementing descriptor support for C to replace the on-stack trampolines instead of adding -Werror=trampolines in a hurry. Thanks, Sid
Re: [gcc15] nested functions in C
On 2023-12-04 13:48, Martin Uecker wrote: I empathize with Jakub's stated use case though of keeping the C frontend support for testing purposes, but that could easily be done behind a flag, or by putting nested C func deprecation behind a flag. I am relatively sure C will get some form of nested functions. Maybe as anonymous nested functions, i.e. lambdas, but I do not see a fundamental difference here (I personally like naming things for clarity, so i prefer named nested functions) If (assuming from them being called lambdas) they are primarily for small functions without side-effects then it's already a significantly stronger specification than what we have right now with C nested functions. That would end up enforcing what you demonstrate as the good way to use nested functions. I suppose minimal, contained side-effects (such as atomically updating a structure) may also constitute sound design, but that should be made explicit in the language. I don't disagree for cases like -Warray-bounds, but for warnings/errors that are more deterministic in nature (like -Werror=trampolines), they're going to point at actual problems and larger projects and distributions will usually prefer to at least track them, if not actually fix them. For Fedora we tend to provide macro overrides for packages that need to explicitly disable a security related flag. In projects such as mine, this will lead to a lot of code transformations as indicated above, i.e. much worse code. One could get away with it, since nested functions are rarely used, but I think this is bad, because a lot of code would improve if it used them. If nested functions are eventually going to make it into the C standard then effort is probably better spent in porting the C nested functions to use descriptors instead of executable stacks or heaps. Thanks, Sid
[gcc15] nested functions in C
[Branching this into a separate conversation to avoid derailing the patch, which isn't directly related] On 2023-12-04 12:21, Martin Uecker wrote: I do not really agree with that. Nested functions can substantially improve code quality and in C can avoid type unsafe use of void* pointers in callbacks. The code is often much better with nested functions than without. Nested functions and lambdas (i.e. anonymous nested functions) are used in many languages because they make code better and GNU's nested function are no exception. So I disagree with the idea that discouraging nested functions leads to better code - I think the exact opposite is true. I would argue that GNU's nested functions *are* an exception because they're like feathers stuck on a pig to try and make it fly; I think a significant specification effort is required to actually make it a cleanly usable feature. It *may* be possible to implement patterns that use C nested functions well enough *and* result in readable code, but IMO it is easier to write clunky and unmaintainable code with it. I empathize with Jakub's stated use case though of keeping the C frontend support for testing purposes, but that could easily be done behind a flag, or by putting nested C func deprecation behind a flag. I am generally wary of mitigations that may make exploitation of buffer overflows a bit harder while increasing the likelihood of buffer overflows by reducing type safety and/or code quality. But I would agree that trampolines are generally problematic. A better strategy would be wide function pointer type (as in Apple' Blocks extension). Alternatively, an explicit way to obtain the static chain for a nested function which could be used with __builtin_call_with_static_chain could also work. But in any case, I think it diminishes the value of -fhardening it if requires source code changes, because then it is not as easy to simply turn it on in larger projects / distributitions. I suppose you mean source code changes even in correct code just to comply with the flag? I don't disagree for cases like -Warray-bounds, but for warnings/errors that are more deterministic in nature (like -Werror=trampolines), they're going to point at actual problems and larger projects and distributions will usually prefer to at least track them, if not actually fix them. For Fedora we tend to provide macro overrides for packages that need to explicitly disable a security related flag. Thanks, Sid
Re: [PATCH] gcc: Disallow trampolines when -fhardened
On 2023-12-04 11:39, Andreas Schwab wrote: On Dez 04 2023, Siddhesh Poyarekar wrote: For hardened code in C, I think we really should look to step away from nested functions instead of adding ways to continue supporting it. There's probably a larger conversation to be had about the utility of nested functions in general for C (and whether this GCC extension should be deprecated altogether in future), but I feel like the -fhardened subset gives us the opportunity to enforce at least a safe subset for now, possibly extending it in future. Nested functions by itself don't need a trampoline, only if the address of it is passed outside the containing function's scope (as a callback, for example). Yes, that's why I said that the conversation about deprecating the C nested functions extension is a broader one (and hence for gcc 15) that will likely involve the question of whether dropping the extension altogether gives any benefit or if dropping support for on-stack trampolines is sufficient. On-heap trampolines are maybe slightly better in that they don't need an executable stack, but defaulting to on-heap trampolines for -fhardened seems like a lost opportunity to enforce better user code. Thanks, Sid
Re: [PATCH] gcc: Disallow trampolines when -fhardened
On 2023-12-02 04:42, Martin Uecker wrote: Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? -- >8 -- It came up that a good hardening strategy is to disable trampolines which may require executable stack. Therefore the following patch adds -Werror=trampolines to -fhardened. This would add a warning about specific code (where it is then unclear whether rewriting it is feasible or even an improvement), which seems different to all the other flags -fhardening has now. It's actually -Werror=trampolines, not just -Wtrampolines; the aim is to hard fail on producing trampolines and consequently, an executable stack. In general the goal of -fhardened is to produce hardened code and the nested function trampolines do the exact reverse of that, so -Werror=trampolines seems to align perfectly with that goal, doesn't it? GCC now has an option to allocate trampolines on the heap, which would seem to be a better fit. On the other hand, it does not work with longjmp which may be a limitation. For hardened code in C, I think we really should look to step away from nested functions instead of adding ways to continue supporting it. There's probably a larger conversation to be had about the utility of nested functions in general for C (and whether this GCC extension should be deprecated altogether in future), but I feel like the -fhardened subset gives us the opportunity to enforce at least a safe subset for now, possibly extending it in future. Thanks, Sid
Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
On 2023-11-02 10:12, Martin Uecker wrote: This shouldn't be necessary. The object-size pass can track pointer arithmeti if it comes after inserting the .ACCESS_WITH_SIZE. https://godbolt.org/z/fvc3aoPfd The problem is dependency tracking through the pointer arithmetic, which Jakub suggested to work around by passing a reference to the size in .ACCESS_WITH_SIZE to avoid DCE/reordering. Thanks, Sid
Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute
On 2023-10-31 12:26, Qing Zhao wrote: Hi, I wrote a summary based on our extensive discussion, hopefully this can be served as an informal proposal. Please take a look at it and let me know any comment or suggestion. There are some (???) in the section 3.2 and 3.6, those are my questions seeking for help. -:) Thanks again for all the help. Qing. Represent the missing dependence for the "counted_by" attribute and its consumers Qing Zhao 10/30/2023 == The whole discussion is at: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html 1. The problem There is a data dependency between the size assignment and the implicit use of the size information in the __builtin_dynamic_object_size that is missing in the IL (line 11 and line 13 in the below example). Such information missing will result incorrect code reordering and other code transformations. 1 struct A 2 { 3 size_t size; 4 char buf[] __attribute__((counted_by(size))); 5 }; 6 7 size_t 8 foo (size_t sz) 9 { 10 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char)); 11 obj->size = sz; 12 obj->buf[0] = 2; 13 return __builtin_dynamic_object_size (obj->buf, 1); 14 } Please see a more complicate example in the Appendex 1. We need to represent such data dependency correctly in the IL. 2. The solution: 2.1 Summary * Add a new internal function "ACCESS_WITH_SIZE" to carry the size information for every FAM field access; * In C FE, Replace every FAM field access whose TYPE has the "counted_by" attribute with the new internal function "ACCESS_WITH_SIZE"; * In every consumer of the size information, for example, BDOS or array bound sanitizer, query the size information or ACCESS_MODE information from the new internal function; * When the size information and the "ACCESS_MODE" information are not used anymore, possibly at the 2nd object size phase, replace the internal function with the actual FAM field access; * Some adjustment to inlining heuristic and some SSA passes to mitigate the impact to the optimizer and code generation. 2.2 The new internal function .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE) INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) which returns the "PTR" same as the 1st argument; 1st argument "PTR": Pointer to the object; 2nd argument "SIZE": The size of the pointed object, if the pointee of the "PTR" has a * real type, it's the number of the elements of the type; * void type, it's the number of bytes; 3rd argument "ACCESS_MODE": -1: Unknown access semantics 0: none 1: read_only 2: write_only 3: read_write NOTEs, A. This new internal function is intended for a more general use from all the 3 attributes, "access", "alloc_size", and the new "counted_by", to encode the "size" and "access_mode" information to the corresponding pointer. (in order to resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503) B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1. C. In this wrieup, we focus on the implementation details for the "counted_by" attribute. However, this function should be ready to be used by "access" and "alloc_size" without issue. 2.3 A new semantic requirement in the user documentation of "counted_by" For the following structure including a FAM with a counted_by attribute: struct A { size_t size; char buf[] __attribute__((counted_by(size))); }; for any object with such type: struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char)); The setting to the size field should be done before the first reference to the FAM field. A more flexible specification could be stating that validation for a reference to the FAM field will use the latest value assigned to the size field before that reference. That will allow for situations like: o->size = val1; deref (o->buf); o->size = val2; making it clear that deref will see val1 and not val2. Such requirement to the user will guarantee that the first reference to the FAM knows the size of the FAM. We need to add this additional requirement to the user document. 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE In C FE: for every reference to a FAM, for example, "obj->buf" in the small example, check whether the corresponding FIELD_DECL has a "counted_by" attribute? if YES, replace the reference to "obj->buf" with a call to .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 2.5 Query the size info There are multiple consumers of the size info (and ACCESS_MODE info): * __builtin_dynamic_object_size; * array bound sanitizer; in these consumers, get the size info from the 2nd argument of the call to ACCESS_WITH_SIZE (PTR, SIZE, -1) 2.6 Eliminate the internal function when not useful anymore
Re: [PATCH] tree-optimization/109334: Improve computation for access attribute
On 2023-10-28 16:29, Martin Uecker wrote: Isn't this testcase h() in builtin-dynamic-object-size-20.c? If you're referring to testcase i(), then maybe "where the size is given by a non-trivial function of a function parameter, e.g. fn (size_t n, char buf[dummy(n)])." h() is supported. For i() we would need something as __builtin_access__with_size to record the result of dummy(). But the comment refers to the simpler case: fn (size_t n, char (*buf)[n]) [[gnu::access(read_write, 2, 1)]] This doesn't work because buf[n] does not have constant size, but it could be made to work more easily because the size is directly given by a function argument. Ah, so it would have been nice to have this more detailed explanation in the comment for clarity :) Thanks, Sid
Re: [PATCH] tree-optimization/109334: Improve computation for access attribute
On 2023-10-26 04:37, Martin Uecker wrote: Hi Sid and Jakub, here is the patch discussed in PR 109334. I can't approve, but here's a review: Martin tree-optimization/109334: Improve computation for access attribute The fix for PR104970 restricted size computations to the case where the access attribute was specified explicitly (no VLA). It also restricted it to void pointers or elements with constant sizes. The second restriction is enough to fix the original bug. Revert the first change to again allow size computations for VLA parameters and for VLA parameters together with an explicit access attribute. gcc/ChangeLog: PR tree-optimization/109334 * tree-object-size.cc (parm_object_size): Allow size computation for explicit access attributes. gcc/testsuite/ChangeLog: PR tree-optimization/109334 * gcc.dg/builtin-dynamic-object-size-20.c (test_parmsz_simple3): Supported again. (test_parmsz_external4): New test. * gcc.dg/builtin-dynamic-object-size-20.c: New test. * gcc.dg/pr104970.c: New test. diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c index 6da04202ffe..07e3da6f254 100644 --- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c +++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c @@ -455,7 +455,6 @@ test_parmsz_simple2 (size_t sz, char obj[]) return __builtin_dynamic_object_size (obj, 0); } -/* Implicitly constructed access attributes not supported yet. */ size_t __attribute__ ((noinline)) test_parmsz_simple3 (size_t sz, char obj[sz]) @@ -527,6 +526,13 @@ test_parmsz_internal3 (size_t sz1, size_t sz2, double obj[sz1][sz2]) return __builtin_dynamic_object_size (obj, 0); } This test case now works. OK. +size_t +__attribute__ ((noinline)) +test_parmsz_internal4 (size_t sz1, size_t sz2, double obj[sz1 + 1][4]) +{ + return __builtin_dynamic_object_size (obj, 0); +} + New test case that isn't supported yet. OK. /* Loops. */ size_t @@ -721,8 +727,8 @@ main (int argc, char **argv) if (test_parmsz_simple2 (__builtin_strlen (argv[0]) + 1, argv[0]) != __builtin_strlen (argv[0]) + 1) FAIL (); - /* Only explicitly added access attributes are supported for now. */ - if (test_parmsz_simple3 (__builtin_strlen (argv[0]) + 1, argv[0]) != -1) + if (test_parmsz_simple3 (__builtin_strlen (argv[0]) + 1, argv[0]) + != __builtin_strlen (argv[0]) + 1) FAIL (); int arr[42]; if (test_parmsz_scaled (arr, 42) != sizeof (arr)) @@ -759,6 +765,8 @@ main (int argc, char **argv) FAIL (); if (test_parmsz_internal3 (4, 4, obj) != -1) FAIL (); + if (test_parmsz_internal4 (3, 4, obj) != -1) +FAIL (); if (test_loop (arr, 42, 0, 32, 1) != 10 * sizeof (int)) FAIL (); if (test_loop (arr, 42, 32, -1, -1) != 0) diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-20.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-20.c new file mode 100644 index 000..2c8e07dd98d --- /dev/null +++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-20.c @@ -0,0 +1,49 @@ +/* PR 109334 + * { dg-do run } + * { dg-options "-O1" } */ + + +[[gnu::noinline,gnu::noipa]] +int f(int n, int buf[n]) +[[gnu::access(read_only, 2, 1)]] +{ +return __builtin_dynamic_object_size(buf, 0); +} + +[[gnu::noinline,gnu::noipa]] +int g(int n, int buf[]) +[[gnu::access(read_only, 2, 1)]] +{ +return __builtin_dynamic_object_size(buf, 0); +} + +[[gnu::noinline,gnu::noipa]] +int h(int n, int buf[n]) +{ +return __builtin_dynamic_object_size(buf, 0); +} + +int dummy(int x) { return x + 1; } + +[[gnu::noinline,gnu::noipa]] +int i(int n, int buf[dummy(n)]) +{ +return __builtin_dynamic_object_size(buf, 0); +} + +int main() +{ +int n = 10; +int buf[n]; +if (n * sizeof(int) != f(n, buf)) +__builtin_abort(); +if (n * sizeof(int) != g(n, buf)) +__builtin_abort(); +if (n * sizeof(int) != h(n, buf)) +__builtin_abort(); + +(void)i(n, buf); f(), g(), h() supported, but i() isn't. OK. + +return 0; +} + diff --git a/gcc/testsuite/gcc.dg/pr104970.c b/gcc/testsuite/gcc.dg/pr104970.c new file mode 100644 index 000..e24a7f22dfb --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr104970.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -D_FORTIFY_SOURCE=2" } */ The -D_FORTIFY_SOURCE=2 shouldn't be necessary since it doesn't really do anything in the context of this test. + +__inline void +memset2(void *__dest, int __ch, long __len) { + long __trans_tmp_1 = __builtin_dynamic_object_size(__dest, 0); + __builtin___memset_chk(__dest, __ch, __len, __trans_tmp_1); +} + +void +mleye(int l, double E[][l]) { memset2(E, 0, sizeof(double)); } New regression test for the ICE reported in
Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-10-25 09:27, Qing Zhao wrote: On Oct 24, 2023, at 7:56 PM, Siddhesh Poyarekar wrote: On 2023-10-24 18:51, Qing Zhao wrote: Thanks for the proposal! So what you suggested is: For every x.buf, change it as a __builtin_with_size(x.buf, x.L) in the FE, then the call to the _bdos (x.buf, 1) will Become: _bdos(__builtin_with_size(x.buf, x.L), 1)? Then the implicit use of x.L in _bdos(x.buf.1) will become explicit? Oops, I think Martin and I fell off-list in a subthread. I clarified that my comment was that any such annotation at object reference is probably too late and hence not the right place for it; basically it has the same problems as the option A in your comment. A better place to reinforce such a relationship would be the allocation+initialization site instead. I think Martin’s proposal might work, it’s different than the option A: A. Add an additional argument, the size parameter, to __bdos, A.1, during FE; A.2, during gimplification phase; Option A targets on the __bdos call, try to encode the implicit use to the call, this will not work when the real object has not been instantiation at the call site. However, Martin’s proposal targets on the FMA array itself, it will enhance the FAM access naturally with the size information. And such FAM access with size info will propagated to the __bdos site later through inlining, etc. and then tree-object-size can use the size information at that point. At the same time, the implicit use of the size is recorded correctly. So, I think that this proposal is natural and reasonable. Ack, we discussed this later in the thread and I agree[1]. Richard still has concerns[2] that I think may be addressed by putting __builtin_with_size at the point where the reference to x.buf escapes, but I'm not very sure about that. Oh, and Martin suggested using __builtin_with_size more generally[3] in bugzilla to address attribute inlining issues and we have high level consensus for a __builtin_with_access instead, which associates access type in addition to size with the target object. For the purposes of counted_by, access type could simply be -1. Thanks, Sid [1] https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#m4f3cafa489493180e258fd62aca0196a5f244039 [2] https://inbox.sourceware.org/gcc-patches/73af949c-3caa-4b11-93ce-3064b95a9...@gotplt.org/T/#mcf226f891621db8b640deaedd8942bb8519010f3 [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503#c6
Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-10-25 04:16, Martin Uecker wrote: Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener: Am 24.10.2023 um 22:38 schrieb Martin Uecker : Am Dienstag, dem 24.10.2023 um 20:30 + schrieb Qing Zhao: Hi, Sid, Really appreciate for your example and detailed explanation. Very helpful. I think that this example is an excellent example to show (almost) all the issues we need to consider. I slightly modified this example to make it to be compilable and run-able, as following: (but I still cannot make the incorrect reordering or DSE happening, anyway, the potential reordering possibility is there…) 1 #include 2 struct A 3 { 4 size_t size; 5 char buf[] __attribute__((counted_by(size))); 6 }; 7 8 static size_t 9 get_size_from (void *ptr) 10 { 11 return __builtin_dynamic_object_size (ptr, 1); 12 } 13 14 void 15 foo (size_t sz) 16 { 17 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char)); 18 obj->size = sz; 19 obj->buf[0] = 2; 20 __builtin_printf (“%d\n", get_size_from (obj->buf)); 21 return; 22 } 23 24 int main () 25 { 26 foo (20); 27 return 0; 28 } When it’s set I suppose. Turn X.l = n; Into X.l = __builtin_with_size (x.buf, n); It would turn some_variable = (&) x.buf into some_variable = __builtin_with_size ( (&) x.buf. x.len) So the later access to x.buf and not the initialization of a member of the struct (which is too early). Hmm, so with Qing's example above, are you suggesting the transformation be to foo like so: 14 void 15 foo (size_t sz) 16 { 16.5 void * _1; 17 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char)); 18 obj->size = sz; 19 obj->buf[0] = 2; 19.5 _1 = __builtin_with_size (obj->buf, obj->size); 20 __builtin_printf (“%d\n", get_size_from (_1)); 21 return; 22 } If yes then this could indeed work. I think I got thrown off by the reference to __bdos. Thanks, Sid
Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-10-24 18:51, Qing Zhao wrote: Thanks for the proposal! So what you suggested is: For every x.buf, change it as a __builtin_with_size(x.buf, x.L) in the FE, then the call to the _bdos (x.buf, 1) will Become: _bdos(__builtin_with_size(x.buf, x.L), 1)? Then the implicit use of x.L in _bdos(x.buf.1) will become explicit? Oops, I think Martin and I fell off-list in a subthread. I clarified that my comment was that any such annotation at object reference is probably too late and hence not the right place for it; basically it has the same problems as the option A in your comment. A better place to reinforce such a relationship would be the allocation+initialization site instead. Thanks, Sid
Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-10-24 18:41, Qing Zhao wrote: On Oct 24, 2023, at 5:03 PM, Siddhesh Poyarekar wrote: On 2023-10-24 16:30, Qing Zhao wrote: Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, therefore, the call to __bdos is Not in the same routine as the instantiation of the object, As a result, the TYPE info and the attached counted_by info of the object can NOT be USED by the __bdos call. But __bos/__bdos are barely useful without optimization; you need a minimum of -O1. You're right that if the call is never inlined then we don't care because the __bdos call does not get expanded to obj->size. However, the point of situation 2 is that the TYPE info cannot be used by the __bdos call *only for a while* (i.e. until the call gets inlined) and that window is an opportunity for the reordering/DSE to break things. The main point of situation 2 I tried made: there are situations where obj->size is not used at all by the __bdos, marking it as volatile is too conservative, unnecessarily prevent useful optimizations from happening. -:) Yes, that's the tradeoff. However, maybe this is the point where Kees jumps in and say the kernel doesn't really care as much or something like that :) Sid
Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-10-24 16:38, Martin Uecker wrote: Here is another proposal: Add a new builtin function __builtin_with_size(x, size) that return x but behaves similar to an allocation function in that BDOS can look at the size argument to discover the size. The FE insers this function when the field is accessed: __builtin_with_size(x.buf, x.L); In fact if we do this at the allocation site for x, it may also help with future warnings, where the compiler could flag a warning or error when it encounters this builtin but does not see an assignment to x.L. Thanks, Sid
Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-10-24 16:30, Qing Zhao wrote: Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, therefore, the call to __bdos is Not in the same routine as the instantiation of the object, As a result, the TYPE info and the attached counted_by info of the object can NOT be USED by the __bdos call. But __bos/__bdos are barely useful without optimization; you need a minimum of -O1. You're right that if the call is never inlined then we don't care because the __bdos call does not get expanded to obj->size. However, the point of situation 2 is that the TYPE info cannot be used by the __bdos call *only for a while* (i.e. until the call gets inlined) and that window is an opportunity for the reordering/DSE to break things. Thanks. Sid
Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-10-23 15:43, Qing Zhao wrote: On Oct 23, 2023, at 2:43 PM, Siddhesh Poyarekar wrote: On 2023-10-23 14:06, Martin Uecker wrote: We should aim for a good integration with the BDOS pass, so that it can propagate the information further, e.g. the following should work: struct { int L; char buf[] __counted_by(L) } x; x.L = N; x.buf = ...; char *p = >f; __bdos(p) -> N So we need to be smart on how we provide the size information for x->f to the backend. This would also be desirable for the language extension. This is essentially why there need to be frontend rules constraining reordering and reachability semantics of x.L, thus restricting DSE and reordering for it. My understanding is that Restricting DSE and reordering should be done by the proper data flow information, with a new argument added to the BDOS call, this correct data flow information could be maintained, and then the DSE and reordering will not happen. I don’t quite understand what kind of frontend rules should be added to constrain reordering and reachability semantics? Can you explain this a little bit more? Do you mean to add some rules or requirment to the new attribute that the users of the attribute should follow in the source code? Yes, but let me try and summarize the issues and the potential solutions at the end: This is not really a __bdos/__bos question, because that bit is trivial; if the structure is visible, the value is simply x.L. This is also why adding a reference to x.L in __bos/__bdos is not sufficient or even possible in, e.g. the above case you note. I am a little confused here, are we discussing how to resolve the potential reordering issue of the following: " struct annotated { size_t foo; char array[] __attribute__((counted_by (foo))); }; p->foo = 10; size = __builtin_dynamic_object_size (p->array,1); “? Or a bigger issue? Right, so the problem we're trying to solve is the reordering of __bdos w.r.t. initialization of the size parameter but to also account for DSE of the assignment, we can abstract this problem to that of DFA being unable to see implicit use of the size parameter. __bdos is the one such implicit user of the size parameter and you're proposing to solve this by encoding the relationship between buffer and size at the __bdos call site. But what about the case when the instantiation of the object is not at the same place as the __bdos call site, i.e. the DFA is unable to make that relationship? The example Martin showed where the subobject gets "hidden" behind a pointer was a trivial one where DFA *may* actually work in practice (because the object-size pass can thread through these assignments) but think about this one: struct A { size_t size; char buf[] __attribute__((counted_by(size))); } static size_t get_size_of (void *ptr) { return __bdos (ptr, 1); } void foo (size_t sz) { struct A *obj = __builtin_malloc (sz); obj->size = sz; ... __builtin_printf ("%zu\n", get_size_of (obj->array)); ... } Until get_size_of is inlined, no DFA can see the __bdos call in the same place as the point where obj is allocated. As a result, the assignment to obj->size could get reordered (or the store eliminated) w.r.t. the __bdos call until the inlining happens. As a result, the relationship between buf and size established by the attribute needs to be encoded into the type somehow. There are two options: Option 1: Encode the relationship in the type of buf This is kinda what you end up doing with component_ref_has_counted_by and it does show the relationship if one is looking (through that call), but nothing more that can be used to, e.g. prevent reordering or tell the optimizer that the reference to the buf member may imply a reference to the size member as well. This could be remedied by somehow encoding the USES relationship for size into the type of buf that the optimization passes can see. I feel like this may be a bit convoluted to specify in a future language extension in a way that will actually be well understood by developers, but it will likely generate faster runtime code. This will also likely require a bigger change across passes. Option 2: Encode the relationship in the type of size The other option is to enhance the type of size somehow so that it discourages reordering and store elimination, basically pessimizing code. I think volatile semantics might be the way to do this and may even be straightforward to specify in the future language extension given that it builds on a known language construct and is thematically related. However it does pessimize output for code that implements __counted_by__. Thanks, Sid
Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-10-23 14:06, Martin Uecker wrote: We should aim for a good integration with the BDOS pass, so that it can propagate the information further, e.g. the following should work: struct { int L; char buf[] __counted_by(L) } x; x.L = N; x.buf = ...; char *p = >f; __bdos(p) -> N So we need to be smart on how we provide the size information for x->f to the backend. This would also be desirable for the language extension. This is essentially why there need to be frontend rules constraining reordering and reachability semantics of x.L, thus restricting DSE and reordering for it. This is not really a __bdos/__bos question, because that bit is trivial; if the structure is visible, the value is simply x.L. This is also why adding a reference to x.L in __bos/__bdos is not sufficient or even possible in, e.g. the above case you note. Thanks, Sid
Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-10-23 08:34, Richard Biener wrote: A related issue is that assignment to the field and storage allocation are not tied together - if there's no use of the size data we might remove the store of it as dead. Maybe the trick then is to treat the size data as volatile? That ought to discourage reordering and also prevent elimination of the "dead" store? But we are an optimizing compiler, not a static analysis machine, so I fail to see how this is a useful suggestion. Sorry I didn't meant to suggest doing this in the middle-end. I think Martins suggestion to approach this as a language extension is more useful and would make it easier to handle this? I think handling for this (e.g. treating any storage allocated for the size member in the struct as volatile to prevent reordering or elimination) would have to be implemented in the front-end, regardless of whether it is a language extension or as a gcc attribute. How would making it a language extension vs a gcc attribute make it different? Thanks, Sid
Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-10-23 03:57, Richard Biener wrote: On Fri, Oct 20, 2023 at 10:41 PM Qing Zhao wrote: On Oct 20, 2023, at 3:10 PM, Siddhesh Poyarekar wrote: On 2023-10-20 14:38, Qing Zhao wrote: How about the following: Add one more parameter to __builtin_dynamic_object_size(), i.e __builtin_dynamic_object_size (_1,1,array_annotated->foo)? When we see the structure field has counted_by attribute. Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think. Maybe just adding the array_annotated->foo to the use list of the call to __builtin_dynamic_object_size should be enough? But I am not sure how to implement this in the TREE level, is there a USE_LIST/CLOBBER_LIST for each call? Then I can just simply add the counted_by field “array_annotated->foo” to the USE_LIST of the call to __bdos? This might be the simplest solution? If the dynamic object size is derived of a field then I think you need to put the "load" of that memory location at the point (as argument) of the __bos call right at parsing time. I know that's awkward because you try to play tricks "discovering" that field only late, but that's not going to work. A related issue is that assignment to the field and storage allocation are not tied together - if there's no use of the size data we might remove the store of it as dead. Maybe the trick then is to treat the size data as volatile? That ought to discourage reordering and also prevent elimination of the "dead" store? Thanks, Sid
Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-10-20 14:38, Qing Zhao wrote: How about the following: Add one more parameter to __builtin_dynamic_object_size(), i.e __builtin_dynamic_object_size (_1,1,array_annotated->foo)? When we see the structure field has counted_by attribute. Or maybe add a barrier preventing any assignments to array_annotated->foo from being reordered below the __bdos call? Basically an __asm__ with array_annotated->foo in the clobber list ought to do it I think. It may not work for something like this though: static size_t get_size_of (void *ptr) { return __bdos (ptr, 1); } void foo (size_t sz) { array_annotated = __builtin_malloc (sz); array_annotated = sz; ... __builtin_printf ("%zu\n", get_size_of (array_annotated->foo)); ... } because the call to get_size_of () may not have been inlined that early. The more fool-proof alternative may be to put a compile time barrier right below the assignment to array_annotated->foo; I reckon you could do that early in the front end by marking the size identifier and then tracking assignments to that identifier. That may have a slight runtime performance overhead since it may prevent even legitimate reordering. I can't think of another alternative at the moment... Sid
Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
[Sorry, I forgot to respond to this] On 2023-10-06 16:01, Martin Uecker wrote: Am Freitag, dem 06.10.2023 um 06:50 -0400 schrieb Siddhesh Poyarekar: On 2023-10-06 01:11, Martin Uecker wrote: Am Donnerstag, dem 05.10.2023 um 15:35 -0700 schrieb Kees Cook: On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote: 2. How would you handle signedness of the size field? The size gets converted to sizetype everywhere it is used and overflows/underflows may produce interesting results. Do you want to limit the types to unsigned or do you want to add a disclaimer in the docs? The former seems like the *right* thing to do given that it is a new feature; best to enforce the cleaner habit at the outset. The Linux kernel has a lot of "int" counters, so the goal is to catch negative offsets just like too-large offsets at runtime with the sanitizer and report 0 for __bdos. Refactoring all these to be unsigned is going to take time since at least some of them use the negative values as special values unrelated to array indexing. :( So, perhaps if unsigned counters are worth enforcing, can this be a separate warning the kernel can turn off initially? I think unsigned counters are much more problematic than signed ones because wraparound errors are more difficult to find. With unsigned you could potentially diagnose wraparound, but only if we add -fsanitize=unsigned-overflow *and* add mechanism to mark intentional wraparound *and* everybody adds this annotation after carefully screening their code *and* rewriting all operations such as (counter - 3) + 5 where the wraparound in the intermediate expression is harmless. For this reason, I do not think we should ever enforce some rule that the counter has to be unsigned. What we could do, is detect *storing* negative values into the counter at run-time using UBSan. (but if negative values are used for special cases, one also should be able to turn this off). All of the object size detection relies on object sizes being sizetype. The closest we could do with that is detect (sz != SIZE_MAX && sz > size_t / 2), since allocators typically cannot allocate more than SIZE_MAX / 2. I was talking about the counter in: struct { int counter; char buf[] __counted_by__((counter)) }; which could be checked to be positive either when stored to or when buf is used. And yes, we could also check the size of buf. Not sure what is done for VLAs now, but I guess it could be similar. Right now all object sizes are cast to sizetype and the generated dynamic expressions are such that overflows will result in the computed object size being zero. Non-generated expressions (like we could get with __counted_by__) will simply be cast; there's probably scope for improvement here, where we wrap that with an expression that returns 0 if the size exceeds SIZE_MAX / 2 since that's typically the limit for allocators. We use that heuristic elsewhere in the __bos/__bdos logic too. Thanks, Sid
Re: [V3][PATCH 1/3] Provide counted_by attribute to flexible array member field (PR108896)
On 2023-10-18 10:51, Qing Zhao wrote: + member FIELD_DECL is a valid field of the containing structure's fieldlist, + FIELDLIST, Report error and remove this attribute when it's not. */ +static void +verify_counted_by_attribute (tree fieldlist, tree field_decl) +{ + tree attr_counted_by = lookup_attribute ("counted_by", + DECL_ATTRIBUTES (field_decl)); + + if (!attr_counted_by) +return; + + /* If there is an counted_by attribute attached to the field, + verify it. */ + + const char *fieldname += IDENTIFIER_POINTER (TREE_VALUE (TREE_VALUE (attr_counted_by))); + + /* Verify the argument of the attrbute is a valid field of the s/attrbute/attribute/ + containing structure. */ + + tree counted_by_field = get_named_field (fieldlist, fieldname); + + /* Error when the field is not found in the containing structure. */ + if (!counted_by_field) +{ + error_at (DECL_SOURCE_LOCATION (field_decl), +"%qE attribute argument not a field declaration" +" in the same structure, ignore it", +(get_attribute_name (attr_counted_by))); Probably someone with English as a first language would make a better suggestion, but how about: Argument specified in %qE attribute is not a field declaration in the same structure, ignoring it. + + DECL_ATTRIBUTES (field_decl) += remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl)); +} + else + /* Error when the field is not with an integer type. */ Suggest: Flag an error when the field is not of an integer type. +{ + while (TREE_CHAIN (counted_by_field)) +counted_by_field = TREE_CHAIN (counted_by_field); + tree real_field = TREE_VALUE (counted_by_field); + + if (TREE_CODE (TREE_TYPE (real_field)) != INTEGER_TYPE) +{ + error_at (DECL_SOURCE_LOCATION (field_decl), + "%qE attribute argument not a field declaration" + " with integer type, ignore it", + (get_attribute_name (attr_counted_by))); Suggest: Argument specified in %qE attribute is not of an integer type, ignoring it. + + DECL_ATTRIBUTES (field_decl) += remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl)); +} +} + + return; I forgot to mention the redundant return here. Could you please clarify a little bit here, why the return here is redundant? It's the last line in the function, so even without that statement the function will return. Thanks, Sid
Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-10-06 01:11, Martin Uecker wrote: Am Donnerstag, dem 05.10.2023 um 15:35 -0700 schrieb Kees Cook: On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote: 2. How would you handle signedness of the size field? The size gets converted to sizetype everywhere it is used and overflows/underflows may produce interesting results. Do you want to limit the types to unsigned or do you want to add a disclaimer in the docs? The former seems like the *right* thing to do given that it is a new feature; best to enforce the cleaner habit at the outset. The Linux kernel has a lot of "int" counters, so the goal is to catch negative offsets just like too-large offsets at runtime with the sanitizer and report 0 for __bdos. Refactoring all these to be unsigned is going to take time since at least some of them use the negative values as special values unrelated to array indexing. :( So, perhaps if unsigned counters are worth enforcing, can this be a separate warning the kernel can turn off initially? I think unsigned counters are much more problematic than signed ones because wraparound errors are more difficult to find. With unsigned you could potentially diagnose wraparound, but only if we add -fsanitize=unsigned-overflow *and* add mechanism to mark intentional wraparound *and* everybody adds this annotation after carefully screening their code *and* rewriting all operations such as (counter - 3) + 5 where the wraparound in the intermediate expression is harmless. For this reason, I do not think we should ever enforce some rule that the counter has to be unsigned. What we could do, is detect *storing* negative values into the counter at run-time using UBSan. (but if negative values are used for special cases, one also should be able to turn this off). All of the object size detection relies on object sizes being sizetype. The closest we could do with that is detect (sz != SIZE_MAX && sz > size_t / 2), since allocators typically cannot allocate more than SIZE_MAX / 2. Sid
Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 05-Oct-2023 18:35, Kees Cook wrote:On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote: > 2. How would you handle signedness of the size field? The size gets > converted to sizetype everywhere it is used and overflows/underflows may > produce interesting results. Do you want to limit the types to unsigned or > do you want to add a disclaimer in the docs? The former seems like the > *right* thing to do given that it is a new feature; best to enforce the > cleaner habit at the outset. The Linux kernel has a lot of "int" counters, so the goal is to catch negative offsets just like too-large offsets at runtime with the sanitizer and report 0 for __bdos. Refactoring all these to be unsigned is going to take time since at least some of them use the negative values as special values unrelated to array indexing. :( So, perhaps if unsigned counters are worth enforcing, can this be a separate warning the kernel can turn off initially?That should be fine, I just want to be sure we're thinking about this during the design. In that case we should probably add negative offset tests to ensure that we're actually catching these issues with __bdos.Thanks,Sid
Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-08-25 11:24, Qing Zhao wrote: This is the 3rd version of the patch, per our discussion based on the review comments for the 1st and 2nd version, the major changes in this version are: Hi Qing, I hope the review was helpful. Overall, a couple of things to consider: 1. How would you handle potential reordering between assignment of the size to the counted_by field with the __bdos call that may consume it? You'll probably need to express some kind of dependency there or in the worst case, insert a barrier to disallow reordering. 2. How would you handle signedness of the size field? The size gets converted to sizetype everywhere it is used and overflows/underflows may produce interesting results. Do you want to limit the types to unsigned or do you want to add a disclaimer in the docs? The former seems like the *right* thing to do given that it is a new feature; best to enforce the cleaner habit at the outset. Thanks, Sid ***Against 1st version: 1. change the name "element_count" to "counted_by"; 2. change the parameter for the attribute from a STRING to an Identifier; 3. Add logic and testing cases to handle anonymous structure/unions; 4. Clarify documentation to permit the situation when the allocation size is larger than what's specified by "counted_by", at the same time, it's user's error if allocation size is smaller than what's specified by "counted_by"; 5. Add a complete testing case for using counted_by attribute in __builtin_dynamic_object_size when there is mismatch between the allocation size and the value of "counted_by", the expecting behavior for each case and the explanation on why in the comments. ***Against 2rd version: 1. Identify a tree node sharing issue and fixed it in the routine "component_ref_get_counted_ty" of tree.cc; 2. Update the documentation and testing cases with the clear usage of the fomula to compute the allocation size: MAX (sizeof (struct A), offsetof (struct A, array[0]) + counted_by * sizeof(element)) (the algorithm used in tree-object-size.cc is correct). In this set of patches, the major functionality provided is: 1. a new attribute "counted_by"; 2. use this new attribute in bound sanitizer; 3. use this new attribute in dynamic object size for subobject size; As discussed, I plan to add two more separate patches sets after this initial patch set is approved and committed. set 1. A new warning option and a new sanitizer option for the user error when the allocation size is smaller than the value of "counted_by". set 2. An improvement to __builtin_dynamic_object_size for whole-object size of the structure with FAM annaoted with counted_by. there are also some existing bugs in tree-object-size.cc identified during the study, and PRs were filed to record them. these bugs will be fixed seperately with individual patches: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111030 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111040 Bootstrapped and regression tested on both aarch64 and X86, no issue. Please see more details on the description of this work on: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619708.html and more discussions on https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626376.html Okay for committing? thanks. Qing Qing Zhao (3): Provide counted_by attribute to flexible array member field (PR108896) Use the counted_by atribute info in builtin object size [PR108896] Use the counted_by attribute information in bound sanitizer[PR108896] gcc/c-family/c-attribs.cc | 54 - gcc/c-family/c-common.cc | 13 ++ gcc/c-family/c-common.h | 1 + gcc/c-family/c-ubsan.cc | 16 ++ gcc/c/c-decl.cc | 79 +-- gcc/doc/extend.texi | 77 +++ .../gcc.dg/flex-array-counted-by-2.c | 74 ++ .../gcc.dg/flex-array-counted-by-3.c | 210 ++ gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 .../ubsan/flex-array-counted-by-bounds-2.c| 27 +++ .../ubsan/flex-array-counted-by-bounds.c | 46 gcc/tree-object-size.cc | 37 ++- gcc/tree.cc | 133 +++ gcc/tree.h| 15 ++ 14 files changed, 797 insertions(+), 25 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c
Re: [V3][PATCH 2/3] Use the counted_by atribute info in builtin object size [PR108896]
On 2023-08-25 11:24, Qing Zhao wrote: Use the counted_by atribute info in builtin object size to compute the subobject size for flexible array members. gcc/ChangeLog: PR C/108896 * tree-object-size.cc (addr_object_size): Use the counted_by attribute info. * tree.cc (component_ref_has_counted_by_p): New function. (component_ref_get_counted_by): New function. * tree.h (component_ref_has_counted_by_p): New prototype. (component_ref_get_counted_by): New prototype. gcc/testsuite/ChangeLog: PR C/108896 * gcc.dg/flex-array-counted-by-2.c: New test. * gcc.dg/flex-array-counted-by-3.c: New test. --- .../gcc.dg/flex-array-counted-by-2.c | 74 ++ .../gcc.dg/flex-array-counted-by-3.c | 210 ++ gcc/tree-object-size.cc | 37 ++- gcc/tree.cc | 95 +++- gcc/tree.h| 10 + 5 files changed, 418 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c new file mode 100644 index ..ec580c1f1f01 --- /dev/null +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c @@ -0,0 +1,74 @@ +/* test the attribute counted_by and its usage in + * __builtin_dynamic_object_size. */ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include "builtin-object-size-common.h" + +#define expect(p, _v) do { \ +size_t v = _v; \ +if (p == v) \ + __builtin_printf ("ok: %s == %zd\n", #p, p); \ +else \ + { \ + __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \ + FAIL (); \ + } \ +} while (0); You're using this in a bunch of tests already; does it make sense to consolidate it into builtin-object-size-common.h? + +struct flex { + int b; + int c[]; +} *array_flex; + +struct annotated { + int b; + int c[] __attribute__ ((counted_by (b))); +} *array_annotated; + +struct nested_annotated { + struct { +union { + int b; + float f; +}; +int n; + }; + int c[] __attribute__ ((counted_by (b))); +} *array_nested_annotated; + +void __attribute__((__noinline__)) setup (int normal_count, int attr_count) +{ + array_flex += (struct flex *)malloc (sizeof (struct flex) ++ normal_count * sizeof (int)); + array_flex->b = normal_count; + + array_annotated += (struct annotated *)malloc (sizeof (struct annotated) + + attr_count * sizeof (int)); + array_annotated->b = attr_count; + + array_nested_annotated += (struct nested_annotated *)malloc (sizeof (struct nested_annotated) ++ attr_count * sizeof (int)); + array_nested_annotated->b = attr_count; + + return; +} + +void __attribute__((__noinline__)) test () +{ +expect(__builtin_dynamic_object_size(array_flex->c, 1), -1); +expect(__builtin_dynamic_object_size(array_annotated->c, 1), + array_annotated->b * sizeof (int)); +expect(__builtin_dynamic_object_size(array_nested_annotated->c, 1), + array_nested_annotated->b * sizeof (int)); +} Maybe another test where the allocation, size assignment and __bdos call happen in the same function, where the allocator is not recognized by gcc: void * __attribute__ ((noinline)) alloc (size_t sz) { return __builtin_malloc (sz); } void test (size_t sz) { array_annotated = alloc (sz); array_annotated->b = sz; return __builtin_dynamic_object_size (array_annotated->c, 1); } The interesting thing to test (and ensure in the codegen) is that the assignment to array_annotated->b does not get reordered to below the __builtin_dynamic_object_size call since technically there is no data dependency between the two. + +int main(int argc, char *argv[]) +{ + setup (10,10); + test (); + DONE (); +} diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c new file mode 100644 index ..a0c3cb88ec71 --- /dev/null +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c @@ -0,0 +1,210 @@ +/* test the attribute counted_by and its usage in +__builtin_dynamic_object_size: what's the correct behavior when the +allocation size mismatched with the value of counted_by attribute? */ If the behaviour is undefined, does it make sense to add tests for this? Maybe once you have a -Wmismatched-counted-by or similar, we could have tests for that. I guess the counter-argument is that we keep track of this behaviour but not necessarily guarantee it. +/* { dg-do run } */ +/* { dg-options "-O -fstrict-flex-arrays=3" } */ + +#include "builtin-object-size-common.h" + +struct annotated { + size_t foo; + char others; + char array[]
Re: [V3][PATCH 1/3] Provide counted_by attribute to flexible array member field (PR108896)
On 2023-10-05 14:51, Siddhesh Poyarekar wrote: On 2023-08-25 11:24, Qing Zhao wrote: Provide a new counted_by attribute to flexible array member field. The obligatory "I can't ack the patch but here's a review" disclaimer :) 'counted_by (COUNT)' The 'counted_by' attribute may be attached to the flexible array member of a structure. It indicates that the number of the elements of the array is given by the field named "COUNT" in the same structure as the flexible array member. GCC uses this information to improve the results of the array bound sanitizer and the '__builtin_dynamic_object_size'. For instance, the following code: struct P { size_t count; char other; char array[] __attribute__ ((counted_by (count))); } *p; specifies that the 'array' is a flexible array member whose number of elements is given by the field 'count' in the same structure. The field that represents the number of the elements should have an integer type. An explicit 'counted_by' annotation defines a relationship between two objects, 'p->array' and 'p->count', that 'p->array' has _at least_ 'p->count' number of elements available. This relationship must hold even after any of these related objects are updated. It's the user's responsibility to make sure this relationship to be kept all the time. Otherwise the results of the array bound sanitizer and the '__builtin_dynamic_object_size' might be incorrect. For instance, in the following example, the allocated array has less elements than what's specified by the 'sbuf->count', this is an user error. As a result, out-of-bounds access to the array might not be detected. #define SIZE_BUMP 10 struct P *sbuf; void alloc_buf (size_t nelems) { sbuf = (struct P *) malloc (MAX (sizeof (struct P), (offsetof (struct P, array[0]) + nelems * sizeof (char; sbuf->count = nelems + SIZE_BUMP; /* This is invalid when the sbuf->array has less than sbuf->count elements. */ } In the following example, the 2nd update to the field 'sbuf->count' of the above structure will permit out-of-bounds access to the array 'sbuf>array' as well. #define SIZE_BUMP 10 struct P *sbuf; void alloc_buf (size_t nelems) { sbuf = (struct P *) malloc (MAX (sizeof (struct P), (offsetof (struct P, array[0]) + (nelems + SIZE_BUMP) * sizeof (char; sbuf->count = nelems; /* This is valid when the sbuf->array has at least sbuf->count elements. */ } void use_buf (int index) { sbuf->count = sbuf->count + SIZE_BUMP + 1; /* Now the value of sbuf->count is larger than the number of elements of sbuf->array. */ sbuf->array[index] = 0; /* then the out-of-bound access to this array might not be detected. */ } gcc/c-family/ChangeLog: PR C/108896 * c-attribs.cc (handle_counted_by_attribute): New function. (attribute_takes_identifier_p): Add counted_by attribute to the list. * c-common.cc (c_flexible_array_member_type_p): ...To this. * c-common.h (c_flexible_array_member_type_p): New prototype. gcc/c/ChangeLog: PR C/108896 * c-decl.cc (flexible_array_member_type_p): Renamed and moved to... (add_flexible_array_elts_to_size): Use renamed function. (is_flexible_array_member_p): Use renamed function. (verify_counted_by_attribute): New function. (finish_struct): Use renamed function and verify counted_by attribute. gcc/ChangeLog: PR C/108896 * doc/extend.texi: Document attribute counted_by. * tree.cc (get_named_field): New function. * tree.h (get_named_field): New prototype. gcc/testsuite/ChangeLog: PR C/108896 * gcc.dg/flex-array-counted-by.c: New test. --- gcc/c-family/c-attribs.cc | 54 - gcc/c-family/c-common.cc | 13 gcc/c-family/c-common.h | 1 + gcc/c/c-decl.cc | 79 +++- gcc/doc/extend.texi | 77 +++ gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 ++ gcc/tree.cc | 40 ++ gcc/tree.h | 5 ++ 8 files changed, 291 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.
Re: [V3][PATCH 1/3] Provide counted_by attribute to flexible array member field (PR108896)
On 2023-08-25 11:24, Qing Zhao wrote: Provide a new counted_by attribute to flexible array member field. The obligatory "I can't ack the patch but here's a review" disclaimer :) 'counted_by (COUNT)' The 'counted_by' attribute may be attached to the flexible array member of a structure. It indicates that the number of the elements of the array is given by the field named "COUNT" in the same structure as the flexible array member. GCC uses this information to improve the results of the array bound sanitizer and the '__builtin_dynamic_object_size'. For instance, the following code: struct P { size_t count; char other; char array[] __attribute__ ((counted_by (count))); } *p; specifies that the 'array' is a flexible array member whose number of elements is given by the field 'count' in the same structure. The field that represents the number of the elements should have an integer type. An explicit 'counted_by' annotation defines a relationship between two objects, 'p->array' and 'p->count', that 'p->array' has _at least_ 'p->count' number of elements available. This relationship must hold even after any of these related objects are updated. It's the user's responsibility to make sure this relationship to be kept all the time. Otherwise the results of the array bound sanitizer and the '__builtin_dynamic_object_size' might be incorrect. For instance, in the following example, the allocated array has less elements than what's specified by the 'sbuf->count', this is an user error. As a result, out-of-bounds access to the array might not be detected. #define SIZE_BUMP 10 struct P *sbuf; void alloc_buf (size_t nelems) { sbuf = (struct P *) malloc (MAX (sizeof (struct P), (offsetof (struct P, array[0]) + nelems * sizeof (char; sbuf->count = nelems + SIZE_BUMP; /* This is invalid when the sbuf->array has less than sbuf->count elements. */ } In the following example, the 2nd update to the field 'sbuf->count' of the above structure will permit out-of-bounds access to the array 'sbuf>array' as well. #define SIZE_BUMP 10 struct P *sbuf; void alloc_buf (size_t nelems) { sbuf = (struct P *) malloc (MAX (sizeof (struct P), (offsetof (struct P, array[0]) + (nelems + SIZE_BUMP) * sizeof (char; sbuf->count = nelems; /* This is valid when the sbuf->array has at least sbuf->count elements. */ } void use_buf (int index) { sbuf->count = sbuf->count + SIZE_BUMP + 1; /* Now the value of sbuf->count is larger than the number of elements of sbuf->array. */ sbuf->array[index] = 0; /* then the out-of-bound access to this array might not be detected. */ } gcc/c-family/ChangeLog: PR C/108896 * c-attribs.cc (handle_counted_by_attribute): New function. (attribute_takes_identifier_p): Add counted_by attribute to the list. * c-common.cc (c_flexible_array_member_type_p): ...To this. * c-common.h (c_flexible_array_member_type_p): New prototype. gcc/c/ChangeLog: PR C/108896 * c-decl.cc (flexible_array_member_type_p): Renamed and moved to... (add_flexible_array_elts_to_size): Use renamed function. (is_flexible_array_member_p): Use renamed function. (verify_counted_by_attribute): New function. (finish_struct): Use renamed function and verify counted_by attribute. gcc/ChangeLog: PR C/108896 * doc/extend.texi: Document attribute counted_by. * tree.cc (get_named_field): New function. * tree.h (get_named_field): New prototype. gcc/testsuite/ChangeLog: PR C/108896 * gcc.dg/flex-array-counted-by.c: New test. --- gcc/c-family/c-attribs.cc| 54 - gcc/c-family/c-common.cc | 13 gcc/c-family/c-common.h | 1 + gcc/c/c-decl.cc | 79 +++- gcc/doc/extend.texi | 77 +++ gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 ++ gcc/tree.cc | 40 ++ gcc/tree.h | 5 ++ 8 files changed, 291 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c diff --git
[committed 0/2] SECURITY.txt: Trivial fixups
Committed some trivial comma and indentation fixups that Jan shared with me off-list. Jan Engelhardt (2): secpol: add grammatically missing commas / remove one excess instance secpol: consistent indentation SECURITY.txt | 48 1 file changed, 24 insertions(+), 24 deletions(-) -- 2.41.0
[committed 2/2] secpol: consistent indentation
From: Jan Engelhardt 86% of the document have 4 spaces; adjust the remaining 14%. Signed-off-by: Jan Engelhardt ChangeLog: * SECURITY.txt: Fix up indentation. --- SECURITY.txt | 32 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/SECURITY.txt b/SECURITY.txt index 93792923583..b3e2bbfda90 100644 --- a/SECURITY.txt +++ b/SECURITY.txt @@ -173,33 +173,33 @@ Security features implemented in GCC Reporting private security bugs === - *All bugs reported in the GCC Bugzilla are public.* +*All bugs reported in the GCC Bugzilla are public.* - In order to report a private security bug that is not immediately - public, please contact one of the downstream distributions with - security teams. The following teams have volunteered to handle - such bugs: +In order to report a private security bug that is not immediately +public, please contact one of the downstream distributions with +security teams. The following teams have volunteered to handle +such bugs: Debian: secur...@debian.org Red Hat: secal...@redhat.com SUSE:secur...@suse.de AdaCore: product-secur...@adacore.com - Please report the bug to just one of these teams. It will be shared - with other teams as necessary. +Please report the bug to just one of these teams. It will be shared +with other teams as necessary. - The team contacted will take care of details such as vulnerability - rating and CVE assignment (http://cve.mitre.org/about/). It is likely - that the team will ask to file a public bug because the issue is - sufficiently minor and does not warrant an embargo. An embargo is not - a requirement for being credited with the discovery of a security - vulnerability. +The team contacted will take care of details such as vulnerability +rating and CVE assignment (http://cve.mitre.org/about/). It is likely +that the team will ask to file a public bug because the issue is +sufficiently minor and does not warrant an embargo. An embargo is not +a requirement for being credited with the discovery of a security +vulnerability. Reporting public security bugs == - It is expected that critical security bugs will be rare, and that most - security bugs can be reported in GCC, thus making - them public immediately. The system can be found here: +It is expected that critical security bugs will be rare, and that most +security bugs can be reported in GCC, thus making +them public immediately. The system can be found here: https://gcc.gnu.org/bugzilla/ -- 2.41.0
[committed 1/2] secpol: add grammatically missing commas / remove one excess instance
From: Jan Engelhardt Signed-off-by: Jan Engelhardt ChangeLog: * SECURITY.txt: Fix up commas. --- SECURITY.txt | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/SECURITY.txt b/SECURITY.txt index b65f24cfc2a..93792923583 100644 --- a/SECURITY.txt +++ b/SECURITY.txt @@ -3,12 +3,12 @@ What is a GCC security bug? A security bug is one that threatens the security of a system or network, or might compromise the security of data stored on it. -In the context of GCC there are multiple ways in which this might +In the context of GCC, there are multiple ways in which this might happen and some common scenarios are detailed below. If you're reporting a security issue and feel like it does not fit into any of the descriptions below, you're encouraged to reach out -through the GCC bugzilla or if needed, privately, by following the +through the GCC bugzilla or, if needed, privately, by following the instructions in the last two sections of this document. Compiler drivers, programs, libgccjit and support libraries @@ -24,11 +24,11 @@ Compiler drivers, programs, libgccjit and support libraries The libgccjit library can, despite the name, be used both for ahead-of-time compilation and for just-in-compilation. In both -cases it can be used to translate input representations (such as -source code) in the application context; in the latter case the +cases, it can be used to translate input representations (such as +source code) in the application context; in the latter case, the generated code is also run in the application context. -Limitations that apply to the compiler driver, apply here too in +Limitations that apply to the compiler driver apply here too in terms of trusting inputs and it is recommended that both the compilation *and* execution context of the code are appropriately sandboxed to contain the effects of any bugs in libgccjit, the @@ -43,7 +43,7 @@ Compiler drivers, programs, libgccjit and support libraries Libraries such as zlib that are bundled with GCC to build it will be treated the same as the compiler drivers and programs as far as -security coverage is concerned. However if you find an issue in +security coverage is concerned. However, if you find an issue in these libraries independent of their use in GCC, you should reach out to their upstream projects to report them. @@ -97,7 +97,7 @@ Language runtime libraries * libssp * libstdc++ -These libraries are intended to be used in arbitrary contexts and as +These libraries are intended to be used in arbitrary contexts and, as a result, bugs in these libraries may be evaluated for security impact. However, some of these libraries, e.g. libgo, libphobos, etc. are not maintained in the GCC project, due to which the GCC @@ -145,7 +145,7 @@ GCC plugins It should be noted that GCC may execute arbitrary code loaded by a user through the GCC plugin mechanism or through system preloading -mechanism. Such custom code should be vetted by the user for safety +mechanism. Such custom code should be vetted by the user for safety, as bugs exposed through such code will not be considered security issues. -- 2.41.0
Re: [PATCH v2] Add a GCC Security policy
On 2023-10-04 11:49, Alexander Monakov wrote: On Thu, 28 Sep 2023, Siddhesh Poyarekar wrote: Define a security process and exclusions to security issues for GCC and all components it ships. Some typos and wording suggestions below. I've incorporated all your and David's suggestions and pushed it. Thank you for iterating with me on this! Sid --- /dev/null +++ b/SECURITY.txt @@ -0,0 +1,205 @@ +What is a GCC security bug? +=== + +A security bug is one that threatens the security of a system or +network, or might compromise the security of data stored on it. +In the context of GCC there are multiple ways in which this might +happen and some common scenarios are detailed below. + +If you're reporting a security issue and feel like it does not fit +into any of the descriptions below, you're encouraged to reach out +through the GCC bugzilla or if needed, privately by following the +instructions in the last two sections of this document. + +Compiler drivers, programs, libgccjit and support libraries +--- + +The compiler driver processes source code, invokes other programs +such as the assembler and linker and generates the output result, +which may be assembly code or machine code. Compiling untrusted +sources can result in arbitrary code execution and unconstrained +resource consumption in the compiler. As a result, compilation of +such code should be done inside a sandboxed environment to ensure +that it does not compromise the development environment. "... the host environment" seems more appropriate. + +The libgccjit library can, despite the name, be used both for +ahead-of-time compilation and for just-in-compilation. In both +cases it can be used to translate input representations (such as +source code) in the application context; in the latter case the +generated code is also run in the application context. + +Limitations that apply to the compiler driver, apply here too in +terms of sanitizing inputs and it is recommended that both the s/sanitizing inputs/trusting inputs/ (I suggested it earlier, just unsure if you don't agree or it simply fell through the cracks) +compilation *and* execution context of the code are appropriately +sandboxed to contain the effects of any bugs in libgccjit, the +application code using it, or its generated code to the sandboxed +environment. + +Libraries such as libiberty, libcc1 and libcpp are not distributed +for runtime support and have similar challenges to compiler drivers. +While they are expected to be robust against arbitrary input, they +should only be used with trusted inputs when linked into the +compiler. + +Libraries such as zlib that bundled into GCC to build it will be 'are bundled with' (missing 'are', s/into/with/) +treated the same as the compiler drivers and programs as far as +security coverage is concerned. However if you find an issue in +these libraries independent of their use in GCC, you should reach +out to their upstream projects to report them. + +As a result, the only case for a potential security issue in the +compiler is when it generates vulnerable application code for +trusted input source code that is conforming to the relevant +programming standard or extensions documented as supported by GCC +and the algorithm expressed in the source code does not have the +vulnerability. The output application code could be considered +vulnerable if it produces an actual vulnerability in the target +application, specifically in the following cases: It seems ambiguous if the list that follows is meant to be an exhaustive enumeration. I think it is meant to give examples without covering all possibilities; if that's the case, I would suggest s/specifically in the following cases/for example/ If I misunderstood and the list is really meant to be exhaustive, it would be nice to make that clear and perhaps refer the reader to the second paragraph when their scenario does not fit. + +- The application dereferences an invalid memory location despite + the application sources being valid. +- The application reads from or writes to a valid but incorrect + memory location, resulting in an information integrity issue or an + information leak. +- The application ends up running in an infinite loop or with + severe degradation in performance despite the input sources having + no such issue, resulting in a Denial of Service. Note that + correct but non-performant code is not a security issue candidate, + this only applies to incorrect code that may result in performance + degradation severe enough to amount to a denial of service. +- The application crashes due to the generated incorrect code, +
[ping][PATCH v2] Add a GCC Security policy
Ping! On 2023-09-28 07:55, Siddhesh Poyarekar wrote: Define a security process and exclusions to security issues for GCC and all components it ships. Signed-off-by: Siddhesh Poyarekar --- SECURITY.txt | 205 +++ 1 file changed, 205 insertions(+) create mode 100644 SECURITY.txt diff --git a/SECURITY.txt b/SECURITY.txt new file mode 100644 index 000..14cb31570d3 --- /dev/null +++ b/SECURITY.txt @@ -0,0 +1,205 @@ +What is a GCC security bug? +=== + +A security bug is one that threatens the security of a system or +network, or might compromise the security of data stored on it. +In the context of GCC there are multiple ways in which this might +happen and some common scenarios are detailed below. + +If you're reporting a security issue and feel like it does not fit +into any of the descriptions below, you're encouraged to reach out +through the GCC bugzilla or if needed, privately by following the +instructions in the last two sections of this document. + +Compiler drivers, programs, libgccjit and support libraries +--- + +The compiler driver processes source code, invokes other programs +such as the assembler and linker and generates the output result, +which may be assembly code or machine code. Compiling untrusted +sources can result in arbitrary code execution and unconstrained +resource consumption in the compiler. As a result, compilation of +such code should be done inside a sandboxed environment to ensure +that it does not compromise the development environment. + +The libgccjit library can, despite the name, be used both for +ahead-of-time compilation and for just-in-compilation. In both +cases it can be used to translate input representations (such as +source code) in the application context; in the latter case the +generated code is also run in the application context. + +Limitations that apply to the compiler driver, apply here too in +terms of sanitizing inputs and it is recommended that both the +compilation *and* execution context of the code are appropriately +sandboxed to contain the effects of any bugs in libgccjit, the +application code using it, or its generated code to the sandboxed +environment. + +Libraries such as libiberty, libcc1 and libcpp are not distributed +for runtime support and have similar challenges to compiler drivers. +While they are expected to be robust against arbitrary input, they +should only be used with trusted inputs when linked into the +compiler. + +Libraries such as zlib that bundled into GCC to build it will be +treated the same as the compiler drivers and programs as far as +security coverage is concerned. However if you find an issue in +these libraries independent of their use in GCC, you should reach +out to their upstream projects to report them. + +As a result, the only case for a potential security issue in the +compiler is when it generates vulnerable application code for +trusted input source code that is conforming to the relevant +programming standard or extensions documented as supported by GCC +and the algorithm expressed in the source code does not have the +vulnerability. The output application code could be considered +vulnerable if it produces an actual vulnerability in the target +application, specifically in the following cases: + +- The application dereferences an invalid memory location despite + the application sources being valid. +- The application reads from or writes to a valid but incorrect + memory location, resulting in an information integrity issue or an + information leak. +- The application ends up running in an infinite loop or with + severe degradation in performance despite the input sources having + no such issue, resulting in a Denial of Service. Note that + correct but non-performant code is not a security issue candidate, + this only applies to incorrect code that may result in performance + degradation severe enough to amount to a denial of service. +- The application crashes due to the generated incorrect code, + resulting in a Denial of Service. + +Language runtime libraries +-- + +GCC also builds and distributes libraries that are intended to be +used widely to implement runtime support for various programming +languages. These include the following: + +* libada +* libatomic +* libbacktrace +* libcc1 +* libcody +* libcpp +* libdecnumber +* libffi +* libgcc +* libgfortran +* libgm2 +* libgo +* libgomp +* libitm +* libobjc +* libphobos +* libquadmath +* libssp +* libstdc++ + +These libraries are intended to be used
Re: [PATCH v2] Add a GCC Security policy
On 2023-09-28 12:55, Siddhesh Poyarekar wrote: Define a security process and exclusions to security issues for GCC and all components it ships. Signed-off-by: Siddhesh Poyarekar --- Sorry I forgot to summarize changes since the previous version: - Reworded the introduction so that it doesn't sound like we know *all* scenarios and also encourage reporters to reach out. - Fixed up support and diagnostic libraries sections based on Jakub's feedback. SECURITY.txt | 205 +++ 1 file changed, 205 insertions(+) create mode 100644 SECURITY.txt diff --git a/SECURITY.txt b/SECURITY.txt new file mode 100644 index 000..14cb31570d3 --- /dev/null +++ b/SECURITY.txt @@ -0,0 +1,205 @@ +What is a GCC security bug? +=== + +A security bug is one that threatens the security of a system or +network, or might compromise the security of data stored on it. +In the context of GCC there are multiple ways in which this might +happen and some common scenarios are detailed below. + +If you're reporting a security issue and feel like it does not fit +into any of the descriptions below, you're encouraged to reach out +through the GCC bugzilla or if needed, privately by following the +instructions in the last two sections of this document. + +Compiler drivers, programs, libgccjit and support libraries +--- + +The compiler driver processes source code, invokes other programs +such as the assembler and linker and generates the output result, +which may be assembly code or machine code. Compiling untrusted +sources can result in arbitrary code execution and unconstrained +resource consumption in the compiler. As a result, compilation of +such code should be done inside a sandboxed environment to ensure +that it does not compromise the development environment. + +The libgccjit library can, despite the name, be used both for +ahead-of-time compilation and for just-in-compilation. In both +cases it can be used to translate input representations (such as +source code) in the application context; in the latter case the +generated code is also run in the application context. + +Limitations that apply to the compiler driver, apply here too in +terms of sanitizing inputs and it is recommended that both the +compilation *and* execution context of the code are appropriately +sandboxed to contain the effects of any bugs in libgccjit, the +application code using it, or its generated code to the sandboxed +environment. + +Libraries such as libiberty, libcc1 and libcpp are not distributed +for runtime support and have similar challenges to compiler drivers. +While they are expected to be robust against arbitrary input, they +should only be used with trusted inputs when linked into the +compiler. + +Libraries such as zlib that bundled into GCC to build it will be +treated the same as the compiler drivers and programs as far as +security coverage is concerned. However if you find an issue in +these libraries independent of their use in GCC, you should reach +out to their upstream projects to report them. + +As a result, the only case for a potential security issue in the +compiler is when it generates vulnerable application code for +trusted input source code that is conforming to the relevant +programming standard or extensions documented as supported by GCC +and the algorithm expressed in the source code does not have the +vulnerability. The output application code could be considered +vulnerable if it produces an actual vulnerability in the target +application, specifically in the following cases: + +- The application dereferences an invalid memory location despite + the application sources being valid. +- The application reads from or writes to a valid but incorrect + memory location, resulting in an information integrity issue or an + information leak. +- The application ends up running in an infinite loop or with + severe degradation in performance despite the input sources having + no such issue, resulting in a Denial of Service. Note that + correct but non-performant code is not a security issue candidate, + this only applies to incorrect code that may result in performance + degradation severe enough to amount to a denial of service. +- The application crashes due to the generated incorrect code, + resulting in a Denial of Service. + +Language runtime libraries +-- + +GCC also builds and distributes libraries that are intended to be +used widely to implement runtime support for various programming +languages. These include the following: + +* libada +* libatomic +* libbacktrace +* libcc1 +* libcody
[PATCH v2] Add a GCC Security policy
Define a security process and exclusions to security issues for GCC and all components it ships. Signed-off-by: Siddhesh Poyarekar --- SECURITY.txt | 205 +++ 1 file changed, 205 insertions(+) create mode 100644 SECURITY.txt diff --git a/SECURITY.txt b/SECURITY.txt new file mode 100644 index 000..14cb31570d3 --- /dev/null +++ b/SECURITY.txt @@ -0,0 +1,205 @@ +What is a GCC security bug? +=== + +A security bug is one that threatens the security of a system or +network, or might compromise the security of data stored on it. +In the context of GCC there are multiple ways in which this might +happen and some common scenarios are detailed below. + +If you're reporting a security issue and feel like it does not fit +into any of the descriptions below, you're encouraged to reach out +through the GCC bugzilla or if needed, privately by following the +instructions in the last two sections of this document. + +Compiler drivers, programs, libgccjit and support libraries +--- + +The compiler driver processes source code, invokes other programs +such as the assembler and linker and generates the output result, +which may be assembly code or machine code. Compiling untrusted +sources can result in arbitrary code execution and unconstrained +resource consumption in the compiler. As a result, compilation of +such code should be done inside a sandboxed environment to ensure +that it does not compromise the development environment. + +The libgccjit library can, despite the name, be used both for +ahead-of-time compilation and for just-in-compilation. In both +cases it can be used to translate input representations (such as +source code) in the application context; in the latter case the +generated code is also run in the application context. + +Limitations that apply to the compiler driver, apply here too in +terms of sanitizing inputs and it is recommended that both the +compilation *and* execution context of the code are appropriately +sandboxed to contain the effects of any bugs in libgccjit, the +application code using it, or its generated code to the sandboxed +environment. + +Libraries such as libiberty, libcc1 and libcpp are not distributed +for runtime support and have similar challenges to compiler drivers. +While they are expected to be robust against arbitrary input, they +should only be used with trusted inputs when linked into the +compiler. + +Libraries such as zlib that bundled into GCC to build it will be +treated the same as the compiler drivers and programs as far as +security coverage is concerned. However if you find an issue in +these libraries independent of their use in GCC, you should reach +out to their upstream projects to report them. + +As a result, the only case for a potential security issue in the +compiler is when it generates vulnerable application code for +trusted input source code that is conforming to the relevant +programming standard or extensions documented as supported by GCC +and the algorithm expressed in the source code does not have the +vulnerability. The output application code could be considered +vulnerable if it produces an actual vulnerability in the target +application, specifically in the following cases: + +- The application dereferences an invalid memory location despite + the application sources being valid. +- The application reads from or writes to a valid but incorrect + memory location, resulting in an information integrity issue or an + information leak. +- The application ends up running in an infinite loop or with + severe degradation in performance despite the input sources having + no such issue, resulting in a Denial of Service. Note that + correct but non-performant code is not a security issue candidate, + this only applies to incorrect code that may result in performance + degradation severe enough to amount to a denial of service. +- The application crashes due to the generated incorrect code, + resulting in a Denial of Service. + +Language runtime libraries +-- + +GCC also builds and distributes libraries that are intended to be +used widely to implement runtime support for various programming +languages. These include the following: + +* libada +* libatomic +* libbacktrace +* libcc1 +* libcody +* libcpp +* libdecnumber +* libffi +* libgcc +* libgfortran +* libgm2 +* libgo +* libgomp +* libitm +* libobjc +* libphobos +* libquadmath +* libssp +* libstdc++ + +These libraries are intended to be used in arbitrary contexts and as +a result, bugs in these libraries
Re: [PATCH] Add a GCC Security policy
On 2023-09-20 08:29, Jakub Jelinek wrote: I just noticed (ENOCOFFEE) that the line (after removing libvtv) is: Support libraries such as libiberty, libcc1 and libcpp have been developed separately to share code with other tools such as binutils and gdb. Does that address your concern Jakub? I believe that is the case just for libiberty. libcpp is I think solely used by gcc itself (several frontends use it though, plus some build utilities in gcc). libcc1 is code for gdb with gcc implementation details. How about: Libraries that are not distributed for runtime language support such as libiberty, libcc1 and libcpp have similar challenges to compiler drivers. While they are expected to be robust against arbitrary input, they should only be used with trusted inputs. Thanks, Sid
Re: [PATCH] Add a GCC Security policy
On 2023-09-20 07:58, Siddhesh Poyarekar wrote: On 2023-09-20 07:55, Jakub Jelinek wrote: On Wed, Sep 20, 2023 at 07:50:43AM -0400, Siddhesh Poyarekar wrote: + Support libraries such as libiberty, libcc1 libvtv and libcpp have Missing comma before libvtv. But more importantly, libvtv is not support library like libiberty, libcpp, it is more like the sanitizer libraries runtime library for -fvtable-verify= . Ack, I'll move libvtv out. And, libcc1 also isn't a compiler support library, but support library for a GDB plugin. Isn't that like libiberty then, which also gets used by other toolchain projects? Maybe calling it "Toolchain support libraries" would make it more explicit? I just noticed (ENOCOFFEE) that the line (after removing libvtv) is: Support libraries such as libiberty, libcc1 and libcpp have been developed separately to share code with other tools such as binutils and gdb. Does that address your concern Jakub? Thanks, Sid
Re: [PATCH] Add a GCC Security policy
On 2023-09-20 07:55, Jakub Jelinek wrote: On Wed, Sep 20, 2023 at 07:50:43AM -0400, Siddhesh Poyarekar wrote: +Support libraries such as libiberty, libcc1 libvtv and libcpp have Missing comma before libvtv. But more importantly, libvtv is not support library like libiberty, libcpp, it is more like the sanitizer libraries runtime library for -fvtable-verify= . Ack, I'll move libvtv out. And, libcc1 also isn't a compiler support library, but support library for a GDB plugin. Isn't that like libiberty then, which also gets used by other toolchain projects? Maybe calling it "Toolchain support libraries" would make it more explicit? Thanks, Sid
[PATCH] Add a GCC Security policy
Define a security process and exclusions to security issues for GCC and all components it ships. Signed-off-by: Siddhesh Poyarekar --- Sending as a proper patch since there have been no further comments on the RFC. I toyed with the idea of making the distinction of "exploitable vulnerability" vs "missed hardening" more explicit near the top of the document but decided against further tinkering in the end since we already have a proper section dealing with it. Instead I made the language in the hardening section a bit more explicit, clarifying that missed hardening is not an *exploitable vulnerability*, which hopefully resolves the contradication of a bug in a security feature not being a security bug. I also added the AdaCore security contact at Arnaud's request. Thanks, Sid SECURITY.txt | 202 +++ 1 file changed, 202 insertions(+) create mode 100644 SECURITY.txt diff --git a/SECURITY.txt b/SECURITY.txt new file mode 100644 index 000..d2161f03bf5 --- /dev/null +++ b/SECURITY.txt @@ -0,0 +1,202 @@ +What is a GCC security bug? +=== + +A security bug is one that threatens the security of a system or +network, or might compromise the security of data stored on it. +In the context of GCC there are multiple ways in which this might +happen and they're detailed below. + +Compiler drivers, programs, libgccjit and support libraries +--- + +The compiler driver processes source code, invokes other programs +such as the assembler and linker and generates the output result, +which may be assembly code or machine code. Compiling untrusted +sources can result in arbitrary code execution and unconstrained +resource consumption in the compiler. As a result, compilation of +such code should be done inside a sandboxed environment to ensure +that it does not compromise the development environment. + +The libgccjit library can, despite the name, be used both for +ahead-of-time compilation and for just-in-compilation. In both +cases it can be used to translate input representations (such as +source code) in the application context; in the latter case the +generated code is also run in the application context. + +Limitations that apply to the compiler driver, apply here too in +terms of sanitizing inputs and it is recommended that both the +compilation *and* execution context of the code are appropriately +sandboxed to contain the effects of any bugs in libgccjit, the +application code using it, or its generated code to the sandboxed +environment. + +Support libraries such as libiberty, libcc1 libvtv and libcpp have +been developed separately to share code with other tools such as +binutils and gdb. These libraries again have similar challenges to +compiler drivers. While they are expected to be robust against +arbitrary input, they should only be used with trusted inputs. + +Libraries such as zlib that bundled into GCC to build it will be +treated the same as the compiler drivers and programs as far as +security coverage is concerned. However if you find an issue in +these libraries independent of their use in GCC, you should reach +out to their upstream projects to report them. + +As a result, the only case for a potential security issue in the +compiler is when it generates vulnerable application code for +trusted input source code that is conforming to the relevant +programming standard or extensions documented as supported by GCC +and the algorithm expressed in the source code does not have the +vulnerability. The output application code could be considered +vulnerable if it produces an actual vulnerability in the target +application, specifically in the following cases: + +- The application dereferences an invalid memory location despite + the application sources being valid. +- The application reads from or writes to a valid but incorrect + memory location, resulting in an information integrity issue or an + information leak. +- The application ends up running in an infinite loop or with + severe degradation in performance despite the input sources having + no such issue, resulting in a Denial of Service. Note that + correct but non-performant code is not a security issue candidate, + this only applies to incorrect code that may result in performance + degradation severe enough to amount to a denial of service. +- The application crashes due to the generated incorrect code, + resulting in a Denial of Service. + +Language runtime libraries +-- + +GCC also builds and distributes libraries that are intended to be +used widely to implement runtime support for various programming +languages. These i
Re: [PATCH 00/19] aarch64: Fix -fstack-protector issue
On 2023-09-12 11:25, Richard Sandiford via Gcc-patches wrote: This series of patches fixes deficiencies in GCC's -fstack-protector implementation for AArch64 when using dynamically allocated stack space. This is CVE-2023-4039. See: While this is a legitimate missed hardening, I'm not sure if this qualifies as a CVE-worthy vulnerability since correct programs won't actually be exploitable due to this. This is essentially the kind of thing that the "Security features implemented in GCC" section in the proposed security policy[1] describes. Thanks, Sid [1] https://inbox.sourceware.org/gcc-patches/ba133293-a7e8-8fe4-e1ba-7129b9e10...@gotplt.org/
Re: [RFC] GCC Security policy
Hello folks, Here's v3 of the top part of the security policy. Hopefully this addresses all concerns raised so far. Thanks, Sid What is a GCC security bug? === A security bug is one that threatens the security of a system or network, or might compromise the security of data stored on it. In the context of GCC there are multiple ways in which this might happen and they're detailed below. Compiler drivers, programs, libgccjit and support libraries --- The compiler driver processes source code, invokes other programs such as the assembler and linker and generates the output result, which may be assembly code or machine code. Compiling untrusted sources can result in arbitrary code execution and unconstrained resource consumption in the compiler. As a result, compilation of such code should be done inside a sandboxed environment to ensure that it does not compromise the development environment. The libgccjit library can, despite the name, be used both for ahead-of-time compilation and for just-in-compilation. In both cases it can be used to translate input representations (such as source code) in the application context; in the latter case the generated code is also run in the application context. Limitations that apply to the compiler driver, apply here too in terms of sanitizing inputs and it is recommended that both the compilation *and* execution context of the code are appropriately sandboxed to contain the effects of any bugs in libgccjit, the application code using it, or its generated code to the sandboxed environment. Support libraries such as libiberty, libcc1 libvtv and libcpp have been developed separately to share code with other tools such as binutils and gdb. These libraries again have similar challenges to compiler drivers. While they are expected to be robust against arbitrary input, they should only be used with trusted inputs. Libraries such as zlib that bundled into GCC to build it will be treated the same as the compiler drivers and programs as far as security coverage is concerned. However if you find an issue in these libraries independent of their use in GCC, you should reach out to their upstream projects to report them. As a result, the only case for a potential security issue in the compiler is when it generates vulnerable application code for trusted input source code that is conforming to the relevant programming standard or extensions documented as supported by GCC and the algorithm expressed in the source code does not have the vulnerability. The output application code could be considered vulnerable if it produces an actual vulnerability in the target application, specifically in the following cases: - The application dereferences an invalid memory location despite the application sources being valid. - The application reads from or writes to a valid but incorrect memory location, resulting in an information integrity issue or an information leak. - The application ends up running in an infinite loop or with severe degradation in performance despite the input sources having no such issue, resulting in a Denial of Service. Note that correct but non-performant code is not a security issue candidate, this only applies to incorrect code that may result in performance degradation severe enough to amount to a denial of service. - The application crashes due to the generated incorrect code, resulting in a Denial of Service. Language runtime libraries -- GCC also builds and distributes libraries that are intended to be used widely to implement runtime support for various programming languages. These include the following: * libada * libatomic * libbacktrace * libcc1 * libcody * libcpp * libdecnumber * libffi * libgcc * libgfortran * libgm2 * libgo * libgomp * libiberty * libitm * libobjc * libphobos * libquadmath * libsanitizer * libssp * libstdc++ These libraries are intended to be used in arbitrary contexts and as a result, bugs in these libraries may be evaluated for security impact. However, some of these libraries, e.g. libgo, libphobos, etc. are not maintained in the GCC project, due to which the GCC project may not be the correct point of contact for them. You are encouraged to look at README files within those library directories to locate the canonical security contact point for those projects and include them in the report. Once the issue is fixed in the upstream project, the fix will be synced into GCC in a future release. Most security vulnerabilities in these runtime
Re: Another bug for __builtin_object_size? (Or expected behavior)
On 2023-08-17 17:25, Qing Zhao wrote: It's not exactly the same issue, the earlier discussion was about choosing sizes in the same pass while the current one is about choosing between passes, but I agree it "rhymes". This is what I was alluding to originally (for OST_MINIMUM use MIN_EXPR if both passes returned a pass) but I haven't thought about it hard enough to be 100% confident that it's the better solution, especially for OST_MAXIMUM. We have two different sources to get SIZE information for the subobject: 1. From the TYPESIZE information embedded in the IR; 2. From the initialization information propagated from data flow, this includes both malloc call and the DECL_INIT. We need to choose between these two when both available, (these two information could be in the same pass as we discussed before, or in different passes which is shown in this discussion). I think that the MIN_EXPR might be the right choice (especially for OST_MAXIMUM) -:) It's worth a shot I guess. We could emit something like the following in early_object_sizes_execute_one: sz = (__bos(o->sub, ost) == unknown ? early_size : MIN_EXPR (__bos(o->sub, ost), early_size)); and see if it sticks. Thanks, Sid
Re: Another bug for __builtin_object_size? (Or expected behavior)
On 2023-08-17 16:23, Qing Zhao wrote: Then, I think whatever MIN or MAX, the early phase has more precise information than the later phase, we should use its result if it’s NOT UNKNOWN? We can't be sure about that though, can we? For example for something like this: struct S { int a; char b[10]; int c; }; size_t foo (struct S *s) { return __builtin_object_size (s->b, 1); } size_t bar () { struct S *in = malloc (8); return foo (in); } returns 10 for __builtin_object_size in early_objsz but when it sees the malloc in the later objsz pass, it returns 4: $ gcc/cc1 -fdump-tree-objsz-details -quiet -o - -O bug.c ... foo: .LFB0: .cfi_startproc movl$10, %eax ret .cfi_endproc ... bar: .LFB1: .cfi_startproc movl$4, %eax ret .cfi_endproc ... In fact, this ends up returning the wrong result for OST_MINIMUM: $ gcc/cc1 -fdump-tree-objsz-details -quiet -o - -O bug.c ... foo: .LFB0: .cfi_startproc movl$10, %eax ret .cfi_endproc ... bar: .LFB1: .cfi_startproc movl$10, %eax ret .cfi_endproc ... bar ought to have returned 4 too (and I'm betting the later objsz must have seen that) but it got overridden by the earlier estimate of 10. Okay, I see. Then is this the similar issue we discussed previously? (As following:) " Hi, Sid and Jakub, I have a question in the following source portion of the routine “addr_object_size” of gcc/tree-object-size.cc: 743 bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var); 744 if (bytes != error_mark_node) 745 { 746 bytes = size_for_offset (var_size, bytes); 747 if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == MEM_REF) 748 { 749 tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0), 750pt_var); 751 if (bytes2 != error_mark_node) 752 { 753 bytes2 = size_for_offset (pt_var_size, bytes2); 754 bytes = size_binop (MIN_EXPR, bytes, bytes2); 755 } 756 } 757 } At line 754, why we always use “MIN_EXPR” whenever it’s for OST_MINIMUM or not? Shall we use (object_size_type & OST_MINIMUM ? MIN_EXPR : MAX_EXPR) That MIN_EXPR is not for OST_MINIMUM. It is to cater for allocations like this: typedef struct { int a; } A; size_t f() { A *p = malloc (1); return __builtin_object_size (p, 0); } where the returned size should be 1 and not sizeof (int). The mode doesn't really matter in this case. “ If this is the same issue, I think we can use the same solution: always use MIN_EXPR, What do you think? It's not exactly the same issue, the earlier discussion was about choosing sizes in the same pass while the current one is about choosing between passes, but I agree it "rhymes". This is what I was alluding to originally (for OST_MINIMUM use MIN_EXPR if both passes returned a pass) but I haven't thought about it hard enough to be 100% confident that it's the better solution, especially for OST_MAXIMUM. Thanks, Sid
Re: Another bug for __builtin_object_size? (Or expected behavior)
On 2023-08-17 15:27, Qing Zhao wrote: Yes, that's it. Maybe it's more correct if instead of MAX_EXPR if for OST_MINIMUM we stick with the early_objsz answer if it's non-zero. I'm not sure if that's the case for maximum size though, my gut says it isn't. So, the major purpose for adding the early object size phase is for computing SUBobjects size more precisely before the subobject information lost? I suppose it's more about being able to do it at all, rather than precision. Then, I think whatever MIN or MAX, the early phase has more precise information than the later phase, we should use its result if it’s NOT UNKNOWN? We can't be sure about that though, can we? For example for something like this: struct S { int a; char b[10]; int c; }; size_t foo (struct S *s) { return __builtin_object_size (s->b, 1); } size_t bar () { struct S *in = malloc (8); return foo (in); } returns 10 for __builtin_object_size in early_objsz but when it sees the malloc in the later objsz pass, it returns 4: $ gcc/cc1 -fdump-tree-objsz-details -quiet -o - -O bug.c ... foo: .LFB0: .cfi_startproc movl$10, %eax ret .cfi_endproc ... bar: .LFB1: .cfi_startproc movl$4, %eax ret .cfi_endproc ... In fact, this ends up returning the wrong result for OST_MINIMUM: $ gcc/cc1 -fdump-tree-objsz-details -quiet -o - -O bug.c ... foo: .LFB0: .cfi_startproc movl$10, %eax ret .cfi_endproc ... bar: .LFB1: .cfi_startproc movl$10, %eax ret .cfi_endproc ... bar ought to have returned 4 too (and I'm betting the later objsz must have seen that) but it got overridden by the earlier estimate of 10. We probably need smarter heuristics on choosing between the estimate of the early_objsz and late objsz passes each by itself isn't good enough for subobjects. Thanks, Sid
Re: Another bug for __builtin_object_size? (Or expected behavior)
On 2023-08-17 09:58, Qing Zhao wrote: So this is a (sort of) known issue, which necessitated the early_objsz pass to get an estimate before a subobject reference was optimized to a MEM_REF. Do you mean that after a subobject reference was optimized to a MEM_REF, there is no way to compute the size of the subobject anymore? Yes, in cases where the TYPE_SIZE is lost and there's no other allocation information to fall back on. However it looks like the MIN/MAX hack doesn't work in this case for OST_MINIMUM; it should probably get the minimum of the two passes if both passes were successful, or only the result of the pass that was successful. You mean that the following line: 2053 enum tree_code code = object_size_type & OST_MINIMUM ? MAX_EXPR : MIN_EXPR; Might need to be changed to: 2053 enum tree_code code = MIN_EXPR; Yes, that's it. Maybe it's more correct if instead of MAX_EXPR if for OST_MINIMUM we stick with the early_objsz answer if it's non-zero. I'm not sure if that's the case for maximum size though, my gut says it isn't. Thanks, Sid
Re: Another bug for __builtin_object_size? (Or expected behavior)
On 2023-08-16 11:59, Qing Zhao wrote: Jakub and Sid, During my study, I found an interesting behavior for the following small testing case: #include #include struct fixed { size_t foo; char b; char array[10]; } q = {}; #define noinline __attribute__((__noinline__)) static void noinline bar () { struct fixed *p = printf("the__bos of MAX p->array sub is %d \n", __builtin_object_size(p->array, 1)); printf("the__bos of MIN p->array sub is %d \n", __builtin_object_size(p->array, 3)); return; } int main () { bar (); return 0; } [opc@qinzhao-aarch64-ol8 108896]$ sh t /home/opc/Install/latest-d/bin/gcc -O -fstrict-flex-arrays=3 t2.c the__bos of MAX p->array sub is 10 the__bos of MIN p->array sub is 15 I assume that the Minimum size in the sub-object should be 10 too (i.e __builtin_object_size(p->array, 3) should be 10 too). So, first question: Is this correct or wrong behavior for __builtin_object_size(p->array, 3)? The second question is, when I debugged into why __builtin_object_size(p->array, 3) returns 15 instead of 10, I observed the following: 1. In “early_objz” phase, The IR for p->array is: (gdb) call debug_generic_expr(ptr) _5->array And the pt_var is: (gdb) call debug_generic_expr(pt_var) *p_5 As a result, the following condition in tree-object-size.cc: 585 if (pt_var != TREE_OPERAND (ptr, 0)) Was satisfied, and then the algorithm for computing the SUBOBJECT was invoked and the size of the subobject 10 was used. and then an MAX_EXPR was inserted after the __builtin_object_size call as: _3 = _5->array; _10 = __builtin_object_size (_3, 3); _4 = MAX_EXPR <_10, 10>; Till now, everything looks fine. 2. within “ccp1” phase, when folding the call to __builtin_object_size, the IR for the p-:>array is: (gdb) call debug_generic_expr(ptr) [(void *) + 9B] And the pt_var is: (gdb) call debug_generic_expr(pt_var) MEM [(void *) + 9B] As a result, the following condition in tree-object-size.cc: 585 if (pt_var != TREE_OPERAND (ptr, 0)) Was NOT satisfied, therefore the algorithm for computing the SUBOBJECT was NOT invoked at all, as a result, the size in the whole object, 15, was used. And then finally, MAX_EXPR (_10, 10) becomes MAX_EXPR (15, 10), 15 is the final result. Based on the above, is there any issue with the current algorithm? So this is a (sort of) known issue, which necessitated the early_objsz pass to get an estimate before a subobject reference was optimized to a MEM_REF. However it looks like the MIN/MAX hack doesn't work in this case for OST_MINIMUM; it should probably get the minimum of the two passes if both passes were successful, or only the result of the pass that was successful. Thanks, Sid
Re: [RFC] GCC Security policy
On 2023-08-16 11:06, Alexander Monakov wrote: No I understood the distinction you're trying to make, I just wanted to point out that the effect isn't all that different. The intent of the wording is not to prescribe a solution, but to describe what the compiler cannot do and hence, users must find a way to do this. I think we have a consensus on this part of the wording though because we're not really responsible for the prescription here and I'm happy with just asking users to sandbox. Nice! I suppose it's kinda like saying "don't try this at home". You know many will and some will break their leg while others will come out of it feeling invincible. Our job is to let them know that they will likely break their leg :) Continuing this analogy, I was protesting against doing our job by telling users "when trying this at home, make sure to wear vibranium shielding" while knowing for sure that nobody can, in fact, obtain said shielding, making our statement not helpful and rather tautological. :) How about this in the last section titled "Security features implemented in GCC", since that's where we also deal with security hardening. Similarly, GCC may transform code in a way that the correctness of the expressed algorithm is preserved but supplementary properties that are observable only outside the program or through a vulnerability in the program, may not be preserved. This is not a security issue in GCC and in such cases, the vulnerability that caused exposure of the supplementary properties must be fixed. Yeah, indicating scenarios that fall outside of intended guarantees should be helpful. I feel the exact text quoted above will be hard to decipher without knowing the discussion that led to it. Some sort of supplementary section with examples might help there. Ah, so I had started out by listing examples but dropped them before emailing. How about: Similarly, GCC may transform code in a way that the correctness of the expressed algorithm is preserved but supplementary properties that are observable only outside the program or through a vulnerability in the program, may not be preserved. Examples of such supplementary properties could be the state of memory after it is no longer in use, performance and timing characteristics of a program, state of the CPU cache, etc. Such issues are not security vulnerabilities in GCC and in such cases, the vulnerability that caused exposure of the supplementary properties must be fixed. In any case, I hope further discussion, clarification and wordsmithing goes productively for you both here on the list and during the Cauldron. Thanks! Sid
Re: [RFC] GCC Security policy
On 2023-08-15 19:07, Alexander Monakov wrote: On Tue, 15 Aug 2023, Siddhesh Poyarekar wrote: Thanks, this is nicer (see notes below). My main concern is that we shouldn't pretend there's some method of verifying that arbitrary source code is "safe" to pass to an unsandboxed compiler, nor should we push the responsibility of doing that on users. But responsibility would be pushed to users, wouldn't it? Making users responsible for verifying that sources are "safe" is not okay (we cannot teach them how to do that since there's no general method). Making users responsible for sandboxing the compiler is fine (there's a range of sandboxing solutions, from which they can choose according to their requirements and threat model). Sorry about the ambiguity. No I understood the distinction you're trying to make, I just wanted to point out that the effect isn't all that different. The intent of the wording is not to prescribe a solution, but to describe what the compiler cannot do and hence, users must find a way to do this. I think we have a consensus on this part of the wording though because we're not really responsible for the prescription here and I'm happy with just asking users to sandbox. I suppose it's kinda like saying "don't try this at home". You know many will and some will break their leg while others will come out of it feeling invincible. Our job is to let them know that they will likely break their leg :) inside a sandboxed environment to ensure that it does not compromise the development environment. Note that this still does not guarantee safety of the produced output programs and that such programs should still either be analyzed thoroughly for safety or run only inside a sandbox or an isolated system to avoid compromising the execution environment. The last statement seems to be a new addition. It is too broad and again makes a reference to analysis that appears quite theoretical. It might be better to drop this (and instead talk in more specific terms about any guarantees that produced binary code matches security properties intended by the sources; I believe Richard Sandiford raised this previously). OK, so I actually cover this at the end of the section; Richard's point AFAICT was about hardening, which I added another note for to make it explicit that missed hardening does not constitute a CVE-worthy threat: Thanks for the reminder. To illustrate what I was talking about, let me give two examples: 1) safety w.r.t timing attacks: even if the source code is written in a manner that looks timing-safe, it might be transformed in a way that mounting a timing attack on the resulting machine code is possible; 2) safety w.r.t information leaks: even if the source code attempts to discard sensitive data (such as passwords and keys) immediately after use, (partial) copies of that data may be left on stack and in registers, to be leaked later via a different vulnerability. For both 1) and 2), GCC is not engineered to respect such properties during optimization and code generation, so it's not appropriate for such tasks (a possible solution is to isolate such sensitive functions to separate files, compile to assembly, inspect the assembly to check that it still has the required properties, and use the inspected asm in subsequent builds instead of the original high-level source). How about this in the last section titled "Security features implemented in GCC", since that's where we also deal with security hardening. Similarly, GCC may transform code in a way that the correctness of the expressed algorithm is preserved but supplementary properties that are observable only outside the program or through a vulnerability in the program, may not be preserved. This is not a security issue in GCC and in such cases, the vulnerability that caused exposure of the supplementary properties must be fixed. Thanks, Sid
Re: [RFC] GCC Security policy
On 2023-08-16 04:25, Alexander Monakov wrote: On Tue, 15 Aug 2023, David Malcolm via Gcc-patches wrote: I'd prefer to reword this, as libgccjit was a poor choice of name for the library (sorry!), to make it clearer it can be used for both ahead- of-time and just-in-time compilation, and that as used for compilation, the host considerations apply, not just those of the generated target code. How about: The libgccjit library can, despite the name, be used both for ahead-of-time compilation and for just-in-compilation. In both cases it can be used to translate input representations (such as source code) in the application context; in the latter case the generated code is also run in the application context. Limitations that apply to the compiler driver, apply here too in terms of sanitizing inputs, so it is recommended that inputs are Thanks David! Unfortunately the lines that follow: either sanitized by an external program to allow only trusted, safe compilation and execution in the context of the application, again make a reference to a purely theoretical "external program" that is not going to exist in reality, and I made a fuss about that in another subthread (sorry Siddhesh). We shouldn't speak as if this solution is actually available to users. I know this is not the main point of your email, but we came up with a better wording for the compiler driver, and it would be good to align this text with that. How about: The libgccjit library can, despite the name, be used both for ahead-of-time compilation and for just-in-compilation. In both cases it can be used to translate input representations (such as source code) in the application context; in the latter case the generated code is also run in the application context. Limitations that apply to the compiler driver, apply here too in terms of sanitizing inputs and it is recommended that both the compilation *and* execution context of the code are appropriately sandboxed to contain the effects of any bugs in libgccjit, the application code using it, or its generated code to the sandboxed environment.
Re: [RFC] GCC Security policy
On 2023-08-15 10:07, Alexander Monakov wrote: On Tue, 15 Aug 2023, Siddhesh Poyarekar wrote: Does this as the first paragraph address your concerns: Thanks, this is nicer (see notes below). My main concern is that we shouldn't pretend there's some method of verifying that arbitrary source code is "safe" to pass to an unsandboxed compiler, nor should we push the responsibility of doing that on users. But responsibility would be pushed to users, wouldn't it? The compiler driver processes source code, invokes other programs such as the assembler and linker and generates the output result, which may be assembly code or machine code. It is necessary that all source code inputs to the compiler are trusted, since it is impossible for the driver to validate input source code for safety. The statement begins with "It is necessary", but the next statement offers an alternative in case the code is untrusted. This is a contradiction. Is it necessary or not in the end? I'd suggest to drop this statement and instead make a brief note that compiling crafted/untrusted sources can result in arbitrary code execution and unconstrained resource consumption in the compiler. So: The compiler driver processes source code, invokes other programs such as the assembler and linker and generates the output result, which may be assembly code or machine code. Compiling untrusted sources can result in arbitrary code execution and unconstrained resource consumption in the compiler. As a result, compilation of such code should be done inside a sandboxed environment to ensure that it does not compromise the development environment. For untrusted code should compilation should be done ^^ typo (spurious 'should') Ack, thanks. inside a sandboxed environment to ensure that it does not compromise the development environment. Note that this still does not guarantee safety of the produced output programs and that such programs should still either be analyzed thoroughly for safety or run only inside a sandbox or an isolated system to avoid compromising the execution environment. The last statement seems to be a new addition. It is too broad and again makes a reference to analysis that appears quite theoretical. It might be better to drop this (and instead talk in more specific terms about any guarantees that produced binary code matches security properties intended by the sources; I believe Richard Sandiford raised this previously). OK, so I actually cover this at the end of the section; Richard's point AFAICT was about hardening, which I added another note for to make it explicit that missed hardening does not constitute a CVE-worthy threat: As a result, the only case for a potential security issue in the compiler is when it generates vulnerable application code for trusted input source code that is conforming to the relevant programming standard or extensions documented as supported by GCC and the algorithm expressed in the source code does not have the vulnerability. The output application code could be considered vulnerable if it produces an actual vulnerability in the target application, specifically in the following cases: - The application dereferences an invalid memory location despite the application sources being valid. - The application reads from or writes to a valid but incorrect memory location, resulting in an information integrity issue or an information leak. - The application ends up running in an infinite loop or with severe degradation in performance despite the input sources having no such issue, resulting in a Denial of Service. Note that correct but non-performant code is not a security issue candidate, this only applies to incorrect code that may result in performance degradation severe enough to amount to a denial of service. - The application crashes due to the generated incorrect code, resulting in a Denial of Service.
Re: Is this a bug for __builtin_dynamic_object_size?
On 2023-08-14 19:12, Qing Zhao wrote: Hi, Sid, For the following testing case: #include #define noinline __attribute__((__noinline__)) static void noinline alloc_buf_more (int index) { struct annotated { long foo; char b; char array[index]; long c; } q, *p; p = printf("the__bdos of p->array whole max is %d \n", __builtin_dynamic_object_size(p->array, 0)); printf("the__bdos of p->array sub max is %d \n", __builtin_dynamic_object_size(p->array, 1)); printf("the__bdos of p->array whole min is %d \n", __builtin_dynamic_object_size(p->array, 2)); printf("the__bdos of p->array sub min is %d \n", __builtin_dynamic_object_size(p->array, 3)); return; } int main () { alloc_buf_more (10); return 0; } If I compile it with the latest upstream gcc and run it: /home/opc/Install/latest-d/bin/gcc -O t.c the__bdos of p->array whole max is 23 the__bdos of p->array sub max is 23 the__bdos of p->array whole min is 23 the__bdos of p->array sub min is 23 In which__builtin_dynamic_object_size(p->array, 0) and __builtin_dynamic_object_size(p->array, 1) return the same size, this seems wrong to me. There is one line in tree-object-size.cc might relate to this bug: (in the routine “addr_object_size”) 603 if (! TYPE_SIZE_UNIT (TREE_TYPE (var)) 604 || ! tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (var))) 605 || (pt_var_size && TREE_CODE (pt_var_size) == INTEGER_CST 606 && tree_int_cst_lt (pt_var_size, 607 TYPE_SIZE_UNIT (TREE_TYPE (var) 608 var = pt_var; I suspect that the above line 604 “ ! tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (var)))” relates to this bug, since the TYPESIZE of the VLA “array” is not a unsigned HOST_WIDE_INT, but we still can use its TYPESIZE for dynamic_object_size? What do you think? Thanks, yes that doesn't work. I'm trying to revive the patch I had submitted earlier[1] in the year and fix this issue too in that process. In general the subobject size computation doesn't handle variable sizes at all; it depends on whole object+offset to get size information, which ends up working only for flex arrays at the end of objects. Sid [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608914.html
Re: [RFC] GCC Security policy
On 2023-08-15 01:59, Alexander Monakov wrote: On Mon, 14 Aug 2023, Siddhesh Poyarekar wrote: There's no practical (programmatic) way to do such validation; it has to be a manual audit, which is why source code passed to the compiler has to be *trusted*. No, I do not think that is a logical conclusion. What is the problem with passing untrusted code to a sandboxed compiler? Right, that's what we're essentially trying to convey in the security policy text. It doesn't go into mechanisms for securing execution (because that's really beyond the scope of the *project's* policy IMO) but it states unambiguously that input to the compiler must be trusted: """ ... It is necessary that all source code inputs to the compiler are trusted, since it is impossible for the driver to validate input source code beyond conformance to a programming language standard... """ I see two issues with this. First, it reads as if people wishing to build not-entirely-trusted sources need to seek some other compiler, as somehow we seem to imply that sandboxing GCC is out of the question. Second, I take issue with the last part of the quoted text (language conformance): verifying standards conformance is also impossible (consider UB that manifests only during linking or dynamic loading) so GCC is only doing that on a best-effort basis with no guarantees. Does this as the first paragraph address your concerns: The compiler driver processes source code, invokes other programs such as the assembler and linker and generates the output result, which may be assembly code or machine code. It is necessary that all source code inputs to the compiler are trusted, since it is impossible for the driver to validate input source code for safety. For untrusted code should compilation should be done inside a sandboxed environment to ensure that it does not compromise the development environment. Note that this still does not guarantee safety of the produced output programs and that such programs should still either be analyzed thoroughly for safety or run only inside a sandbox or an isolated system to avoid compromising the execution environment. Thanks, Sid
Re: [RFC] GCC Security policy
On 2023-08-14 17:16, Alexander Monakov wrote: On Mon, 14 Aug 2023, Siddhesh Poyarekar wrote: 1. It makes it clear to users of the project the scope in which the project could be used and what safety it could reasonably expect from the project. In the context of GCC for example, it cannot expect the compiler to do a safety check of untrusted sources; the compiler will consider #include "/etc/passwd" just as valid code as #include and as a result, the onus is on the user environment to validate the input sources for safety. Whoa, no. We shouldn't make such statements unless we are prepared to explain to users how such validation can be practically implemented, which I'm sure we cannot in this case, due to future extensions such as the #embed directive, and ability to obfuscate filenames using the preprocessor. There's no practical (programmatic) way to do such validation; it has to be a manual audit, which is why source code passed to the compiler has to be *trusted*. I think it would be more honest to say that crafted sources can result in arbitrary code execution with the privileges of the user invoking the compiler, and hence the operator may want to ensure that no sensitive data is available to that user (via measures ranging from plain UNIX permissions, to chroots, to virtual machines, to air-gapped computers, depending on threat model). Right, that's what we're essentially trying to convey in the security policy text. It doesn't go into mechanisms for securing execution (because that's really beyond the scope of the *project's* policy IMO) but it states unambiguously that input to the compiler must be trusted: """ ... It is necessary that all source code inputs to the compiler are trusted, since it is impossible for the driver to validate input source code beyond conformance to a programming language standard... """ Resource consumption is another good reason to sandbox compilers. Agreed, we make that specific recommendation in the context of libgccjit. Thanks, Sid
Re: [RFC] GCC Security policy
On 2023-08-14 14:51, Richard Sandiford wrote: I think it would help to clarify what the aim of the security policy is. Specifically: (1) What service do we want to provide to users by classifying one thing as a security bug and another thing as not a security bug? (2) What service do we want to provide to the GNU community by the same classification? I think it will be easier to agree on the classification if we first agree on that. I actually wanted to do a talk on this at the Cauldron this year and *then* propose this for the gcc community, but I guess we could do this early :) So the core intent of a security policy for a project is to make clear the security stance of the project, specifying to the extent possible what kind of uses are considered safe and what kinds of bugs would be considered security issues in the context of those uses. There are a few advantages of doing this: 1. It makes it clear to users of the project the scope in which the project could be used and what safety it could reasonably expect from the project. In the context of GCC for example, it cannot expect the compiler to do a safety check of untrusted sources; the compiler will consider #include "/etc/passwd" just as valid code as #include and as a result, the onus is on the user environment to validate the input sources for safety. 2. It helps the security community (Mitre and other CNAs and security researchers) set correct expectations of the project so that they don't cry wolf for every segfault or ICE under the pretext that code could presumably be run as a service somehow and hence result in a "DoS". 3. This in turn helps stave off spurious CVE submissions that cause needless churn in downstream distributions. LLVM is already starting to see this[1] and it's only a matter of time before people start doing this for GCC. 4. It helps make a distinction between important bugs and security bugs; they're often conflated as one and the same thing. Security bugs are special because they require different handling from those that do not have a security impact, regardless of their actual importance. Unfortunately one of the reasons they're special is because there's a bunch of (pretty dumb) automation out there that rings alarm bells on every single CVE. Without a clear understanding of the context under which a project can be used, these alarm bells can be made unreasonably loud (due to incorrect scoring, see the LLVM CVE for instance; just one element in that vector changes the score from 0.0 to 5.5), causing needless churn in not just the code base but in downstream releases and end user environments. 5. This exercise is also a great start in developing an understanding of which parts in GCC are security sensitive and in what sense. Runtime libraries for example have a direct impact on application security. Compiler impact is a little less direct. Hardening features have another effect, but it's more mitigation-oriented than direct safety. This also informs us about the impact of various project actions such as bundling third-party libraries and development and maintenance of tooling within GCC and will hopefully guide policies around those practices. I hope this is a sufficient start. We don't necessarily want to get into the business of acknowledging or rejecting security issues as upstream at the moment (but see also the CNA discussion[2] of what we intend to do in that space for glibc) but having uniform upstream guidelines would be helpful to researchers as well as downstream consumers to help decide what constitutes a security issue. Thanks, Sid [1] https://nvd.nist.gov/vuln/detail/CVE-2023-29932 [2] https://inbox.sourceware.org/libc-alpha/1a44f25a-5aa3-28b7-1ecb-b3991d44c...@gotplt.org/T/
Re: [RFC] GCC Security policy
Hi, Here's the updated draft of the top part of the security policy with all of the recommendations incorporated. Thanks, Sid What is a GCC security bug? === A security bug is one that threatens the security of a system or network, or might compromise the security of data stored on it. In the context of GCC there are multiple ways in which this might happen and they're detailed below. Compiler drivers, programs, libgccjit and support libraries --- The compiler driver processes source code, invokes other programs such as the assembler and linker and generates the output result, which may be assembly code or machine code. It is necessary that all source code inputs to the compiler are trusted, since it is impossible for the driver to validate input source code beyond conformance to a programming language standard. The GCC JIT implementation, libgccjit, is intended to be plugged into applications to translate input source code in the application context. Limitations that apply to the compiler driver, apply here too in terms of sanitizing inputs, so it is recommended that inputs are either sanitized by an external program to allow only trusted, safe execution in the context of the application or the JIT execution context is appropriately sandboxed to contain the effects of any bugs in the JIT or its generated code to the sandboxed environment. Support libraries such as libiberty, libcc1 libvtv and libcpp have been developed separately to share code with other tools such as binutils and gdb. These libraries again have similar challenges to compiler drivers. While they are expected to be robust against arbitrary input, they should only be used with trusted inputs. Libraries such as zlib that bundled into GCC to build it will be treated the same as the compiler drivers and programs as far as security coverage is concerned. However if you find an issue in these libraries independent of their use in GCC, you should reach out to their upstream projects to report them. As a result, the only case for a potential security issue in all these cases is when it ends up generating vulnerable output for valid input source code. As a result, the only case for a potential security issue in the compiler is when it generates vulnerable application code for trusted input source code that is conforming to the relevant programming standard or extensions documented as supported by GCC and the algorithm expressed in the source code does not have the vulnerability. The output application code could be considered vulnerable if it produces an actual vulnerability in the target application, specifically in the following cases: - The application dereferences an invalid memory location despite the application sources being valid. - The application reads from or writes to a valid but incorrect memory location, resulting in an information integrity issue or an information leak. - The application ends up running in an infinite loop or with severe degradation in performance despite the input sources having no such issue, resulting in a Denial of Service. Note that correct but non-performant code is not a security issue candidate, this only applies to incorrect code that may result in performance degradation severe enough to amount to a denial of service. - The application crashes due to the generated incorrect code, resulting in a Denial of Service. Language runtime libraries -- GCC also builds and distributes libraries that are intended to be used widely to implement runtime support for various programming languages. These include the following: * libada * libatomic * libbacktrace * libcc1 * libcody * libcpp * libdecnumber * libffi * libgcc * libgfortran * libgm2 * libgo * libgomp * libiberty * libitm * libobjc * libphobos * libquadmath * libsanitizer * libssp * libstdc++ These libraries are intended to be used in arbitrary contexts and as a result, bugs in these libraries may be evaluated for security impact. However, some of these libraries, e.g. libgo, libphobos, etc. are not maintained in the GCC project, due to which the GCC project may not be the correct point of contact for them. You are encouraged to look at README files within those library directories to locate the canonical security contact point for those projects and include them in the report. Once the issue is fixed in the upstream project, the fix will be synced into GCC in a future release. Most security vulnerabilities in these runtime libraries arise when an application
Re: [RFC] GCC Security policy
On 2023-08-11 11:12, David Edelsohn wrote: The text above states "bugs in these libraries may be evaluated for security impact", but there is no comment about the criteria for a security impact, unlike the GLIBC SECURITY.md document. The text seems to imply the "What is a security bug?" definitions from GLIBC, but the definitions are not explicitly stated in the GCC Security policy. Should this "Language runtime libraries" section include some of the GLIBC "What is a security bug?" text or should the GCC "What is a security bug?" section earlier in this document include the text with a qualification that issues like buffer overflow, memory leaks, information disclosure, etc. specifically apply to "Language runtime libraries" and not all components of GCC? Yes, that makes sense. This part will likely evolve though, much like the glibc one did, based on reports we get over time. I'll work it in and post an updated draft. Thanks, Sid
Re: [RFC] GCC Security policy
On 2023-08-11 11:09, Paul Koning wrote: On Aug 11, 2023, at 10:36 AM, Siddhesh Poyarekar wrote: On 2023-08-10 14:50, Siddhesh Poyarekar wrote: As a result, the only case for a potential security issue in all these cases is when it ends up generating vulnerable output for valid input source code. I think this leaves open the interpretation "every wrong code bug is potentially a security bug". I suppose that's true in a trite sense, but not in a useful sense. As others said earlier in the thread, whether a wrong code bug in GCC leads to a security bug in the object code is too application-dependent to be a useful classification for GCC. I think we should explicitly say that we don't generally consider wrong code bugs to be security bugs. Leaving it implicit is bound to lead to misunderstanding. I see what you mean, but the context-dependence of a bug is something GCC will have to deal with, similar to how libraries have to deal with bugs. But I agree this probably needs some more expansion. Let me try and come up with something more detailed for that last paragraph. How's this: As a result, the only case for a potential security issue in the compiler is when it generates vulnerable application code for valid, trusted input source code. The output application code could be considered vulnerable if it produces an actual vulnerability in the target application, specifically in the following cases: You might make it explicit that we're talking about wrong code errors here -- in other words, the source code is correct (conforms to the standard) and the algorithm expressed in the source code does not have a vulnerability, but the generated code has semantics that differ from those of the source code such that it does have a vulnerability. Ack, thanks for the suggestion. - The application dereferences an invalid memory location despite the application sources being valid. - The application reads from or writes to a valid but incorrect memory location, resulting in an information integrity issue or an information leak. - The application ends up running in an infinite loop or with severe degradation in performance despite the input sources having no such issue, resulting in a Denial of Service. Note that correct but non-performant code is not a security issue candidate, this only applies to incorrect code that may result in performance degradation. The last sentence somewhat contradicts the preceding one. Perhaps "...may result in performance degradation severe enough to amount to a denial of service". Ack, will fix that up, thanks. Sid
Re: [RFC] GCC Security policy
On 2023-08-10 14:50, Siddhesh Poyarekar wrote: As a result, the only case for a potential security issue in all these cases is when it ends up generating vulnerable output for valid input source code. I think this leaves open the interpretation "every wrong code bug is potentially a security bug". I suppose that's true in a trite sense, but not in a useful sense. As others said earlier in the thread, whether a wrong code bug in GCC leads to a security bug in the object code is too application-dependent to be a useful classification for GCC. I think we should explicitly say that we don't generally consider wrong code bugs to be security bugs. Leaving it implicit is bound to lead to misunderstanding. I see what you mean, but the context-dependence of a bug is something GCC will have to deal with, similar to how libraries have to deal with bugs. But I agree this probably needs some more expansion. Let me try and come up with something more detailed for that last paragraph. How's this: As a result, the only case for a potential security issue in the compiler is when it generates vulnerable application code for valid, trusted input source code. The output application code could be considered vulnerable if it produces an actual vulnerability in the target application, specifically in the following cases: - The application dereferences an invalid memory location despite the application sources being valid. - The application reads from or writes to a valid but incorrect memory location, resulting in an information integrity issue or an information leak. - The application ends up running in an infinite loop or with severe degradation in performance despite the input sources having no such issue, resulting in a Denial of Service. Note that correct but non-performant code is not a security issue candidate, this only applies to incorrect code that may result in performance degradation. - The application crashes due to the generated incorrect code, resulting in a Denial of Service.
Re: [RFC] GCC Security policy
On 2023-08-10 14:28, Richard Sandiford wrote: Siddhesh Poyarekar writes: On 2023-08-08 10:30, Siddhesh Poyarekar wrote: Do you have a suggestion for the language to address libgcc, libstdc++, etc. and libiberty, libbacktrace, etc.? I'll work on this a bit and share a draft. Hi David, Here's what I came up with for different parts of GCC, including the runtime libraries. Over time we may find that specific parts of runtime libraries simply cannot be used safely in some contexts and flag that. Sid """ What is a GCC security bug? === A security bug is one that threatens the security of a system or network, or might compromise the security of data stored on it. In the context of GCC there are multiple ways in which this might happen and they're detailed below. Compiler drivers, programs, libgccjit and support libraries --- The compiler driver processes source code, invokes other programs such as the assembler and linker and generates the output result, which may be assembly code or machine code. It is necessary that all source code inputs to the compiler are trusted, since it is impossible for the driver to validate input source code beyond conformance to a programming language standard. The GCC JIT implementation, libgccjit, is intended to be plugged into applications to translate input source code in the application context. Limitations that apply to the compiler driver, apply here too in terms of sanitizing inputs, so it is recommended that inputs are either sanitized by an external program to allow only trusted, safe execution in the context of the application or the JIT execution context is appropriately sandboxed to contain the effects of any bugs in the JIT or its generated code to the sandboxed environment. Support libraries such as libiberty, libcc1 libvtv and libcpp have been developed separately to share code with other tools such as binutils and gdb. These libraries again have similar challenges to compiler drivers. While they are expected to be robust against arbitrary input, they should only be used with trusted inputs. Libraries such as zlib and libffi that bundled into GCC to build it will be treated the same as the compiler drivers and programs as far as security coverage is concerned. As a result, the only case for a potential security issue in all these cases is when it ends up generating vulnerable output for valid input source code. I think this leaves open the interpretation "every wrong code bug is potentially a security bug". I suppose that's true in a trite sense, but not in a useful sense. As others said earlier in the thread, whether a wrong code bug in GCC leads to a security bug in the object code is too application-dependent to be a useful classification for GCC. I think we should explicitly say that we don't generally consider wrong code bugs to be security bugs. Leaving it implicit is bound to lead to misunderstanding. I see what you mean, but the context-dependence of a bug is something GCC will have to deal with, similar to how libraries have to deal with bugs. But I agree this probably needs some more expansion. Let me try and come up with something more detailed for that last paragraph. There's another case that I think should be highlighted explicitly: GCC provides various security-hardening features. I think any failure of those feature to act as documented is poentially a security bug. Failure to follow reasonable expectations (even if not documented) might sometimes be a security bug too. Missed hardening in general does not put systems at immediate risk, so they're not considered CVE-worthy. In fact when bugs are evaluated for security risk at a source level (e.g. when NIST does it), hardening does not come into the picture at all. It's only at product levels that hardening features are accounted for, e.g. where -fstack-protector would reduce the seriousness of a stack buffer overflow and even there one must do an analysis to see if the generated code actually mitigated the overflow using the stack protector canary. Thanks, Sid
Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-08-10 12:39, Jakub Jelinek wrote: On Thu, Aug 10, 2023 at 12:30:06PM -0400, Siddhesh Poyarekar wrote: The definition of __bos/__bdos allows us the freedom to *estimate* rather than be precise, so I'd go for sizeof(x) + N * sizeof(*x.a) since it's bound to give the more conservative answer of the two. To be precise, we have the 0/1 modes vs. 2/3. So, when not determining __bos/__bdos from actual allocation size or size of an stack object or size of data section object but something else (say counted_by), perhaps 0/1 modes should give the upper estimate of sizeof (x) + N * sizeof(elt) and 2/3 modes should give a lower estimate, so offsetof + N * sizeof(elt), then user code can continue testing if both modes are equal to have exact number. Ack, that's fair. Thanks, Sid
Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-08-10 11:18, Martin Uecker wrote: Am Donnerstag, dem 10.08.2023 um 10:58 -0400 schrieb Siddhesh Poyarekar: On 2023-08-10 10:47, Martin Uecker wrote: Am Donnerstag, dem 10.08.2023 um 16:42 +0200 schrieb Jakub Jelinek: On Thu, Aug 10, 2023 at 04:38:21PM +0200, Martin Uecker wrote: Am Donnerstag, dem 10.08.2023 um 13:59 + schrieb Qing Zhao: On Aug 10, 2023, at 2:58 AM, Martin Uecker wrote: Am Mittwoch, dem 09.08.2023 um 20:10 + schrieb Qing Zhao: On Aug 9, 2023, at 12:21 PM, Michael Matz wrote: I am not sure for the reason given above. The following code would not work: struct foo_flex { int a; short b; char t[]; } x; x.a = 1; struct foo_flex *p = malloc(sizeof(x) + x.a); if (!p) abort(); memcpy(p, , sizeof(x)); // initialize struct Okay. Then, the user still should use the sizeof(struct foo_flex) + N * sizeof(foo->t) for the allocation, even though this might allocate more bytes than necessary. (But this is safe) Let me know if I still miss anything. The question is not only what the user should use to allocate, but also what BDOS should return. In my example the user uses the sizeof() + N * sizeof formula and the memcpy is safe, but it would be flagged as a buffer overrun if BDOS uses the offsetof formula. BDOS/BOS (at least the 0 level) should return what is actually allocated for the var, what size was passed to malloc and if it is a var with flex array member with initialization what is actually the size on the stack or in .data/.rodata etc. Agreed. But what about a struct with FAM with the new "counted_by" attribute if the original allocation is not visible? There's precedent for this through the __access__ attribute; __bos trusts what the attribute says about the allocation. The access attribute gives the size directly. The counted_by gives a length for the array which needs to be translated into a size via a formula. There are different formulas in use. The question is which formula should bdos trust? Whatever you pick, if this is not consistent with the actual allocation or use, then it will cause problems either by breaking code or not detecting buffer overruns. So it needs to be consistent with what GCC allocates for a var with FAM and initialization and also the user needs to be told what the right choice is so that he can use the right size for allocation and argument to memcpy / memset etc. We'd rather miss overflow to the extent of padding than to try and be overly aggressive; I doubt if we're missing much protection in practice by trying to account for the padding. The definition of __bos/__bdos allows us the freedom to *estimate* rather than be precise, so I'd go for sizeof(x) + N * sizeof(*x.a) since it's bound to give the more conservative answer of the two. Thanks, Sid
Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)
On 2023-08-10 10:47, Martin Uecker wrote: Am Donnerstag, dem 10.08.2023 um 16:42 +0200 schrieb Jakub Jelinek: On Thu, Aug 10, 2023 at 04:38:21PM +0200, Martin Uecker wrote: Am Donnerstag, dem 10.08.2023 um 13:59 + schrieb Qing Zhao: On Aug 10, 2023, at 2:58 AM, Martin Uecker wrote: Am Mittwoch, dem 09.08.2023 um 20:10 + schrieb Qing Zhao: On Aug 9, 2023, at 12:21 PM, Michael Matz wrote: I am not sure for the reason given above. The following code would not work: struct foo_flex { int a; short b; char t[]; } x; x.a = 1; struct foo_flex *p = malloc(sizeof(x) + x.a); if (!p) abort(); memcpy(p, , sizeof(x)); // initialize struct Okay. Then, the user still should use the sizeof(struct foo_flex) + N * sizeof(foo->t) for the allocation, even though this might allocate more bytes than necessary. (But this is safe) Let me know if I still miss anything. The question is not only what the user should use to allocate, but also what BDOS should return. In my example the user uses the sizeof() + N * sizeof formula and the memcpy is safe, but it would be flagged as a buffer overrun if BDOS uses the offsetof formula. BDOS/BOS (at least the 0 level) should return what is actually allocated for the var, what size was passed to malloc and if it is a var with flex array member with initialization what is actually the size on the stack or in .data/.rodata etc. Agreed. But what about a struct with FAM with the new "counted_by" attribute if the original allocation is not visible? There's precedent for this through the __access__ attribute; __bos trusts what the attribute says about the allocation. Sid
Re: [RFC] GCC Security policy
On 2023-08-09 14:17, David Edelsohn wrote: On Wed, Aug 9, 2023 at 1:33 PM Siddhesh Poyarekar <mailto:siddh...@gotplt.org>> wrote: On 2023-08-08 10:30, Siddhesh Poyarekar wrote: >> Do you have a suggestion for the language to address libgcc, >> libstdc++, etc. and libiberty, libbacktrace, etc.? > > I'll work on this a bit and share a draft. Hi David, Here's what I came up with for different parts of GCC, including the runtime libraries. Over time we may find that specific parts of runtime libraries simply cannot be used safely in some contexts and flag that. Sid Hi, Sid Thanks for iterating on this. """ What is a GCC security bug? === A security bug is one that threatens the security of a system or network, or might compromise the security of data stored on it. In the context of GCC there are multiple ways in which this might happen and they're detailed below. Compiler drivers, programs, libgccjit and support libraries --- The compiler driver processes source code, invokes other programs such as the assembler and linker and generates the output result, which may be assembly code or machine code. It is necessary that all source code inputs to the compiler are trusted, since it is impossible for the driver to validate input source code beyond conformance to a programming language standard. The GCC JIT implementation, libgccjit, is intended to be plugged into applications to translate input source code in the application context. Limitations that apply to the compiler driver, apply here too in terms of sanitizing inputs, so it is recommended that inputs are either sanitized by an external program to allow only trusted, safe execution in the context of the application or the JIT execution context is appropriately sandboxed to contain the effects of any bugs in the JIT or its generated code to the sandboxed environment. Support libraries such as libiberty, libcc1 libvtv and libcpp have been developed separately to share code with other tools such as binutils and gdb. These libraries again have similar challenges to compiler drivers. While they are expected to be robust against arbitrary input, they should only be used with trusted inputs. Libraries such as zlib and libffi that bundled into GCC to build it will be treated the same as the compiler drivers and programs as far as security coverage is concerned. Should we direct people to the upstream projects for their security policies? We bundle zlib and libffi so regardless of whether it's a security issue in those libraries (because security impact of memory safety bugs in general use libraries will be context dependent and hence get assigned CVEs more often than not), the context in gcc is well defined as a local unprivileged executable and hence not security-relevant. That said, we could add something like: However if you find a issue in these libraries independent of their use in GCC you should reach out to their upstream projects to report them. As a result, the only case for a potential security issue in all these cases is when it ends up generating vulnerable output for valid input source code. Language runtime libraries -- GCC also builds and distributes libraries that are intended to be used widely to implement runtime support for various programming languages. These include the following: * libada * libatomic * libbacktrace * libcc1 * libcody * libcpp * libdecnumber * libgcc * libgfortran * libgm2 * libgo * libgomp * libiberty * libitm * libobjc * libphobos * libquadmath * libssp * libstdc++ These libraries are intended to be used in arbitrary contexts and as a result, bugs in these libraries may be evaluated for security impact. However, some of these libraries, e.g. libgo, libphobos, etc. are not maintained in the GCC project, due to which the GCC project may not be the correct point of contact for them. You are encouraged to look at README files within those library directories to locate the canonical security contact point for those projects. As Richard mentioned, should GCC make a specific statement about the security policy / response for issues that
Re: [RFC] GCC Security policy
On 2023-08-08 10:30, Siddhesh Poyarekar wrote: Do you have a suggestion for the language to address libgcc, libstdc++, etc. and libiberty, libbacktrace, etc.? I'll work on this a bit and share a draft. Hi David, Here's what I came up with for different parts of GCC, including the runtime libraries. Over time we may find that specific parts of runtime libraries simply cannot be used safely in some contexts and flag that. Sid """ What is a GCC security bug? === A security bug is one that threatens the security of a system or network, or might compromise the security of data stored on it. In the context of GCC there are multiple ways in which this might happen and they're detailed below. Compiler drivers, programs, libgccjit and support libraries --- The compiler driver processes source code, invokes other programs such as the assembler and linker and generates the output result, which may be assembly code or machine code. It is necessary that all source code inputs to the compiler are trusted, since it is impossible for the driver to validate input source code beyond conformance to a programming language standard. The GCC JIT implementation, libgccjit, is intended to be plugged into applications to translate input source code in the application context. Limitations that apply to the compiler driver, apply here too in terms of sanitizing inputs, so it is recommended that inputs are either sanitized by an external program to allow only trusted, safe execution in the context of the application or the JIT execution context is appropriately sandboxed to contain the effects of any bugs in the JIT or its generated code to the sandboxed environment. Support libraries such as libiberty, libcc1 libvtv and libcpp have been developed separately to share code with other tools such as binutils and gdb. These libraries again have similar challenges to compiler drivers. While they are expected to be robust against arbitrary input, they should only be used with trusted inputs. Libraries such as zlib and libffi that bundled into GCC to build it will be treated the same as the compiler drivers and programs as far as security coverage is concerned. As a result, the only case for a potential security issue in all these cases is when it ends up generating vulnerable output for valid input source code. Language runtime libraries -- GCC also builds and distributes libraries that are intended to be used widely to implement runtime support for various programming languages. These include the following: * libada * libatomic * libbacktrace * libcc1 * libcody * libcpp * libdecnumber * libgcc * libgfortran * libgm2 * libgo * libgomp * libiberty * libitm * libobjc * libphobos * libquadmath * libssp * libstdc++ These libraries are intended to be used in arbitrary contexts and as a result, bugs in these libraries may be evaluated for security impact. However, some of these libraries, e.g. libgo, libphobos, etc. are not maintained in the GCC project, due to which the GCC project may not be the correct point of contact for them. You are encouraged to look at README files within those library directories to locate the canonical security contact point for those projects. Diagnostic libraries The sanitizer library bundled in GCC is intended to be used in diagnostic cases and not intended for use in sensitive environments. As a result, bugs in the sanitizer will not be considered security sensitive. GCC plugins --- It should be noted that GCC may execute arbitrary code loaded by a user through the GCC plugin mechanism or through system preloading mechanism. Such custom code should be vetted by the user for safety as bugs exposed through such code will not be considered security issues.
Re: [RFC] GCC Security policy
On 2023-08-08 11:48, David Malcolm wrote: On Tue, 2023-08-08 at 09:33 -0400, Paul Koning via Gcc-patches wrote: On Aug 8, 2023, at 9:01 AM, Jakub Jelinek via Gcc-patches wrote: On Tue, Aug 08, 2023 at 02:52:57PM +0200, Richard Biener via Gcc- patches wrote: There's probably external tools to do this, not sure if we should replicate things in the driver for this. But sure, I think the driver is the proper point to address any of such issues - iff we want to address them at all. Maybe a nice little google summer-of-code project ;) What I'd really like to avoid is having all compiler bugs (primarily ICEs) considered to be security bugs (e.g. DoS category), it would be terrible to release every week a new compiler because of the "security" issues. Indeed. But my answer would be that such things are not DoS issues. DoS means that an external input, over which you have little control, is impairing service. In the case of a compiler, if feeding it bad source code X.c causes it to crash, the answer is "well, then don't do that". Agreed. I'm not sure how to "wordsmith" this, but it seems like the sources and options on the *host* are assumed to be trusted, and that the act of *compiling* source on the host requires trusting them, just like the act of executing the compiled code on the target does. Though users may be more familiar with sandboxing the target than the host. We should spell this out further for libgccjit: libgccjit allows for ahead-of-time and JIT compilation of sources - but it assumes that those sources (and the compilation options) are trusted. [Adding Andrea Corallo to the addressees] For example, Emacs is using libgccjit to do ahead-of-time compilation of Emacs bytecode. I'm assuming that Emacs is assuming that its bytecode is trusted, and that there isn't any attempt by Emacs to sandbox the Emacs Lisp being processed. However, consider a situation in which someone attempted to, say, embed libgccjit inside a web browser to generate machine code from JavaScript, where the JavaScript is potentially controlled by an attacker. I think we want to explicitly say that that if you're going to do that, you need to put some other layer of defense in, so that you're not blithely accepting the inputs to the compilation (sources and options) from a potentially hostile source, where a crafted input sources could potentially hit an ICE in the compiler and thus crash the web browser. +1, this is precisely the kind of thing the security policy should warn against and suggest using sandboxing for. The compiler (or libgccjit) isn't really in a position to defend such uses, ICE or otherwise. Thanks, Sid
Re: [RFC] GCC Security policy
On 2023-08-08 10:37, Jakub Jelinek wrote: On Tue, Aug 08, 2023 at 10:30:10AM -0400, Siddhesh Poyarekar wrote: Do you have a suggestion for the language to address libgcc, libstdc++, etc. and libiberty, libbacktrace, etc.? I'll work on this a bit and share a draft. BTW, I think we should perhaps differentiate between production ready libraries (e.g. libgcc, libstdc++, libgomp, libatomic, libgfortran, libquadmath, libssp) vs. e.g. the sanitizer libraries which are meant for debugging and Agreed, that's why I need some time to sort all of the libraries gcc builds to categorize them into various levels of support in terms of safety re. untrusted input. Thanks, Sid
Re: [RFC] GCC Security policy
On 2023-08-08 10:14, David Edelsohn wrote: On Tue, Aug 8, 2023 at 10:07 AM Siddhesh Poyarekar <mailto:siddh...@gotplt.org>> wrote: On 2023-08-08 10:04, Richard Biener wrote: > On Tue, Aug 8, 2023 at 3:35 PM Ian Lance Taylor mailto:i...@google.com>> wrote: >> >> On Tue, Aug 8, 2023 at 6:02 AM Jakub Jelinek via Gcc-patches >> mailto:gcc-patches@gcc.gnu.org>> wrote: >>> >>> On Tue, Aug 08, 2023 at 02:52:57PM +0200, Richard Biener via Gcc-patches wrote: >>>> There's probably external tools to do this, not sure if we should replicate >>>> things in the driver for this. >>>> >>>> But sure, I think the driver is the proper point to address any of such >>>> issues - iff we want to address them at all. Maybe a nice little >>>> google summer-of-code project ;) >>> >>> What I'd really like to avoid is having all compiler bugs (primarily ICEs) >>> considered to be security bugs (e.g. DoS category), it would be terrible to >>> release every week a new compiler because of the "security" issues. >>> Running compiler on untrusted sources can trigger ICEs (which we want to fix >>> but there will always be some), or run into some compile time and/or compile >>> memory issue (we have various quadratic or worse spots), compiler stack >>> limits (deeply nested stuff e.g. during parsing but other areas as well). >>> So, people running fuzzers and reporting issues is great, but if they'd get >>> a CVE assigned for each ice-on-invalid-code, ice-on-valid-code, >>> each compile-time-hog and each memory-hog, that wouldn't be useful. >>> Runtime libraries or security issues in the code we generate for valid >>> sources are of course a different thing. >> >> >> I wonder if a security policy should say something about the -fplugin >> option. I agree that an ICE is not a security issue, but I wonder how >> many people are aware that a poorly chosen command line option can >> direct the compiler to run arbitrary code. For that matter the same >> is true of setting the GCC_EXEC_PREFIX environment variable, and no >> doubt several other environment variables. My point is not that we >> should change these, but that a security policy should draw attention >> to the fact that there are cases in which the compiler will >> unexpectedly run other programs. > > Well, if you run an arbitrary commandline from the internet you get > what you deserve, running "echo "Hello World" | gcc -xc - -o /dev/sda" > as root doesn't need plugins to shoot yourself in the foot. You need to > know what you're doing, otherwise you are basically executing an > arbitrary shell script with whatever privileges you have. I think it would be useful to mention caveats with plugins though, just like it would be useful to mention exceptions for libiberty and similar libraries that gcc builds. It only helps makes things clearer in terms of what security coverage the project provides. I have added a line to the Note section in the proposed text: GCC and its tools provide features and options that can run arbitrary user code (e.g., -fplugin). How about the following to make it clearer that arbitrary code in plugins is not considered secure: GCC and its tools provide features and options that can run arbitrary user code, e.g. using the -fplugin options. Such custom code should be vetted by the user for safety as bugs exposed through such code will not be considered security issues. I believe that the security implication already is addressed because the program is not tricked into a direct compromise of security. Do you have a suggestion for the language to address libgcc, libstdc++, etc. and libiberty, libbacktrace, etc.? I'll work on this a bit and share a draft. Thanks, Sid
Re: [RFC] GCC Security policy
On 2023-08-08 10:04, Richard Biener wrote: On Tue, Aug 8, 2023 at 3:35 PM Ian Lance Taylor wrote: On Tue, Aug 8, 2023 at 6:02 AM Jakub Jelinek via Gcc-patches wrote: On Tue, Aug 08, 2023 at 02:52:57PM +0200, Richard Biener via Gcc-patches wrote: There's probably external tools to do this, not sure if we should replicate things in the driver for this. But sure, I think the driver is the proper point to address any of such issues - iff we want to address them at all. Maybe a nice little google summer-of-code project ;) What I'd really like to avoid is having all compiler bugs (primarily ICEs) considered to be security bugs (e.g. DoS category), it would be terrible to release every week a new compiler because of the "security" issues. Running compiler on untrusted sources can trigger ICEs (which we want to fix but there will always be some), or run into some compile time and/or compile memory issue (we have various quadratic or worse spots), compiler stack limits (deeply nested stuff e.g. during parsing but other areas as well). So, people running fuzzers and reporting issues is great, but if they'd get a CVE assigned for each ice-on-invalid-code, ice-on-valid-code, each compile-time-hog and each memory-hog, that wouldn't be useful. Runtime libraries or security issues in the code we generate for valid sources are of course a different thing. I wonder if a security policy should say something about the -fplugin option. I agree that an ICE is not a security issue, but I wonder how many people are aware that a poorly chosen command line option can direct the compiler to run arbitrary code. For that matter the same is true of setting the GCC_EXEC_PREFIX environment variable, and no doubt several other environment variables. My point is not that we should change these, but that a security policy should draw attention to the fact that there are cases in which the compiler will unexpectedly run other programs. Well, if you run an arbitrary commandline from the internet you get what you deserve, running "echo "Hello World" | gcc -xc - -o /dev/sda" as root doesn't need plugins to shoot yourself in the foot. You need to know what you're doing, otherwise you are basically executing an arbitrary shell script with whatever privileges you have. I think it would be useful to mention caveats with plugins though, just like it would be useful to mention exceptions for libiberty and similar libraries that gcc builds. It only helps makes things clearer in terms of what security coverage the project provides. Thanks, Sid
Re: [RFC] GCC Security policy
On 2023-08-08 04:16, Richard Biener wrote: On Mon, Aug 7, 2023 at 7:30 PM David Edelsohn via Gcc-patches wrote: FOSS Best Practices recommends that projects have an official Security policy stated in a SECURITY.md or SECURITY.txt file at the root of the repository. GLIBC and Binutils have added such documents. Appended is a prototype for a Security policy file for GCC based on the Binutils document because GCC seems to have more affinity with Binutils as a tool. Do the runtime libraries distributed with GCC, especially libgcc, require additional security policies? [ ] Is it appropriate to use the Binutils SECURITY.txt as the starting point or should GCC use GLIBC SECURITY.md as the starting point for the GCC Security policy? [ ] Does GCC, or some components of GCC, require additional care because of runtime libraries like libgcc and libstdc++, and because of gcov and profile-directed feedback? I do think that the runtime libraries should at least be explicitly mentioned because they fall into the "generated output" category and bugs in the runtime are usually more severe as affecting a wider class of inputs. Ack, I'd expect libstdc++ and libgcc to be aligned with glibc's policies. libiberty and others on the other hand, would probably be more suitably aligned with binutils libbfd, where we assume trusted input. Thoughts? Thanks, David GCC Security Process What is a GCC security bug? === A security bug is one that threatens the security of a system or network, or might compromise the security of data stored on it. In the context of GCC there are two ways in which such bugs might occur. In the first, the programs themselves might be tricked into a direct compromise of security. In the second, the tools might introduce a vulnerability in the generated output that was not already present in the files used as input. Other than that, all other bugs will be treated as non-security issues. This does not mean that they will be ignored, just that they will not be given the priority that is given to security bugs. This stance applies to the creation tools in the GCC (e.g., gcc, g++, gfortran, gccgo, gccrs, gnat, cpp, gcov, etc.) and the libraries that they use. Notes: == None of the programs in GCC need elevated privileges to operate and it is recommended that users do not use them from accounts where such privileges are automatically available. I'll note that we could ourselves mitigate some of that by handling privileged invocation of the driver specially, dropping privs on exec of the sibling tools and possibly using temporary files or pipes to do the parts of the I/O that need to be privileged. It's not a bad idea, but it ends up giving legitimizing running the compiler as root, pushing the responsibility of privilege management to the driver. How about rejecting invocation as root altogether by default, bypassed with a --run-as-root flag instead? I've also been thinking about a --sandbox flag that isolates the build process (for gcc as well as binutils) into a separate namespace so that it's usable in a restricted mode on untrusted sources without exposing the rest of the system to it. Thanks, Sid
Re: One question on the source code of tree-object-size.cc
On 2023-08-04 15:06, Qing Zhao wrote: Yes, that's what I'm thinking. so `q` must be pointing to a single element. So you could deduce: 1. the minimum size of the whole object that q points to. You mean that the TYPE will determine the minimum size of the whole object? (Does this include the size of the flexible array member, or only the other part of the structure except the flexible array member?) Only the constant sized part of the structure. Okay. I see. But if the “counted_by” info is available, then from p->array, we can deduce the minimum size too, as sizeof(struct A) + q->foo * sizeof(int), right? Yes. Actually for minimum size we'd also need a guarantee that `alloc_buf_more` returns a valid allocated object. Why? Please explain a little bit here. So `alloc_buf_more` could return NULL, a valid pointer or an invalid pointer. So, we could end up returning a non-zero minimum size for an invalid or NULL pointer, which is incorrect, we don't know that. I see what’ s you mean now. However, if we already see p->array, then the p is guaranteed a valid pointer and not a NULL, right? (We are discussing on __builtin_dynamic_object_size (q->array, 2), we see q->array already) Yes, you could argue that for p->array, I agree, but not for p. We won't need the object validity guarantee for (2) beyond, e.g. guarding against a new NULL pointer dereference because it's a *maximum* estimate; an invalid or NULL pointer would have 0 size. So for such cases, __bos(q, 0) could return sizeof(*q) + (q ? q->foo:0) and __bos(q->array, 0) could be sizeof(*q) + q->foo - offsetof(q, array) There's no need to guard against a dereference in the second case because the q->array dereference already assumes that q is valid. q->array should also guarantee that q is a valid pointer for minimum size, right? Or do I miss anything here? Yes. Thanks, Sid
Re: One question on the source code of tree-object-size.cc
On 2023-08-04 11:27, Qing Zhao wrote: On Aug 4, 2023, at 10:40 AM, Siddhesh Poyarekar wrote: On 2023-08-03 13:34, Qing Zhao wrote: One thing I need to point out first is, currently, even for regular fixed size array in the structure, We have this same issue, for example: #define LENGTH 10 struct fix { size_t foo; int array[LENGTH]; }; … int main () { struct fix *p; p = alloc_buf_more (); expect(__builtin_object_size(p->array, 1), LENGTH * sizeof(int)); expect(__builtin_object_size(p->array, 0), -1); } Currently, for __builtin_object_size(p->array, 0), GCC return UNKNOWN for it. This is not a special issue for flexible array member. That's fine for fixed arrays at the end of a struct because the "whole object" size could be anything; `p` could be pointing to the beginning of an array for all we know. If however `array` is strictly a flex array, i.e.: ``` struct A { size_t foo; int array[]; }; ``` then there's no way in valid C to have an array of `struct fix`, Yes!! this is exactly the place that makes difference between structures with fixed arrays and the ones with flexible arrays. With such difference, I guess that using the type of the structure with flexible array member for p->array to get the size of the whole object p point to might be reasonable? Yes, that's what I'm thinking. so `q` must be pointing to a single element. So you could deduce: 1. the minimum size of the whole object that q points to. You mean that the TYPE will determine the minimum size of the whole object? (Does this include the size of the flexible array member, or only the other part of the structure except the flexible array member?) Only the constant sized part of the structure. Actually for minimum size we'd also need a guarantee that `alloc_buf_more` returns a valid allocated object. Why? Please explain a little bit here. So `alloc_buf_more` could return NULL, a valid pointer or an invalid pointer. So, we could end up returning a non-zero minimum size for an invalid or NULL pointer, which is incorrect, we don't know that. We won't need the object validity guarantee for (2) beyond, e.g. guarding against a new NULL pointer dereference because it's a *maximum* estimate; an invalid or NULL pointer would have 0 size. So for such cases, __bos(q, 0) could return sizeof(*q) + (q ? q->foo:0) and __bos(q->array, 0) could be sizeof(*q) + q->foo - offsetof(q, array) There's no need to guard against a dereference in the second case because the q->array dereference already assumes that q is valid. and 2. if you're able to determine the size of the flex array (through __element_count__(foo) for example), you could even determine the maximum size of the whole object. For (2) though, you'd break applications that overallocate and then expect to be able to use that overallocation despite the space not being reflected in the __element_count__. I think it's a bug in the application and I can't see a way for an application to be able to do this in a valid way so I'm inclined towards breaking it. Currently, we allow the situation when the allocation size for the whole object is larger than the value reflected in the “counted_by” attribute (the old name is __element_count__). But don’t allow the other way around (i.e, when the allocation size for the whole object is smaller than the value reflected in the “counted_by” attribute. Right, that's going to be the "break". For underallocation __bos will only end up overestimating the space available, which is not ideal, but won't end up breaking compatibility. Of course, the fact that gcc allows flex arrays to be in the middle of structs breaks the base assumption but that's something we need to get rid of anyway since there's no way for valid C programs to use that safely. Since GCC14, we started to deprecate this extension (allow flex array to be in the middle of structs). https://gcc.gnu.org/pipermail/gcc-cvs/2023-June/385730.html Yes, that's what I'm banking on. Thanks, Sid
Re: One question on the source code of tree-object-size.cc
On 2023-08-04 10:40, Siddhesh Poyarekar wrote: On 2023-08-03 13:34, Qing Zhao wrote: One thing I need to point out first is, currently, even for regular fixed size array in the structure, We have this same issue, for example: #define LENGTH 10 struct fix { size_t foo; int array[LENGTH]; }; … int main () { struct fix *p; p = alloc_buf_more (); expect(__builtin_object_size(p->array, 1), LENGTH * sizeof(int)); expect(__builtin_object_size(p->array, 0), -1); } Currently, for __builtin_object_size(p->array, 0), GCC return UNKNOWN for it. This is not a special issue for flexible array member. That's fine for fixed arrays at the end of a struct because the "whole object" size could be anything; `p` could be pointing to the beginning of an array for all we know. If however `array` is strictly a flex array, i.e.: ``` struct A { size_t foo; int array[]; }; ``` then there's no way in valid C to have an array of `struct fix`, so `q` must be pointing to a single element. So you could deduce: 1. the minimum size of the whole object that q points to. Actually for minimum size we'd also need a guarantee that `alloc_buf_more` returns a valid allocated object. Sid
Re: One question on the source code of tree-object-size.cc
On 2023-08-03 13:34, Qing Zhao wrote: One thing I need to point out first is, currently, even for regular fixed size array in the structure, We have this same issue, for example: #define LENGTH 10 struct fix { size_t foo; int array[LENGTH]; }; … int main () { struct fix *p; p = alloc_buf_more (); expect(__builtin_object_size(p->array, 1), LENGTH * sizeof(int)); expect(__builtin_object_size(p->array, 0), -1); } Currently, for __builtin_object_size(p->array, 0), GCC return UNKNOWN for it. This is not a special issue for flexible array member. That's fine for fixed arrays at the end of a struct because the "whole object" size could be anything; `p` could be pointing to the beginning of an array for all we know. If however `array` is strictly a flex array, i.e.: ``` struct A { size_t foo; int array[]; }; ``` then there's no way in valid C to have an array of `struct fix`, so `q` must be pointing to a single element. So you could deduce: 1. the minimum size of the whole object that q points to. and 2. if you're able to determine the size of the flex array (through __element_count__(foo) for example), you could even determine the maximum size of the whole object. For (2) though, you'd break applications that overallocate and then expect to be able to use that overallocation despite the space not being reflected in the __element_count__. I think it's a bug in the application and I can't see a way for an application to be able to do this in a valid way so I'm inclined towards breaking it. Of course, the fact that gcc allows flex arrays to be in the middle of structs breaks the base assumption but that's something we need to get rid of anyway since there's no way for valid C programs to use that safely. Thanks, Sid
Re: One question on the source code of tree-object-size.cc
On 2023-08-03 12:43, Qing Zhao wrote: Surely we could emit that for __bdos(q->array, 0) though, couldn't we? For __bdos(q->array, 0), we only have the access info for the sub-object q->array, we can surely decide the size of the sub-object q->array, but we still cannot decide the whole object that is pointed by q (the same reason as above), right? It's tricky, I mean we could assume p to be a valid object due to the dereference and hence assume that q->foo is also valid and that there's at least sizeof(*q) + q->foo * sizeof (q->array) bytes available. The question then is whether q could be pointing to an element of an array of `struct annotated`. Could we ever have a valid array of such structs that have a flex array at the end? Wouldn't it always be a single object? In fact for all pointers to such structs with a flex array at the end, could we always assume that it is a single object and never part of an array, and hence return sizeof()? Thanks, Sid
Re: One question on the source code of tree-object-size.cc
On 2023-08-02 10:02, Qing Zhao wrote: /*when checking the observed access p->array, we only have info on the observed access, i.e, the TYPE_SIZE info from the access. We don't have info on the whole object. */ expect(__builtin_dynamic_object_size(q->array, 1), q->foo * sizeof(int)); expect(__builtin_dynamic_object_size(q->array, 0), -1); expect(__builtin_dynamic_object_size(q->array, 3), q->foo * sizeof(int)); expect(__builtin_dynamic_object_size(q->array, 2), 0); /*when checking the pointer p, we have no observed allocation nor observed access. therefore, we cannot determine the size info here. */ expect(__builtin_dynamic_object_size(q, 1), -1); expect(__builtin_dynamic_object_size(q, 0), -1); expect(__builtin_dynamic_object_size(q, 3), 0); expect(__builtin_dynamic_object_size(q, 2), 0); I'm wondering if we could sizeof (*q) + q->foo for __bdos(q, 0), but I suppose it could mean generating code that potentially dereferences an invalid pointer. Surely we could emit that for __bdos(q->array, 0) though, couldn't we? Thanks, Sid
Re: One question on the source code of tree-object-size.cc
On 2023-08-01 18:57, Kees Cook wrote: return p; } /* in the following function, malloc allocated less space than size of the struct fix. Then what's the correct behavior we expect the __builtin_object_size should have for the following? */ static struct fix * noinline alloc_buf_less () { struct fix *p; p = malloc(sizeof (struct fix) - SIZE_BUMP * sizeof (int)); /*when checking the observed access p->array, we have info on both observered allocation and observed access, A. from observed allocation (alloc_size): (LENGTH - SIZE_BUMP) * sizeof (int) B. from observed access (TYPE): LENGTH * sizeof (int) */ /* for MAXIMUM size in the whole object: currently, GCC always used the A. */ expect(__builtin_object_size(p->array, 0), (LENGTH - SIZE_BUMP) * sizeof(int)); ok: __builtin_object_size(p->array, 0) == 20 My brain just melted a little, as this is now an under-sized instance of "p", so we have an incomplete allocation. (I would expect -Warray-bounds to yell very loudly for this.) But, technically, yes, this looks like the right calculation. AFAIK, -Warray-bounds will only yell in case of a dereference that the compiler may potentially see as being beyond that 20 byte bound; it won't actually see the undersized allocation. An analyzer warning would be useful for just the undersized allocation regardless of whether the code actually ends up accessing the object beyond the allocation bounds. Thanks, Sid