Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On Thu, Nov 16, 2017 at 4:08 AM, Martin Seborwrote: > On 11/15/2017 03:51 AM, Richard Biener wrote: >> >> On Tue, Nov 14, 2017 at 6:45 PM, Martin Sebor wrote: >>> >>> On 11/14/2017 05:28 AM, Richard Biener wrote: On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor wrote: > > > Richard, this thread may have been conflated with the one Re: > [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets > (PR 82455) They are about different things. > > I'm still looking for approval of: > > https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html >>> >>> >>> >>> Sorry, I pointed to an outdated version. This is the latest >>> version: >>> >>> https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html >>> >>> My bad... >>> >>> + tree maxbound + = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1))); this looks possibly bogus. Can you instead use up_bound_p1 = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value (TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize))); please? Note you are _not_ computing the proper upper bound here because that is what you compute plus low_bound. + up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize); + + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (ref, )) +{ + tree size = TYPE_SIZE_UNIT (TREE_TYPE (base)); + if (TREE_CODE (size) == INTEGER_CST) + up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size); I think I asked this multiple times now but given 'ref' is the variable array-ref a.b.c[i] when you call get_addr_base_and_unit_offset (ref, ) you always get a NULL_TREE return value. So I asked you to pass it 'arg' instead ... which gets you the offset of a.b.c, which looks like what you intended to get anyway. I also wonder what you compute here - you are looking at the size of 'base' but that is the size of 'a'. You don't even use the computed offset! Which means you could have used get_base_address instead!? Also the type of 'base' may be completely off given MEM[ + 8].b.c[i] would return blk as base which might be an array of chars and not in any way related to the type of the innermost structure we access with COMPONENT_REFs. Why are you only looking at COMPONENT_REF args anyways? You don't want to handle a.b[3][i]? That is, I'd have expected you do if (get_addr_base_and_unit_offset (ref, )) up_bound_p1 = wide_int_to_tree (sizetype, wi::sub (wi::to_wide (up_bound_p1), off)); >> >> >> ^ > > > Please see the attached update. Ok. Thanks, Richard. > Martin
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On 11/15/2017 03:51 AM, Richard Biener wrote: On Tue, Nov 14, 2017 at 6:45 PM, Martin Seborwrote: On 11/14/2017 05:28 AM, Richard Biener wrote: On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor wrote: Richard, this thread may have been conflated with the one Re: [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets (PR 82455) They are about different things. I'm still looking for approval of: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html Sorry, I pointed to an outdated version. This is the latest version: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html My bad... + tree maxbound + = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1))); this looks possibly bogus. Can you instead use up_bound_p1 = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value (TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize))); please? Note you are _not_ computing the proper upper bound here because that is what you compute plus low_bound. + up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize); + + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (ref, )) +{ + tree size = TYPE_SIZE_UNIT (TREE_TYPE (base)); + if (TREE_CODE (size) == INTEGER_CST) + up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size); I think I asked this multiple times now but given 'ref' is the variable array-ref a.b.c[i] when you call get_addr_base_and_unit_offset (ref, ) you always get a NULL_TREE return value. So I asked you to pass it 'arg' instead ... which gets you the offset of a.b.c, which looks like what you intended to get anyway. I also wonder what you compute here - you are looking at the size of 'base' but that is the size of 'a'. You don't even use the computed offset! Which means you could have used get_base_address instead!? Also the type of 'base' may be completely off given MEM[ + 8].b.c[i] would return blk as base which might be an array of chars and not in any way related to the type of the innermost structure we access with COMPONENT_REFs. Why are you only looking at COMPONENT_REF args anyways? You don't want to handle a.b[3][i]? That is, I'd have expected you do if (get_addr_base_and_unit_offset (ref, )) up_bound_p1 = wide_int_to_tree (sizetype, wi::sub (wi::to_wide (up_bound_p1), off)); ^ Please see the attached update. Martin PR tree-optimization/82588 - missing -Warray-bounds on a excessively large index PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indic gcc/ChangeLog: PR tree-optimization/82588 PR tree-optimization/82583 * tree-vrp.c (check_array_ref): Handle flexible array members, string literals, and inner indices. (search_for_addr_array): Add detail to diagnostics. gcc/testsuite/ChangeLog: PR tree-optimization/82588 PR tree-optimization/82583 * c-c++-common/Warray-bounds.c: New test. * gcc.dg/Warray-bounds-11.c: Adjust. * gcc.dg/Warray-bounds-22.c: New test. diff --git a/gcc/testsuite/c-c++-common/Warray-bounds.c b/gcc/testsuite/c-c++-common/Warray-bounds.c new file mode 100644 index 000..bea36fb --- /dev/null +++ b/gcc/testsuite/c-c++-common/Warray-bounds.c @@ -0,0 +1,259 @@ +/* PR tree-optimization/82588 - missing -Warray-bounds on an excessively + large index + { dg-do compile } + { dg-require-effective-target alloca } + { dg-options "-O2 -Warray-bounds -ftrack-macro-expansion=0" } */ + +#define SIZE_MAX __SIZE_MAX__ +#define DIFF_MAX __PTRDIFF_MAX__ +#define DIFF_MIN (-DIFF_MAX - 1) + +#define offsetof(T, m) __builtin_offsetof (T, m) + +typedef __PTRDIFF_TYPE__ ssize_t; +typedef __SIZE_TYPE__size_t; + +extern ssize_t signed_value (void) +{ + extern volatile ssize_t signed_value_source; + return signed_value_source; +} + +extern size_t unsigned_value (void) +{ + extern volatile size_t unsigned_value_source; + return unsigned_value_source; +} + +ssize_t signed_range (ssize_t min, ssize_t max) +{ + ssize_t val = signed_value (); + return val < min || max < val ? min : val; +} + +typedef struct AX { int n; char ax[]; } AX; + +typedef struct A1 { int i; char a1[1]; } A1; +typedef struct B { int i; struct A1 a1x[]; } B; + +void sink (int, ...); + +#define R(min, max) signed_range (min, max) +#define T(expr) sink (0, expr) + +struct __attribute__ ((packed)) S16 { unsigned i: 16; }; + +void farr_char (void) +{ + extern char ac[]; + + T (ac[DIFF_MIN]); /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .char *\\\[]." } */ + T (ac[-1]); /* { dg-warning "array subscript -1 is below array bounds" } */ + T (ac[0]); + + T (ac[DIFF_MAX - 1]); + T (ac[DIFF_MAX]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */ + T
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On 11/15/2017 03:51 AM, Richard Biener wrote: On Tue, Nov 14, 2017 at 6:45 PM, Martin Seborwrote: On 11/14/2017 05:28 AM, Richard Biener wrote: On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor wrote: Richard, this thread may have been conflated with the one Re: [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets (PR 82455) They are about different things. I'm still looking for approval of: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html Sorry, I pointed to an outdated version. This is the latest version: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html My bad... + tree maxbound + = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1))); this looks possibly bogus. Can you instead use up_bound_p1 = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value (TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize))); please? Note you are _not_ computing the proper upper bound here because that is what you compute plus low_bound. + up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize); + + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (ref, )) +{ + tree size = TYPE_SIZE_UNIT (TREE_TYPE (base)); + if (TREE_CODE (size) == INTEGER_CST) + up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size); I think I asked this multiple times now but given 'ref' is the variable array-ref a.b.c[i] when you call get_addr_base_and_unit_offset (ref, ) you always get a NULL_TREE return value. So I asked you to pass it 'arg' instead ... which gets you the offset of a.b.c, which looks like what you intended to get anyway. I also wonder what you compute here - you are looking at the size of 'base' but that is the size of 'a'. You don't even use the computed offset! Which means you could have used get_base_address instead!? Also the type of 'base' may be completely off given MEM[ + 8].b.c[i] would return blk as base which might be an array of chars and not in any way related to the type of the innermost structure we access with COMPONENT_REFs. Why are you only looking at COMPONENT_REF args anyways? You don't want to handle a.b[3][i]? That is, I'd have expected you do if (get_addr_base_and_unit_offset (ref, )) up_bound_p1 = wide_int_to_tree (sizetype, wi::sub (wi::to_wide (up_bound_p1), off)); ^
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On Tue, Nov 14, 2017 at 6:45 PM, Martin Seborwrote: > On 11/14/2017 05:28 AM, Richard Biener wrote: >> >> On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor wrote: >>> >>> Richard, this thread may have been conflated with the one Re: >>> [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets >>> (PR 82455) They are about different things. >>> >>> I'm still looking for approval of: >>> >>> https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html > > > Sorry, I pointed to an outdated version. This is the latest > version: > > https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html > > My bad... > > >> >> + tree maxbound >> + = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1))); >> >> this looks possibly bogus. Can you instead use >> >> up_bound_p1 >> = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value >> (TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize))); >> >> please? Note you are _not_ computing the proper upper bound here because >> that >> is what you compute plus low_bound. >> >> + up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize); >> >> + >> + tree arg = TREE_OPERAND (ref, 0); >> + tree_code code = TREE_CODE (arg); >> + if (code == COMPONENT_REF) >> + { >> + HOST_WIDE_INT off; >> + if (tree base = get_addr_base_and_unit_offset (ref, )) >> +{ >> + tree size = TYPE_SIZE_UNIT (TREE_TYPE (base)); >> + if (TREE_CODE (size) == INTEGER_CST) >> + up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size); >> >> I think I asked this multiple times now but given 'ref' is the >> variable array-ref >> a.b.c[i] when you call get_addr_base_and_unit_offset (ref, ) you >> always >> get a NULL_TREE return value. >> >> So I asked you to pass it 'arg' instead ... which gets you the offset of >> a.b.c, which looks like what you intended to get anyway. >> >> I also wonder what you compute here - you are looking at the size of >> 'base' >> but that is the size of 'a'. You don't even use the computed offset! >> Which >> means you could have used get_base_address instead!? Also the type >> of 'base' may be completely off given MEM[ + 8].b.c[i] would return >> blk >> as base which might be an array of chars and not in any way related to >> the type of the innermost structure we access with COMPONENT_REFs. >> >> Why are you only looking at COMPONENT_REF args anyways? You >> don't want to handle a.b[3][i]? >> >> That is, I'd have expected you do >> >>if (get_addr_base_and_unit_offset (ref, )) >> up_bound_p1 = wide_int_to_tree (sizetype, wi::sub (wi::to_wide >> (up_bound_p1), off)); ^ Richard. >> Richard. >> >>> Thanks >>> Martin >>> >>> > The difficulty with a testcase like > > struct { struct A { int b[1]; } a[5]; } x; > > x.a[i].b[j] > > is that b is not considered an array at struct end since one of my > recent changes to array_at_struct_end (basically it disallows having > a flex array as the last member of an array). > > It would still stand for non-array components with variable offset > but you can't create C testcases for that. > > So yes, for the specific case within the array_at_struct_end_p > condition > get_addr_base_and_unit_offset is enough. IIRC the conditon was > a bit more than just get_addr_base_and_unit_offset. up_bound != > INTEGER_CST for example. So make the above > > void foo (int n, int i) > { > struct { struct A { int b[n]; } a[5]; } x; > return x.a[i].b[PTRDIFF_MAX/2]; > } > > with appropriately adjusted constant. Does that give you the testcase > you want? Thank you for the test case. It is diagnosed the same way irrespective of which of the two functions is used so it serves to confirm my understanding that the only difference between the two functions is bits vs bytes. Unless you have another test case that does demonstrate that get_ref_base_and_extent is necessary/helpful, is the last patch okay to commit? (Again, to be clear, I'm happy to change or enhance the patch if I can verify that the change handles cases that the current patch misses.) > > As of "it works, catches corner-cases, ..." - yes, it does, but it > adds code that needs to be maintained, may contain bugs, is > executed even for valid code. Understood. I don't claim the enhancement is free of any cost whatsoever. But it is teeny by most standards and it doesn't detect just excessively large indices but also negative indices into last member arrays (bug 68325) and out-of-bounds indices (bug 82583). The "excessively large" part does come largely for free with the other checks. Martin >>> >>> >>> >
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On 11/14/2017 05:28 AM, Richard Biener wrote: On Mon, Nov 13, 2017 at 6:37 PM, Martin Seborwrote: Richard, this thread may have been conflated with the one Re: [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets (PR 82455) They are about different things. I'm still looking for approval of: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html Sorry, I pointed to an outdated version. This is the latest version: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html My bad... + tree maxbound + = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1))); this looks possibly bogus. Can you instead use up_bound_p1 = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value (TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize))); please? Note you are _not_ computing the proper upper bound here because that is what you compute plus low_bound. + up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize); + + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (ref, )) +{ + tree size = TYPE_SIZE_UNIT (TREE_TYPE (base)); + if (TREE_CODE (size) == INTEGER_CST) + up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size); I think I asked this multiple times now but given 'ref' is the variable array-ref a.b.c[i] when you call get_addr_base_and_unit_offset (ref, ) you always get a NULL_TREE return value. So I asked you to pass it 'arg' instead ... which gets you the offset of a.b.c, which looks like what you intended to get anyway. I also wonder what you compute here - you are looking at the size of 'base' but that is the size of 'a'. You don't even use the computed offset! Which means you could have used get_base_address instead!? Also the type of 'base' may be completely off given MEM[ + 8].b.c[i] would return blk as base which might be an array of chars and not in any way related to the type of the innermost structure we access with COMPONENT_REFs. Why are you only looking at COMPONENT_REF args anyways? You don't want to handle a.b[3][i]? That is, I'd have expected you do if (get_addr_base_and_unit_offset (ref, )) up_bound_p1 = wide_int_to_tree (sizetype, wi::sub (wi::to_wide (up_bound_p1), off)); Richard. Thanks Martin The difficulty with a testcase like struct { struct A { int b[1]; } a[5]; } x; x.a[i].b[j] is that b is not considered an array at struct end since one of my recent changes to array_at_struct_end (basically it disallows having a flex array as the last member of an array). It would still stand for non-array components with variable offset but you can't create C testcases for that. So yes, for the specific case within the array_at_struct_end_p condition get_addr_base_and_unit_offset is enough. IIRC the conditon was a bit more than just get_addr_base_and_unit_offset. up_bound != INTEGER_CST for example. So make the above void foo (int n, int i) { struct { struct A { int b[n]; } a[5]; } x; return x.a[i].b[PTRDIFF_MAX/2]; } with appropriately adjusted constant. Does that give you the testcase you want? Thank you for the test case. It is diagnosed the same way irrespective of which of the two functions is used so it serves to confirm my understanding that the only difference between the two functions is bits vs bytes. Unless you have another test case that does demonstrate that get_ref_base_and_extent is necessary/helpful, is the last patch okay to commit? (Again, to be clear, I'm happy to change or enhance the patch if I can verify that the change handles cases that the current patch misses.) As of "it works, catches corner-cases, ..." - yes, it does, but it adds code that needs to be maintained, may contain bugs, is executed even for valid code. Understood. I don't claim the enhancement is free of any cost whatsoever. But it is teeny by most standards and it doesn't detect just excessively large indices but also negative indices into last member arrays (bug 68325) and out-of-bounds indices (bug 82583). The "excessively large" part does come largely for free with the other checks. Martin
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On Mon, Nov 13, 2017 at 6:37 PM, Martin Seborwrote: > Richard, this thread may have been conflated with the one Re: > [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets > (PR 82455) They are about different things. > > I'm still looking for approval of: > > https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html + tree maxbound + = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1))); this looks possibly bogus. Can you instead use up_bound_p1 = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value (TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize))); please? Note you are _not_ computing the proper upper bound here because that is what you compute plus low_bound. + up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize); + + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (ref, )) +{ + tree size = TYPE_SIZE_UNIT (TREE_TYPE (base)); + if (TREE_CODE (size) == INTEGER_CST) + up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size); I think I asked this multiple times now but given 'ref' is the variable array-ref a.b.c[i] when you call get_addr_base_and_unit_offset (ref, ) you always get a NULL_TREE return value. So I asked you to pass it 'arg' instead ... which gets you the offset of a.b.c, which looks like what you intended to get anyway. I also wonder what you compute here - you are looking at the size of 'base' but that is the size of 'a'. You don't even use the computed offset! Which means you could have used get_base_address instead!? Also the type of 'base' may be completely off given MEM[ + 8].b.c[i] would return blk as base which might be an array of chars and not in any way related to the type of the innermost structure we access with COMPONENT_REFs. Why are you only looking at COMPONENT_REF args anyways? You don't want to handle a.b[3][i]? That is, I'd have expected you do if (get_addr_base_and_unit_offset (ref, )) up_bound_p1 = wide_int_to_tree (sizetype, wi::sub (wi::to_wide (up_bound_p1), off)); Richard. > Thanks > Martin > > >>> The difficulty with a testcase like >>> >>> struct { struct A { int b[1]; } a[5]; } x; >>> >>> x.a[i].b[j] >>> >>> is that b is not considered an array at struct end since one of my >>> recent changes to array_at_struct_end (basically it disallows having >>> a flex array as the last member of an array). >>> >>> It would still stand for non-array components with variable offset >>> but you can't create C testcases for that. >>> >>> So yes, for the specific case within the array_at_struct_end_p condition >>> get_addr_base_and_unit_offset is enough. IIRC the conditon was >>> a bit more than just get_addr_base_and_unit_offset. up_bound != >>> INTEGER_CST for example. So make the above >>> >>> void foo (int n, int i) >>> { >>> struct { struct A { int b[n]; } a[5]; } x; >>> return x.a[i].b[PTRDIFF_MAX/2]; >>> } >>> >>> with appropriately adjusted constant. Does that give you the testcase >>> you want? >> >> >> Thank you for the test case. It is diagnosed the same way >> irrespective of which of the two functions is used so it serves >> to confirm my understanding that the only difference between >> the two functions is bits vs bytes. >> >> Unless you have another test case that does demonstrate that >> get_ref_base_and_extent is necessary/helpful, is the last patch >> okay to commit? >> >> (Again, to be clear, I'm happy to change or enhance the patch if >> I can verify that the change handles cases that the current patch >> misses.) >> >>> >>> As of "it works, catches corner-cases, ..." - yes, it does, but it >>> adds code that needs to be maintained, may contain bugs, is >>> executed even for valid code. >> >> >> Understood. I don't claim the enhancement is free of any cost >> whatsoever. But it is teeny by most standards and it doesn't >> detect just excessively large indices but also negative indices >> into last member arrays (bug 68325) and out-of-bounds indices >> (bug 82583). The "excessively large" part does come largely >> for free with the other checks. >> >> Martin > >
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
Richard, this thread may have been conflated with the one Re: [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets (PR 82455) They are about different things. I'm still looking for approval of: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html Thanks Martin The difficulty with a testcase like struct { struct A { int b[1]; } a[5]; } x; x.a[i].b[j] is that b is not considered an array at struct end since one of my recent changes to array_at_struct_end (basically it disallows having a flex array as the last member of an array). It would still stand for non-array components with variable offset but you can't create C testcases for that. So yes, for the specific case within the array_at_struct_end_p condition get_addr_base_and_unit_offset is enough. IIRC the conditon was a bit more than just get_addr_base_and_unit_offset. up_bound != INTEGER_CST for example. So make the above void foo (int n, int i) { struct { struct A { int b[n]; } a[5]; } x; return x.a[i].b[PTRDIFF_MAX/2]; } with appropriately adjusted constant. Does that give you the testcase you want? Thank you for the test case. It is diagnosed the same way irrespective of which of the two functions is used so it serves to confirm my understanding that the only difference between the two functions is bits vs bytes. Unless you have another test case that does demonstrate that get_ref_base_and_extent is necessary/helpful, is the last patch okay to commit? (Again, to be clear, I'm happy to change or enhance the patch if I can verify that the change handles cases that the current patch misses.) As of "it works, catches corner-cases, ..." - yes, it does, but it adds code that needs to be maintained, may contain bugs, is executed even for valid code. Understood. I don't claim the enhancement is free of any cost whatsoever. But it is teeny by most standards and it doesn't detect just excessively large indices but also negative indices into last member arrays (bug 68325) and out-of-bounds indices (bug 82583). The "excessively large" part does come largely for free with the other checks. Martin
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On 11/07/2017 03:18 AM, Richard Biener wrote: On Tue, Nov 7, 2017 at 4:23 AM, Martin Seborwrote: On 11/06/2017 11:41 AM, Jeff Law wrote: On 10/29/2017 10:15 AM, Martin Sebor wrote: Ping -- please see my reply below. On 10/20/2017 09:57 AM, Richard Biener wrote: get_addr_base_and_unit_offset will return NULL if there's any variable component in 'ref'. So as written it seems to be dead code (you want to pass 'arg'?) Sorry, I'm not sure I understand what you mean. What do you think is dead code? The call to get_addr_base_and_unit_offset() is also made for an array of unspecified bound (up_bound is null) and for an array at the end of a struct. For those the function returns non-null, and for the others (arrays of runtime bound) it returns null. (I passed arg instead of ref but I see no difference in my tests.) If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') it will return the offset of 'c'. If you pass a.b[j].c it will still return NULL. You could use get_ref_base_and_extent which will return the offset of a.b[0].c in this case and sets max_size != size - but you are only interested in offset. The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a HWI you'll instead get offset == 0 and max_size == -1. Thus I'm saying this is dead code for variable array accesses (even for the array you are warning about). Yes, for constant index and at-struct-end you'll get sth, but the warning is in VRP because of variable indexes. So I suggest to pass 'arg' and use get_ref_base_and_extent for some extra precision (and possible lossage for very very large structures). Computing bit offsets defeats the out-of-bounds flexible array index detection because the computation overflows (the function sets the offset to zero). It shouldn't if you pass arg rather than ref. I suspect you missed my reply in IRC where I confirmed that this approach doesn't work for the reason you yourself mentioned above: The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a HWI you'll instead get offset == 0 and max_size == -1. This means that using the function you suggest would either prevent detecting the out-of-bounds indices that overflow due to the bit offset, thus largely defeating the purpose of the patch (to detect excessively large indices), or not printing the value of the out-of bounds index in the warning in this case, which would in turn mean further changes to the rest of the logic. I think Richi's point is that it's more important (in his opinion) to handle the varying objects within an access like a.b.c[i] than to handle cases that would overflow when converted to bits. Thus he'd prefer to use get_ref_base_and_extent over get_addr_base_and_unit_offset. I wonder if a hybrid approach here would work. ie, use get_ref_base_and_extent per Richi's request. When that returns an max_size of -1, then call get_addr_base_and_unit_offset and if it returns that the offset is huge, then warn without any deeper analysis. Yea, this could trip a false positive if someone makes an array that is half the address space (give or take), but that's probably not at all common. Whereas the additional cases handled by get_ref_base_and_extent are perhaps more useful. Thoughts? I'm always open to suggestions for improvements. I just haven't been able to construct a test case to demonstrate the improvement gained by using get_ref_base_and_extent. The only test cases I've come up with show false positives. It could very well be because of my lack of experience with the APIs so if there is a trade-off here that could be compensated for by the approach you suggest I'm happy to incorporate it into the patch. I just need a test case to verify that the solution works as intended. The difficulty with a testcase like struct { struct A { int b[1]; } a[5]; } x; x.a[i].b[j] is that b is not considered an array at struct end since one of my recent changes to array_at_struct_end (basically it disallows having a flex array as the last member of an array). It would still stand for non-array components with variable offset but you can't create C testcases for that. So yes, for the specific case within the array_at_struct_end_p condition get_addr_base_and_unit_offset is enough. IIRC the conditon was a bit more than just get_addr_base_and_unit_offset. up_bound != INTEGER_CST for example. So make the above void foo (int n, int i) { struct { struct A { int b[n]; } a[5]; } x; return x.a[i].b[PTRDIFF_MAX/2]; } with appropriately adjusted constant. Does that give you the testcase you want? Thank you for the test case. It is diagnosed the same way irrespective of which of the two functions is used so it serves to confirm my understanding that the only difference between the two functions is bits vs bytes. Unless you have another test case that
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On Tue, Nov 7, 2017 at 4:23 AM, Martin Seborwrote: > On 11/06/2017 11:41 AM, Jeff Law wrote: >> >> On 10/29/2017 10:15 AM, Martin Sebor wrote: >>> >>> Ping -- please see my reply below. >>> >>> On 10/20/2017 09:57 AM, Richard Biener wrote: get_addr_base_and_unit_offset will return NULL if there's any > > variable component in 'ref'. So as written it seems to be dead code (you want to pass 'arg'?) >>> >>> >>> >>> Sorry, I'm not sure I understand what you mean. What do you think >>> is dead code? The call to get_addr_base_and_unit_offset() is also >>> made for an array of unspecified bound (up_bound is null) and for >>> an array at the end of a struct. For those the function returns >>> non-null, and for the others (arrays of runtime bound) it returns >>> null. (I passed arg instead of ref but I see no difference in >>> my tests.) >> >> >> If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') > > it will >> >> return the offset of 'c'. If you pass a.b[j].c it will still return > > NULL. >> >> You could use get_ref_base_and_extent which will return the offset >> of a.b[0].c in this case and sets max_size != size - but you are only >> interested in offset. The disadvantage of get_ref_base_and_extent >> is it returns offset in bits thus if the offset is too large for a > > HWI >> >> you'll instead get offset == 0 and max_size == -1. >> >> Thus I'm saying this is dead code for variable array accesses >> (even for the array you are warning about). Yes, for constant index >> and at-struct-end you'll get sth, but the warning is in VRP because >> of variable indexes. >> >> So I suggest to pass 'arg' and use get_ref_base_and_extent >> for some extra precision (and possible lossage for very very large >> structures). > > > Computing bit offsets defeats the out-of-bounds flexible array > index detection because the computation overflows (the function > sets the offset to zero). It shouldn't if you pass arg rather than ref. >>> >>> >>> I suspect you missed my reply in IRC where I confirmed that this >>> approach doesn't work for the reason you yourself mentioned above: >>> >> The disadvantage of get_ref_base_and_extent >> is it returns offset in bits thus if the offset is too large >> for a HWI you'll instead get offset == 0 and max_size == -1. >>> >>> >>> This means that using the function you suggest would either prevent >>> detecting the out-of-bounds indices that overflow due to the bit >>> offset, thus largely defeating the purpose of the patch (to detect >>> excessively large indices), or not printing the value of the out-of >>> bounds index in the warning in this case, which would in turn mean >>> further changes to the rest of the logic. >> >> I think Richi's point is that it's more important (in his opinion) to >> handle the varying objects within an access like a.b.c[i] than to handle >> cases that would overflow when converted to bits. Thus he'd prefer to >> use get_ref_base_and_extent over get_addr_base_and_unit_offset. >> >> I wonder if a hybrid approach here would work. >> >> ie, use get_ref_base_and_extent per Richi's request. When that returns >> an max_size of -1, then call get_addr_base_and_unit_offset and if it >> returns that the offset is huge, then warn without any deeper analysis. >> >> Yea, this could trip a false positive if someone makes an array that is >> half the address space (give or take), but that's probably not at all >> common. Whereas the additional cases handled by get_ref_base_and_extent >> are perhaps more useful. >> >> Thoughts? > > > I'm always open to suggestions for improvements. I just haven't > been able to construct a test case to demonstrate the improvement > gained by using get_ref_base_and_extent. The only test cases I've > come up with show false positives. It could very well be because > of my lack of experience with the APIs so if there is a trade-off > here that could be compensated for by the approach you suggest I'm > happy to incorporate it into the patch. I just need a test case > to verify that the solution works as intended. The difficulty with a testcase like struct { struct A { int b[1]; } a[5]; } x; x.a[i].b[j] is that b is not considered an array at struct end since one of my recent changes to array_at_struct_end (basically it disallows having a flex array as the last member of an array). It would still stand for non-array components with variable offset but you can't create C testcases for that. So yes, for the specific case within the array_at_struct_end_p condition get_addr_base_and_unit_offset is enough. IIRC the conditon was a bit more than just get_addr_base_and_unit_offset. up_bound != INTEGER_CST for example. So make the above void foo (int n, int
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On 11/06/2017 11:41 AM, Jeff Law wrote: On 10/29/2017 10:15 AM, Martin Sebor wrote: Ping -- please see my reply below. On 10/20/2017 09:57 AM, Richard Biener wrote: get_addr_base_and_unit_offset will return NULL if there's any variable component in 'ref'. So as written it seems to be dead code (you want to pass 'arg'?) Sorry, I'm not sure I understand what you mean. What do you think is dead code? The call to get_addr_base_and_unit_offset() is also made for an array of unspecified bound (up_bound is null) and for an array at the end of a struct. For those the function returns non-null, and for the others (arrays of runtime bound) it returns null. (I passed arg instead of ref but I see no difference in my tests.) If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') it will return the offset of 'c'. If you pass a.b[j].c it will still return NULL. You could use get_ref_base_and_extent which will return the offset of a.b[0].c in this case and sets max_size != size - but you are only interested in offset. The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a HWI you'll instead get offset == 0 and max_size == -1. Thus I'm saying this is dead code for variable array accesses (even for the array you are warning about). Yes, for constant index and at-struct-end you'll get sth, but the warning is in VRP because of variable indexes. So I suggest to pass 'arg' and use get_ref_base_and_extent for some extra precision (and possible lossage for very very large structures). Computing bit offsets defeats the out-of-bounds flexible array index detection because the computation overflows (the function sets the offset to zero). It shouldn't if you pass arg rather than ref. I suspect you missed my reply in IRC where I confirmed that this approach doesn't work for the reason you yourself mentioned above: The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a HWI you'll instead get offset == 0 and max_size == -1. This means that using the function you suggest would either prevent detecting the out-of-bounds indices that overflow due to the bit offset, thus largely defeating the purpose of the patch (to detect excessively large indices), or not printing the value of the out-of bounds index in the warning in this case, which would in turn mean further changes to the rest of the logic. I think Richi's point is that it's more important (in his opinion) to handle the varying objects within an access like a.b.c[i] than to handle cases that would overflow when converted to bits. Thus he'd prefer to use get_ref_base_and_extent over get_addr_base_and_unit_offset. I wonder if a hybrid approach here would work. ie, use get_ref_base_and_extent per Richi's request. When that returns an max_size of -1, then call get_addr_base_and_unit_offset and if it returns that the offset is huge, then warn without any deeper analysis. Yea, this could trip a false positive if someone makes an array that is half the address space (give or take), but that's probably not at all common. Whereas the additional cases handled by get_ref_base_and_extent are perhaps more useful. Thoughts? I'm always open to suggestions for improvements. I just haven't been able to construct a test case to demonstrate the improvement gained by using get_ref_base_and_extent. The only test cases I've come up with show false positives. It could very well be because of my lack of experience with the APIs so if there is a trade-off here that could be compensated for by the approach you suggest I'm happy to incorporate it into the patch. I just need a test case to verify that the solution works as intended. Martin
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On 10/30/2017 06:02 AM, Richard Biener wrote: On Sun, Oct 29, 2017 at 5:15 PM, Martin Seborwrote: Ping -- please see my reply below. On 10/20/2017 09:57 AM, Richard Biener wrote: get_addr_base_and_unit_offset will return NULL if there's any variable component in 'ref'. So as written it seems to be dead code (you want to pass 'arg'?) Sorry, I'm not sure I understand what you mean. What do you think is dead code? The call to get_addr_base_and_unit_offset() is also made for an array of unspecified bound (up_bound is null) and for an array at the end of a struct. For those the function returns non-null, and for the others (arrays of runtime bound) it returns null. (I passed arg instead of ref but I see no difference in my tests.) If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') it will return the offset of 'c'. If you pass a.b[j].c it will still return NULL. You could use get_ref_base_and_extent which will return the offset of a.b[0].c in this case and sets max_size != size - but you are only interested in offset. The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a HWI you'll instead get offset == 0 and max_size == -1. Thus I'm saying this is dead code for variable array accesses (even for the array you are warning about). Yes, for constant index and at-struct-end you'll get sth, but the warning is in VRP because of variable indexes. So I suggest to pass 'arg' and use get_ref_base_and_extent for some extra precision (and possible lossage for very very large structures). Computing bit offsets defeats the out-of-bounds flexible array index detection because the computation overflows (the function sets the offset to zero). It shouldn't if you pass arg rather than ref. I suspect you missed my reply in IRC where I confirmed that this approach doesn't work for the reason you yourself mentioned above: The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a HWI you'll instead get offset == 0 and max_size == -1. This means that using the function you suggest would either prevent detecting the out-of-bounds indices that overflow due to the bit offset, thus largely defeating the purpose of the patch (to detect excessively large indices), or not printing the value of the out-of bounds index in the warning in this case, which would in turn mean further changes to the rest of the logic. Well, it would not handle the case of a.b[offset-bringing-you-bitoffset-overflow].c[i] but it would handle (compared to your version of the patch) a.b[j].c[i] with i being the unreasonably large offset. That's beause we look at the bit offset of a.b[j].c thus the _start_ of the array. AFAICS, using get_ref_base_and_extent only results in missing a set of cases that get_addr_base_and_unit_offset doesn't. It's entirely possible I'm missing something but I can find no other difference or construct a test case where an out-of-bounds index is detected with the former that isn't with the latter. Here's an example: #define MAX (__PTRDIFF_MAX__ - 256) struct A { char a[256]; char ax[]; }; int f (struct A *p, unsigned long i) { if (i < MAX + 1) i = MAX + 1; // bug: sizeof *p + i > PTRDIFF_MAX return p->ax[i]; // missing -Warray-bounds } This is because the offset of the array returned by get_ref_base_and_extent when the bit offset computation overflows is reset to zero. So the function seems strictly worse in this context that the other. Am I missing something that makes this a worthwhile tradeoff? (If so, can you please show a test case?) I still find the half-address-space case totally pointless to warn about (even more to get "precise" here by subtracting the offset of the array). There's not a single testcase that looks motivating to me (like an error with some signed->unsigned conversion and then GCC magically figuring out a range you'd warn for) -Warray-bounds is documented to ...warn about subscripts to arrays that are always out of bounds. Excessively large indices that would imply an object in excess of PTRDIFF_MAX bytes are always out of bounds, so expecting them to be diagnosed consistently, even in corner cases, is in line with the documentation (and presumably also the intent) of the warning. Other compilers (Clang and Intel ICC) already detect more of these kinds of problems than GCC does. In addition, C++ requires them to be diagnosed in constexpr contexts. Improving GCC to match and even exceed these other compilers is simple and inexpensive, and makes GCC a better implementation. I welcome suggestions to improve the patch (or submit a follow up one) and I'm open to arguments that one solution is superior than another but I don't understand your objection to a working solution on the grounds that it handles corner cases. All bugs are corner cases that someone didn't get
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On 10/29/2017 10:15 AM, Martin Sebor wrote: > Ping -- please see my reply below. > > On 10/20/2017 09:57 AM, Richard Biener wrote: >> get_addr_base_and_unit_offset will return NULL if there's any >>> variable >> component in 'ref'. So as written it seems to be dead code (you >> want to pass 'arg'?) > > > Sorry, I'm not sure I understand what you mean. What do you think > is dead code? The call to get_addr_base_and_unit_offset() is also > made for an array of unspecified bound (up_bound is null) and for > an array at the end of a struct. For those the function returns > non-null, and for the others (arrays of runtime bound) it returns > null. (I passed arg instead of ref but I see no difference in > my tests.) If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') >>> it will return the offset of 'c'. If you pass a.b[j].c it will still return >>> NULL. You could use get_ref_base_and_extent which will return the offset of a.b[0].c in this case and sets max_size != size - but you are only interested in offset. The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a >>> HWI you'll instead get offset == 0 and max_size == -1. Thus I'm saying this is dead code for variable array accesses (even for the array you are warning about). Yes, for constant index and at-struct-end you'll get sth, but the warning is in VRP because of variable indexes. So I suggest to pass 'arg' and use get_ref_base_and_extent for some extra precision (and possible lossage for very very large structures). >>> >>> Computing bit offsets defeats the out-of-bounds flexible array >>> index detection because the computation overflows (the function >>> sets the offset to zero). >> >> It shouldn't if you pass arg rather than ref. > > I suspect you missed my reply in IRC where I confirmed that this > approach doesn't work for the reason you yourself mentioned above: > The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a HWI you'll instead get offset == 0 and max_size == -1. > > This means that using the function you suggest would either prevent > detecting the out-of-bounds indices that overflow due to the bit > offset, thus largely defeating the purpose of the patch (to detect > excessively large indices), or not printing the value of the out-of > bounds index in the warning in this case, which would in turn mean > further changes to the rest of the logic. I think Richi's point is that it's more important (in his opinion) to handle the varying objects within an access like a.b.c[i] than to handle cases that would overflow when converted to bits. Thus he'd prefer to use get_ref_base_and_extent over get_addr_base_and_unit_offset. I wonder if a hybrid approach here would work. ie, use get_ref_base_and_extent per Richi's request. When that returns an max_size of -1, then call get_addr_base_and_unit_offset and if it returns that the offset is huge, then warn without any deeper analysis. Yea, this could trip a false positive if someone makes an array that is half the address space (give or take), but that's probably not at all common. Whereas the additional cases handled by get_ref_base_and_extent are perhaps more useful. Thoughts? jeff
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On Sun, Oct 29, 2017 at 5:15 PM, Martin Seborwrote: > Ping -- please see my reply below. > > > On 10/20/2017 09:57 AM, Richard Biener wrote: >> >> get_addr_base_and_unit_offset will return NULL if there's any >>> >>> variable >> >> component in 'ref'. So as written it seems to be dead code (you >> want to pass 'arg'?) > > > > Sorry, I'm not sure I understand what you mean. What do you think > is dead code? The call to get_addr_base_and_unit_offset() is also > made for an array of unspecified bound (up_bound is null) and for > an array at the end of a struct. For those the function returns > non-null, and for the others (arrays of runtime bound) it returns > null. (I passed arg instead of ref but I see no difference in > my tests.) If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') >>> >>> it will return the offset of 'c'. If you pass a.b[j].c it will still return >>> >>> NULL. You could use get_ref_base_and_extent which will return the offset of a.b[0].c in this case and sets max_size != size - but you are only interested in offset. The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a >>> >>> HWI you'll instead get offset == 0 and max_size == -1. Thus I'm saying this is dead code for variable array accesses (even for the array you are warning about). Yes, for constant index and at-struct-end you'll get sth, but the warning is in VRP because of variable indexes. So I suggest to pass 'arg' and use get_ref_base_and_extent for some extra precision (and possible lossage for very very large structures). >>> >>> >>> Computing bit offsets defeats the out-of-bounds flexible array >>> index detection because the computation overflows (the function >>> sets the offset to zero). >> >> >> It shouldn't if you pass arg rather than ref. > > > I suspect you missed my reply in IRC where I confirmed that this > approach doesn't work for the reason you yourself mentioned above: > The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a HWI you'll instead get offset == 0 and max_size == -1. > > This means that using the function you suggest would either prevent > detecting the out-of-bounds indices that overflow due to the bit > offset, thus largely defeating the purpose of the patch (to detect > excessively large indices), or not printing the value of the out-of > bounds index in the warning in this case, which would in turn mean > further changes to the rest of the logic. Well, it would not handle the case of a.b[offset-bringing-you-bitoffset-overflow].c[i] but it would handle (compared to your version of the patch) a.b[j].c[i] with i being the unreasonably large offset. That's beause we look at the bit offset of a.b[j].c thus the _start_ of the array. I still find the half-address-space case totally pointless to warn about (even more to get "precise" here by subtracting the offset of the array). There's not a single testcase that looks motivating to me (like an error with some signed->unsigned conversion and then GCC magically figuring out a range you'd warn for) The diagnostic wording changes like @@ -6718,7 +6750,8 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one) && tree_int_cst_le (low_sub, low_bound)) { warning_at (location, OPT_Warray_bounds, - "array subscript is outside array bounds"); + "array subscript [%E, %E] is outside array bounds of %qT", + low_sub, up_sub, artype); TREE_NO_WARNING (ref) = 1; } } are ok for trunk. Thanks, Richard. >> I'll go ahead with the first version >>> >>> unless you have a different suggestion. > > > But before I commit the patch as is I want to double-check to make > sure you don't have further suggestions. > > Martin > > PS For reference the last version of the patch is here: > > https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
Ping -- please see my reply below. On 10/20/2017 09:57 AM, Richard Biener wrote: get_addr_base_and_unit_offset will return NULL if there's any variable component in 'ref'. So as written it seems to be dead code (you want to pass 'arg'?) Sorry, I'm not sure I understand what you mean. What do you think is dead code? The call to get_addr_base_and_unit_offset() is also made for an array of unspecified bound (up_bound is null) and for an array at the end of a struct. For those the function returns non-null, and for the others (arrays of runtime bound) it returns null. (I passed arg instead of ref but I see no difference in my tests.) If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') it will return the offset of 'c'. If you pass a.b[j].c it will still return NULL. You could use get_ref_base_and_extent which will return the offset of a.b[0].c in this case and sets max_size != size - but you are only interested in offset. The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a HWI you'll instead get offset == 0 and max_size == -1. Thus I'm saying this is dead code for variable array accesses (even for the array you are warning about). Yes, for constant index and at-struct-end you'll get sth, but the warning is in VRP because of variable indexes. So I suggest to pass 'arg' and use get_ref_base_and_extent for some extra precision (and possible lossage for very very large structures). Computing bit offsets defeats the out-of-bounds flexible array index detection because the computation overflows (the function sets the offset to zero). It shouldn't if you pass arg rather than ref. I suspect you missed my reply in IRC where I confirmed that this approach doesn't work for the reason you yourself mentioned above: >>> The disadvantage of get_ref_base_and_extent >>> is it returns offset in bits thus if the offset is too large >>> for a HWI you'll instead get offset == 0 and max_size == -1. This means that using the function you suggest would either prevent detecting the out-of-bounds indices that overflow due to the bit offset, thus largely defeating the purpose of the patch (to detect excessively large indices), or not printing the value of the out-of bounds index in the warning in this case, which would in turn mean further changes to the rest of the logic. I'll go ahead with the first version unless you have a different suggestion. But before I commit the patch as is I want to double-check to make sure you don't have further suggestions. Martin PS For reference the last version of the patch is here: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On October 20, 2017 5:43:40 PM GMT+02:00, Martin Seborwrote: >On 10/20/2017 02:08 AM, Richard Biener wrote: >> On Fri, Oct 20, 2017 at 1:00 AM, Martin Sebor >wrote: >>> On 10/19/2017 02:34 AM, Richard Biener wrote: On Thu, Oct 19, 2017 at 1:19 AM, Martin Sebor >wrote: > > On 10/18/2017 04:48 AM, Richard Biener wrote: >> >> >> On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor >wrote: >>> >>> >>> While testing my latest -Wrestrict changes I noticed a number of >>> opportunities to improve the -Warray-bounds warning. Attached >>> is a patch that implements a solution for the following subset >>> of these: >>> >>> PR tree-optimization/82596 - missing -Warray-bounds on an out-of >>> bounds index into string literal >>> PR tree-optimization/82588 - missing -Warray-bounds on an >excessively >>> large index >>> PR tree-optimization/82583 - missing -Warray-bounds on >out-of-bounds >>> inner indices >>> >>> I meant to use size_type_node (size_t), not sizetype. But I just checked that ptrdiff_type_node is initialized in build_common_tree_nodes and thus always available. >>> >>> >>> I see. Using ptrdiff_type_node is preferable for the targets >>> where ptrdiff_t has a greater precision than size_t (e.g., VMS). >>> It makes sense now. I should remember to change all the other >>> places where I introduced ssizetype to use ptrdiff_type_node. >>> > As an aside, at some point I would like to get away from a type > based limit in all these warnings and instead use one that can > be controlled by an option so that a user can impose a lower limit > on the maximum size of an object and have all size-related >warnings > (and perhaps even optimizations) enforce it and benefit from it. You could add a --param that is initialized from ptrdiff_type_node. >>> >>> >>> Yes, that's an option to consider. Thanks. >>> >>> >> + tree arg = TREE_OPERAND (ref, 0); >> + tree_code code = TREE_CODE (arg); >> + if (code == COMPONENT_REF) >> + { >> + HOST_WIDE_INT off; >> + if (tree base = get_addr_base_and_unit_offset (ref, >)) >> + up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, >> up_bound_p1, >> + TYPE_SIZE_UNIT (TREE_TYPE >> (base))); >> + else >> + return; >> >> so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will >be >> false). >> simply not subtracting anyhing instead of returning would be >> conservatively >> correct, no? Likewise subtracting the offset of the array for >all >> "previous" >> variably indexed components with assuming the lowest value for >the >> index. >> But as above I think compensating for the offset of the array >within the >> object >> is academic ... ;) > > > > I was going to say yes (it gives up) but on second thought I don't > think it does. Only the major index can be unbounded and the code > does consider the size of the sub-array when checking the major > index. So, IIUC, I think this works correctly as is (*). What > doesn't work is VLAs but those are a separate problem. Let me > know if I misunderstood your question. get_addr_base_and_unit_offset will return NULL if there's any >variable component in 'ref'. So as written it seems to be dead code (you want to pass 'arg'?) >>> >>> >>> Sorry, I'm not sure I understand what you mean. What do you think >>> is dead code? The call to get_addr_base_and_unit_offset() is also >>> made for an array of unspecified bound (up_bound is null) and for >>> an array at the end of a struct. For those the function returns >>> non-null, and for the others (arrays of runtime bound) it returns >>> null. (I passed arg instead of ref but I see no difference in >>> my tests.) >> >> If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') >it will >> return the offset of 'c'. If you pass a.b[j].c it will still return >NULL. >> You could use get_ref_base_and_extent which will return the offset >> of a.b[0].c in this case and sets max_size != size - but you are only >> interested in offset. The disadvantage of get_ref_base_and_extent >> is it returns offset in bits thus if the offset is too large for a >HWI >> you'll instead get offset == 0 and max_size == -1. >> >> Thus I'm saying this is dead code for variable array accesses >> (even for the array you are warning about). Yes, for constant index >> and at-struct-end you'll get sth, but the warning is in VRP because >> of variable indexes. >> >> So I suggest to pass 'arg' and use get_ref_base_and_extent >> for some extra precision (and possible lossage for very very large >> structures). > >Computing bit offsets defeats
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On 10/20/2017 02:08 AM, Richard Biener wrote: On Fri, Oct 20, 2017 at 1:00 AM, Martin Seborwrote: On 10/19/2017 02:34 AM, Richard Biener wrote: On Thu, Oct 19, 2017 at 1:19 AM, Martin Sebor wrote: On 10/18/2017 04:48 AM, Richard Biener wrote: On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor wrote: While testing my latest -Wrestrict changes I noticed a number of opportunities to improve the -Warray-bounds warning. Attached is a patch that implements a solution for the following subset of these: PR tree-optimization/82596 - missing -Warray-bounds on an out-of bounds index into string literal PR tree-optimization/82588 - missing -Warray-bounds on an excessively large index PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indices I meant to use size_type_node (size_t), not sizetype. But I just checked that ptrdiff_type_node is initialized in build_common_tree_nodes and thus always available. I see. Using ptrdiff_type_node is preferable for the targets where ptrdiff_t has a greater precision than size_t (e.g., VMS). It makes sense now. I should remember to change all the other places where I introduced ssizetype to use ptrdiff_type_node. As an aside, at some point I would like to get away from a type based limit in all these warnings and instead use one that can be controlled by an option so that a user can impose a lower limit on the maximum size of an object and have all size-related warnings (and perhaps even optimizations) enforce it and benefit from it. You could add a --param that is initialized from ptrdiff_type_node. Yes, that's an option to consider. Thanks. + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (ref, )) + up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1, + TYPE_SIZE_UNIT (TREE_TYPE (base))); + else + return; so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be false). simply not subtracting anyhing instead of returning would be conservatively correct, no? Likewise subtracting the offset of the array for all "previous" variably indexed components with assuming the lowest value for the index. But as above I think compensating for the offset of the array within the object is academic ... ;) I was going to say yes (it gives up) but on second thought I don't think it does. Only the major index can be unbounded and the code does consider the size of the sub-array when checking the major index. So, IIUC, I think this works correctly as is (*). What doesn't work is VLAs but those are a separate problem. Let me know if I misunderstood your question. get_addr_base_and_unit_offset will return NULL if there's any variable component in 'ref'. So as written it seems to be dead code (you want to pass 'arg'?) Sorry, I'm not sure I understand what you mean. What do you think is dead code? The call to get_addr_base_and_unit_offset() is also made for an array of unspecified bound (up_bound is null) and for an array at the end of a struct. For those the function returns non-null, and for the others (arrays of runtime bound) it returns null. (I passed arg instead of ref but I see no difference in my tests.) If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') it will return the offset of 'c'. If you pass a.b[j].c it will still return NULL. You could use get_ref_base_and_extent which will return the offset of a.b[0].c in this case and sets max_size != size - but you are only interested in offset. The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a HWI you'll instead get offset == 0 and max_size == -1. Thus I'm saying this is dead code for variable array accesses (even for the array you are warning about). Yes, for constant index and at-struct-end you'll get sth, but the warning is in VRP because of variable indexes. So I suggest to pass 'arg' and use get_ref_base_and_extent for some extra precision (and possible lossage for very very large structures). Computing bit offsets defeats the out-of-bounds flexible array index detection because the computation overflows (the function sets the offset to zero). I'll go ahead with the first version unless you have a different suggestion. Thanks Martin Thus instead of + tree maxbound = TYPE_MAX_VALUE (ptrdiff_type_node); + + up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize); + + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (arg, )) + { + tree size = TYPE_SIZE_UNIT (TREE_TYPE
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On Fri, Oct 20, 2017 at 1:00 AM, Martin Seborwrote: > On 10/19/2017 02:34 AM, Richard Biener wrote: >> >> On Thu, Oct 19, 2017 at 1:19 AM, Martin Sebor wrote: >>> >>> On 10/18/2017 04:48 AM, Richard Biener wrote: On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor wrote: > > > While testing my latest -Wrestrict changes I noticed a number of > opportunities to improve the -Warray-bounds warning. Attached > is a patch that implements a solution for the following subset > of these: > > PR tree-optimization/82596 - missing -Warray-bounds on an out-of > bounds index into string literal > PR tree-optimization/82588 - missing -Warray-bounds on an excessively > large index > PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds > inner indices > > >> I meant to use size_type_node (size_t), not sizetype. But >> I just checked that ptrdiff_type_node is initialized in >> build_common_tree_nodes and thus always available. > > > I see. Using ptrdiff_type_node is preferable for the targets > where ptrdiff_t has a greater precision than size_t (e.g., VMS). > It makes sense now. I should remember to change all the other > places where I introduced ssizetype to use ptrdiff_type_node. > >> >>> As an aside, at some point I would like to get away from a type >>> based limit in all these warnings and instead use one that can >>> be controlled by an option so that a user can impose a lower limit >>> on the maximum size of an object and have all size-related warnings >>> (and perhaps even optimizations) enforce it and benefit from it. >> >> >> You could add a --param that is initialized from ptrdiff_type_node. > > > Yes, that's an option to consider. Thanks. > > >> + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (ref, )) + up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1, + TYPE_SIZE_UNIT (TREE_TYPE (base))); + else + return; so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be false). simply not subtracting anyhing instead of returning would be conservatively correct, no? Likewise subtracting the offset of the array for all "previous" variably indexed components with assuming the lowest value for the index. But as above I think compensating for the offset of the array within the object is academic ... ;) >>> >>> >>> >>> I was going to say yes (it gives up) but on second thought I don't >>> think it does. Only the major index can be unbounded and the code >>> does consider the size of the sub-array when checking the major >>> index. So, IIUC, I think this works correctly as is (*). What >>> doesn't work is VLAs but those are a separate problem. Let me >>> know if I misunderstood your question. >> >> >> get_addr_base_and_unit_offset will return NULL if there's any variable >> component in 'ref'. So as written it seems to be dead code (you >> want to pass 'arg'?) > > > Sorry, I'm not sure I understand what you mean. What do you think > is dead code? The call to get_addr_base_and_unit_offset() is also > made for an array of unspecified bound (up_bound is null) and for > an array at the end of a struct. For those the function returns > non-null, and for the others (arrays of runtime bound) it returns > null. (I passed arg instead of ref but I see no difference in > my tests.) If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') it will return the offset of 'c'. If you pass a.b[j].c it will still return NULL. You could use get_ref_base_and_extent which will return the offset of a.b[0].c in this case and sets max_size != size - but you are only interested in offset. The disadvantage of get_ref_base_and_extent is it returns offset in bits thus if the offset is too large for a HWI you'll instead get offset == 0 and max_size == -1. Thus I'm saying this is dead code for variable array accesses (even for the array you are warning about). Yes, for constant index and at-struct-end you'll get sth, but the warning is in VRP because of variable indexes. So I suggest to pass 'arg' and use get_ref_base_and_extent for some extra precision (and possible lossage for very very large structures). Thus instead of + tree maxbound = TYPE_MAX_VALUE (ptrdiff_type_node); + + up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize); + + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (arg, )) + { + tree size =
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On 10/19/2017 02:34 AM, Richard Biener wrote: On Thu, Oct 19, 2017 at 1:19 AM, Martin Seborwrote: On 10/18/2017 04:48 AM, Richard Biener wrote: On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor wrote: While testing my latest -Wrestrict changes I noticed a number of opportunities to improve the -Warray-bounds warning. Attached is a patch that implements a solution for the following subset of these: PR tree-optimization/82596 - missing -Warray-bounds on an out-of bounds index into string literal PR tree-optimization/82588 - missing -Warray-bounds on an excessively large index PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indices I meant to use size_type_node (size_t), not sizetype. But I just checked that ptrdiff_type_node is initialized in build_common_tree_nodes and thus always available. I see. Using ptrdiff_type_node is preferable for the targets where ptrdiff_t has a greater precision than size_t (e.g., VMS). It makes sense now. I should remember to change all the other places where I introduced ssizetype to use ptrdiff_type_node. As an aside, at some point I would like to get away from a type based limit in all these warnings and instead use one that can be controlled by an option so that a user can impose a lower limit on the maximum size of an object and have all size-related warnings (and perhaps even optimizations) enforce it and benefit from it. You could add a --param that is initialized from ptrdiff_type_node. Yes, that's an option to consider. Thanks. + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (ref, )) + up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1, + TYPE_SIZE_UNIT (TREE_TYPE (base))); + else + return; so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be false). simply not subtracting anyhing instead of returning would be conservatively correct, no? Likewise subtracting the offset of the array for all "previous" variably indexed components with assuming the lowest value for the index. But as above I think compensating for the offset of the array within the object is academic ... ;) I was going to say yes (it gives up) but on second thought I don't think it does. Only the major index can be unbounded and the code does consider the size of the sub-array when checking the major index. So, IIUC, I think this works correctly as is (*). What doesn't work is VLAs but those are a separate problem. Let me know if I misunderstood your question. get_addr_base_and_unit_offset will return NULL if there's any variable component in 'ref'. So as written it seems to be dead code (you want to pass 'arg'?) Sorry, I'm not sure I understand what you mean. What do you think is dead code? The call to get_addr_base_and_unit_offset() is also made for an array of unspecified bound (up_bound is null) and for an array at the end of a struct. For those the function returns non-null, and for the others (arrays of runtime bound) it returns null. (I passed arg instead of ref but I see no difference in my tests.) I was asking you to remove the 'else return' because w/o subtracting the upper bound is just more conservative. Sure, that sounds good. Attached is an updated patch. Thanks Martin PR tree-optimization/82588 - missing -Warray-bounds on a excessively large index PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indic gcc/ChangeLog: PR tree-optimization/82588 PR tree-optimization/82583 * tree-vrp.c (check_array_ref): Handle flexible array members, string literals, and inner indices. (search_for_addr_array): Add detail to diagnostics. gcc/testsuite/ChangeLog: PR tree-optimization/82588 PR tree-optimization/82583 * c-c++-common/Warray-bounds.c: New test. * gcc.dg/Warray-bounds-11.c: Adjust. * gcc.dg/Warray-bounds-22.c: New test. diff --git a/gcc/testsuite/c-c++-common/Warray-bounds.c b/gcc/testsuite/c-c++-common/Warray-bounds.c new file mode 100644 index 000..2207999 --- /dev/null +++ b/gcc/testsuite/c-c++-common/Warray-bounds.c @@ -0,0 +1,240 @@ +/* PR tree-optimization/82588 - missing -Warray-bounds on an excessively + large index + { dg-do compile } + { dg-options "-O2 -Warray-bounds -ftrack-macro-expansion=0" } */ + +#define SIZE_MAX __SIZE_MAX__ +#define SSIZE_MAX __PTRDIFF_MAX__ +#define SSIZE_MIN (-SSIZE_MAX - 1) + +typedef __PTRDIFF_TYPE__ ssize_t; +typedef __SIZE_TYPE__size_t; + +extern ssize_t signed_value (void) +{ + extern volatile ssize_t signed_value_source; + return signed_value_source; +} + +extern size_t unsigned_value (void) +{ + extern volatile size_t unsigned_value_source; + return unsigned_value_source; +} + +ssize_t signed_range (ssize_t min, ssize_t max) +{
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
Good question! STRING_CST does have a domain. The problem is that array_at_struct_end_p() returns true for STRING_CST. I've added the handling to the function and removed the block above from the latest patch. Can you split out the STRING_CST handling and commit that separately (split the testcase)? That part looks ok. I've committed r253902. It turns out, however, that this subset of the patch doesn't fix the whole problem. What's still missing is the handling of: int g (int i) { return (i < 0 ? ABC : DEF)[7]; // missing -Warray-bounds } Surprisingly, this happens to work in C++ but not in C (which is in contrast to bug 82609 where it's the other way around). The root cause is that the expression is represented as a MEM_REF(ADDR_EXPR (STRING_CST)) and check_array_bounds() only considers ARRAY_REF and ADDR_EXPR. I will submit a separate patch for that, perhaps along with one for bug 82612 that you commented on this morning (missing -Warray-bounds on a non-zero offset from the address of a non- array object). Martin
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On Thu, Oct 19, 2017 at 1:19 AM, Martin Seborwrote: > On 10/18/2017 04:48 AM, Richard Biener wrote: >> >> On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor wrote: >>> >>> While testing my latest -Wrestrict changes I noticed a number of >>> opportunities to improve the -Warray-bounds warning. Attached >>> is a patch that implements a solution for the following subset >>> of these: >>> >>> PR tree-optimization/82596 - missing -Warray-bounds on an out-of >>> bounds index into string literal >>> PR tree-optimization/82588 - missing -Warray-bounds on an excessively >>> large index >>> PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds >>> inner indices >>> >>> The patch also adds more detail to the -Warray-bounds diagnostics >>> to make it easier to see the cause of the problem. >>> >>> Richard, since the changes touch tree-vrp.c, I look in particular >>> for your comments. >> >> >> + /* Accesses to trailing arrays via pointers may access storage >> +beyond the types array bounds. For such arrays, or for flexible >> +array members as well as for other arrays of an unknown size, >> +replace the upper bound with a more permissive one that assumes >> +the size of the largest object is SSIZE_MAX. */ >> >> I believe handling those cases are somewhat academic, but ... > > > Thanks for the quick review! > > I agree the SSIZE_MAX tests handle corner cases and there are > arguably more important gaps here to plug (e.g., VLAs). Then > again, most bugs tend to lurk in corner cases of one kind or > another and these seemed like a good way for me to come up to > speed on the implementation before tackling those. If you > have suggestions for which to dive into next I'm happy to take > them. > >> + tree eltype = TREE_TYPE (ref); >> + tree eltsize = TYPE_SIZE_UNIT (eltype); >> >> needs to use array_ref_element_size. Note that this size can be >> non-constant in which case you now ICE so you have to check >> this is an INTEGER_CST. > > > Thanks. I did reproduce a few ICEs due to VLAs. I've fixed > the problems and added tests for them. One-dimensional VLAs > are now handled the same way arrays of unknown bound are. > Handling them more intelligently (i.e., taking into account > the ranges their bounds are in) and especially handling > multidimensional VLAs will take some effort. > >> >> + tree maxbound = TYPE_MAX_VALUE (ssizetype); >> >> please don't use ssizetype - sizetype can be of different precision >> than pointers (and thus objects). size_type_node would come >> close (maps to size_t), eventually ptrdiff_type_node is also >> defined by all frontends. > > > Okay, I've changed it to sizetype (it would be nice if there > were a cleaner way of doing it rather than by bit-twiddling.) > > ptrdiff_type would have been my first choice but it's only defined > by the C/C++ front ends and not available in the middle-end, so > the other warnings that consider object sizes deal with ssizetype > (e.g., -Walloc-size-larger-than, -Wstringop- overflow, and > -Wformat-overflow). > > That said, I'm not sure I understand under what conditions > ssizetype is not the signed equivalent of sizetype. Can you > clarify? I meant to use size_type_node (size_t), not sizetype. But I just checked that ptrdiff_type_node is initialized in build_common_tree_nodes and thus always available. > As an aside, at some point I would like to get away from a type > based limit in all these warnings and instead use one that can > be controlled by an option so that a user can impose a lower limit > on the maximum size of an object and have all size-related warnings > (and perhaps even optimizations) enforce it and benefit from it. You could add a --param that is initialized from ptrdiff_type_node. >> + up_bound_p1 = fold_build2 (TRUNC_DIV_EXPR, ssizetype, maxbound, >> eltsize); >> + >> >> int_const_binop if you insist on using trees... > > > Sure. (I think offset_int would be more convenient but the rest > of the function uses trees so I just stuck to those to avoid > converting things back and forth or disrupting more of the code > than I had to.) Understood. >> >> + tree arg = TREE_OPERAND (ref, 0); >> + tree_code code = TREE_CODE (arg); >> + if (code == COMPONENT_REF) >> + { >> + HOST_WIDE_INT off; >> + if (tree base = get_addr_base_and_unit_offset (ref, )) >> + up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1, >> + TYPE_SIZE_UNIT (TREE_TYPE (base))); >> + else >> + return; >> >> so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be >> false). >> simply not subtracting anyhing instead of returning would be >> conservatively >> correct, no? Likewise subtracting the offset of the array for all >> "previous" >> variably indexed components with assuming the lowest value for the index. >> But as above I
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On 10/18/2017 04:48 AM, Richard Biener wrote: On Wed, Oct 18, 2017 at 5:34 AM, Martin Seborwrote: While testing my latest -Wrestrict changes I noticed a number of opportunities to improve the -Warray-bounds warning. Attached is a patch that implements a solution for the following subset of these: PR tree-optimization/82596 - missing -Warray-bounds on an out-of bounds index into string literal PR tree-optimization/82588 - missing -Warray-bounds on an excessively large index PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indices The patch also adds more detail to the -Warray-bounds diagnostics to make it easier to see the cause of the problem. Richard, since the changes touch tree-vrp.c, I look in particular for your comments. + /* Accesses to trailing arrays via pointers may access storage +beyond the types array bounds. For such arrays, or for flexible +array members as well as for other arrays of an unknown size, +replace the upper bound with a more permissive one that assumes +the size of the largest object is SSIZE_MAX. */ I believe handling those cases are somewhat academic, but ... Thanks for the quick review! I agree the SSIZE_MAX tests handle corner cases and there are arguably more important gaps here to plug (e.g., VLAs). Then again, most bugs tend to lurk in corner cases of one kind or another and these seemed like a good way for me to come up to speed on the implementation before tackling those. If you have suggestions for which to dive into next I'm happy to take them. + tree eltype = TREE_TYPE (ref); + tree eltsize = TYPE_SIZE_UNIT (eltype); needs to use array_ref_element_size. Note that this size can be non-constant in which case you now ICE so you have to check this is an INTEGER_CST. Thanks. I did reproduce a few ICEs due to VLAs. I've fixed the problems and added tests for them. One-dimensional VLAs are now handled the same way arrays of unknown bound are. Handling them more intelligently (i.e., taking into account the ranges their bounds are in) and especially handling multidimensional VLAs will take some effort. + tree maxbound = TYPE_MAX_VALUE (ssizetype); please don't use ssizetype - sizetype can be of different precision than pointers (and thus objects). size_type_node would come close (maps to size_t), eventually ptrdiff_type_node is also defined by all frontends. Okay, I've changed it to sizetype (it would be nice if there were a cleaner way of doing it rather than by bit-twiddling.) ptrdiff_type would have been my first choice but it's only defined by the C/C++ front ends and not available in the middle-end, so the other warnings that consider object sizes deal with ssizetype (e.g., -Walloc-size-larger-than, -Wstringop- overflow, and -Wformat-overflow). That said, I'm not sure I understand under what conditions ssizetype is not the signed equivalent of sizetype. Can you clarify? As an aside, at some point I would like to get away from a type based limit in all these warnings and instead use one that can be controlled by an option so that a user can impose a lower limit on the maximum size of an object and have all size-related warnings (and perhaps even optimizations) enforce it and benefit from it. + up_bound_p1 = fold_build2 (TRUNC_DIV_EXPR, ssizetype, maxbound, eltsize); + int_const_binop if you insist on using trees... Sure. (I think offset_int would be more convenient but the rest of the function uses trees so I just stuck to those to avoid converting things back and forth or disrupting more of the code than I had to.) + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (ref, )) + up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1, + TYPE_SIZE_UNIT (TREE_TYPE (base))); + else + return; so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be false). simply not subtracting anyhing instead of returning would be conservatively correct, no? Likewise subtracting the offset of the array for all "previous" variably indexed components with assuming the lowest value for the index. But as above I think compensating for the offset of the array within the object is academic ... ;) I was going to say yes (it gives up) but on second thought I don't think it does. Only the major index can be unbounded and the code does consider the size of the sub-array when checking the major index. So, IIUC, I think this works correctly as is (*). What doesn't work is VLAs but those are a separate problem. Let me know if I misunderstood your question. + else if (code == STRING_CST) + up_bound_p1 = build_int_cst (ssizetype, TREE_STRING_LENGTH (arg)); that one is more interesting -- why's
Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices
On Wed, Oct 18, 2017 at 5:34 AM, Martin Seborwrote: > While testing my latest -Wrestrict changes I noticed a number of > opportunities to improve the -Warray-bounds warning. Attached > is a patch that implements a solution for the following subset > of these: > > PR tree-optimization/82596 - missing -Warray-bounds on an out-of > bounds index into string literal > PR tree-optimization/82588 - missing -Warray-bounds on an excessively > large index > PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds > inner indices > > The patch also adds more detail to the -Warray-bounds diagnostics > to make it easier to see the cause of the problem. > > Richard, since the changes touch tree-vrp.c, I look in particular > for your comments. + /* Accesses to trailing arrays via pointers may access storage +beyond the types array bounds. For such arrays, or for flexible +array members as well as for other arrays of an unknown size, +replace the upper bound with a more permissive one that assumes +the size of the largest object is SSIZE_MAX. */ I believe handling those cases are somewhat academic, but ... + tree eltype = TREE_TYPE (ref); + tree eltsize = TYPE_SIZE_UNIT (eltype); needs to use array_ref_element_size. Note that this size can be non-constant in which case you now ICE so you have to check this is an INTEGER_CST. + tree maxbound = TYPE_MAX_VALUE (ssizetype); please don't use ssizetype - sizetype can be of different precision than pointers (and thus objects). size_type_node would come close (maps to size_t), eventually ptrdiff_type_node is also defined by all frontends. + up_bound_p1 = fold_build2 (TRUNC_DIV_EXPR, ssizetype, maxbound, eltsize); + int_const_binop if you insist on using trees... + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (ref, )) + up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1, + TYPE_SIZE_UNIT (TREE_TYPE (base))); + else + return; so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be false). simply not subtracting anyhing instead of returning would be conservatively correct, no? Likewise subtracting the offset of the array for all "previous" variably indexed components with assuming the lowest value for the index. But as above I think compensating for the offset of the array within the object is academic ... ;) + else if (code == STRING_CST) + up_bound_p1 = build_int_cst (ssizetype, TREE_STRING_LENGTH (arg)); that one is more interesting -- why's the TYPE_DOMAIN of the STRING_CST lacking a max value? Iff we use build_string_literal it should have the proper type. Thanks, Richard. > > Thanks > Martin
[PATCH] enhance -Warray-bounds to handle strings and excessive indices
While testing my latest -Wrestrict changes I noticed a number of opportunities to improve the -Warray-bounds warning. Attached is a patch that implements a solution for the following subset of these: PR tree-optimization/82596 - missing -Warray-bounds on an out-of bounds index into string literal PR tree-optimization/82588 - missing -Warray-bounds on an excessively large index PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indices The patch also adds more detail to the -Warray-bounds diagnostics to make it easier to see the cause of the problem. Richard, since the changes touch tree-vrp.c, I look in particular for your comments. Thanks Martin PR tree-optimization/82596 - missing -Warray-bounds on an out-of-bounds index into string literal PR tree-optimization/82588 - missing -Warray-bounds on a excessively large index PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indic gcc/ChangeLog: PR tree-optimization/82596 PR tree-optimization/82588 PR tree-optimization/82583 * tree-vrp.c (check_array_ref): Handle flexible array members, string literals, and inner indices. (search_for_addr_array): Add detail to diagnostics. gcc/testsuite/ChangeLog: PR tree-optimization/82596 PR tree-optimization/82588 PR tree-optimization/82583 * c-c++-common/Warray-bounds.c: New test. * gcc.dg/Warray-bounds-11.c: Adjust. diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 2c86b8e..88cce15 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -42,6 +42,7 @@ along with GCC; see the file COPYING3. If not see #include "gimple-iterator.h" #include "gimple-walk.h" #include "tree-cfg.h" +#include "tree-dfa.h" #include "tree-ssa-loop-manip.h" #include "tree-ssa-loop-niter.h" #include "tree-ssa-loop.h" @@ -64,6 +65,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-cfgcleanup.h" #include "stringpool.h" #include "attribs.h" +#include "builtins.h" #define VR_INITIALIZER { VR_UNDEFINED, NULL_TREE, NULL_TREE, NULL } @@ -6675,26 +6677,51 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one) low_sub = up_sub = TREE_OPERAND (ref, 1); up_bound = array_ref_up_bound (ref); - /* Can not check flexible arrays. */ if (!up_bound - || TREE_CODE (up_bound) != INTEGER_CST) -return; + || (warn_array_bounds < 2 + && array_at_struct_end_p (ref))) +{ + /* Accesses to trailing arrays via pointers may access storage + beyond the types array bounds. For such arrays, or for flexible + array members as well as for other arrays of an unknown size, + replace the upper bound with a more permissive one that assumes + the size of the largest object is SSIZE_MAX. */ + tree eltype = TREE_TYPE (ref); + tree eltsize = TYPE_SIZE_UNIT (eltype); + tree maxbound = TYPE_MAX_VALUE (ssizetype); + up_bound_p1 = fold_build2 (TRUNC_DIV_EXPR, ssizetype, maxbound, eltsize); + + tree arg = TREE_OPERAND (ref, 0); + tree_code code = TREE_CODE (arg); + if (code == COMPONENT_REF) + { + HOST_WIDE_INT off; + if (tree base = get_addr_base_and_unit_offset (ref, )) + up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1, + TYPE_SIZE_UNIT (TREE_TYPE (base))); + else + return; + } + else if (code == STRING_CST) + up_bound_p1 = build_int_cst (ssizetype, TREE_STRING_LENGTH (arg)); - /* Accesses to trailing arrays via pointers may access storage - beyond the types array bounds. */ - if (warn_array_bounds < 2 - && array_at_struct_end_p (ref)) -return; + up_bound = int_const_binop (MINUS_EXPR, up_bound_p1, + build_int_cst (ssizetype, 1)); +} + else +up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound, + build_int_cst (TREE_TYPE (up_bound), 1)); low_bound = array_ref_low_bound (ref); - up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound, - build_int_cst (TREE_TYPE (up_bound), 1)); + + tree artype = TREE_TYPE (TREE_OPERAND (ref, 0)); /* Empty array. */ if (tree_int_cst_equal (low_bound, up_bound_p1)) { warning_at (location, OPT_Warray_bounds, - "array subscript is above array bounds"); + "array subscript %E is above array bounds of %qT", + low_bound, artype); TREE_NO_WARNING (ref) = 1; } @@ -6718,7 +6745,8 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one) && tree_int_cst_le (low_sub, low_bound)) { warning_at (location, OPT_Warray_bounds, - "array subscript is outside array bounds"); + "array subscript [%E, %E] is outside array bounds of %qT", + low_sub, up_sub, artype); TREE_NO_WARNING (ref) = 1; } } @@ -6734,7 +6762,8 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one) fprintf (dump_file, "\n"); } warning_at (location, OPT_Warray_bounds, - "array subscript is above array bounds"); + "array subscript %E is above array