Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On 12/14/2017 12:04 PM, Jeff Law wrote: On 12/14/2017 11:55 AM, Jakub Jelinek wrote: On Thu, Dec 14, 2017 at 11:51:26AM -0700, Martin Sebor wrote: Well, it would be nice to get sanitizers diagnose this at runtime. If we know the array length at compile time, simply compare after the strlen call the result and fail if it returns something above it. Or replace the strlen call with strnlen for the compile time known size and add instrumentation if strnlen returns the second argument. Sure, that sounds like a useful enhancement. I'll look into adding it as a follow-on patch unless you feel that it needs to be part of the same package. The problem is if we'll need changes to libubsan for that (which we'll likely do), then those need to be upstreamed, and e.g. my attempts to upstream simple patch to diagnose noreturn function returns is suspended upstream because clang doesn't have that support (and I have no interest in adding to to clang). In theory we could have some GCC only file in there, but then we'd be ABI incompatible with them. So defer the sanitization side until Clang catches up? I've committed the patch as is in r255790. If I have some spare cycles I'll see if the instrumentation is possible without changing libubsan. Martin
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On 12/15/2017 10:29 AM, Martin Sebor wrote: On 12/15/2017 09:17 AM, Richard Biener wrote: On December 15, 2017 4:58:14 PM GMT+01:00, Martin Seborwrote: On 12/15/2017 01:48 AM, Richard Biener wrote: On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor wrote: On 12/14/2017 03:43 AM, Richard Biener wrote: On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor wrote: On 12/12/2017 05:35 PM, Jeff Law wrote: On 12/12/2017 01:15 PM, Martin Sebor wrote: Bug 83373 - False positive reported by -Wstringop-overflow, is another example of warning triggered by a missed optimization opportunity, this time in the strlen pass. The optimization is discussed in pr78450 - strlen(s) return value can be assumed to be less than the size of s. The gist of it is that the result of strlen(array) can be assumed to be less than the size of the array (except in the corner case of last struct members). To avoid the false positive the attached patch adds this optimization to the strlen pass. Although the patch passes bootstrap and regression tests for all front-ends I'm not sure the way it determines the upper bound of the range is 100% correct for languages with arrays with a non-zero lower bound. Maybe it's just not as tight as it could be. What about something hideous like struct fu { char x1[10]; char x2[10]; int avoid_trailing_array; } Where objects stored in x1 are not null terminated. Are we in the realm of undefined behavior at that point (I hope so)? Yes, this is undefined. Pointer arithmetic (either direct or via standard library functions) is only defined for pointers to the same object or subobject. So even something like memcpy (pfu->x1, pfu->x1 + 10, 10); is undefined. There's nothing undefined here - computing the pointer pointing to one-after-the-last element of an array is valid (you are just not allowed to dereference it). Right, and memcpy dereferences it, so it's undefined. That's interpretation of the standard that I don't share. It's not an interpretation. It's a basic rule of the languages that the standards are explicit about. In C11 you will find this specified in detail in 6.5.6, paragraph 7 and 8 (of particular relevance to your question below is p7: "a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one.") I know. Also, if I have struct f { int i; int j; }; and a int * that points to the j member you say I have no standard conforming way to get at a pointer to the i member from this, right? Correct. See above. Because the pointer points to an 'int' object. But it also points within a struct f object! So at least maybe (int *)((char *)p - offsetof (struct f, j)) should be valid? No, not really. It works in practice but it's not well-defined. It doesn't matter how you get at the result. What matters is what you start with. As Jeff said, to derive a pointer to distinct suobjects of a larger object you need to start with a pointer to the larger object and treat it as an array of chars. That's obviously not constraints people use C and C++ with so I see no way to enforce this within gimple. There's code out there that relies on all sorts of undefined behavior. It's a judgment call in each instance as to how much of such code exists and how important it is. In this case, I'd expect it to be confined to low-level software like OS kernels and such whose authors use C as a more convenient assembly language to talk directly to the hardware. Programmers in other domains are usually more conscious of the requirements and limited guarantees of the language and less willing to make assumptions based on what this or that processor lets them get away with. That being said, it certainly is possible to enforce this constraint within GIMPLE. My -Wrestrict patch does it to an extent for the memory and string built-ins. The -Warray-bounds patch I submitted for offsets does it for all other expressions. Neither patch exposed any such code in the Linux kernel, so it doesn't look like abuses of this sort are common even in low-level code. Let me correct myself on this last point. I forgot I had disabled the offset checking for memcpy. With it enabled, the enhanced -Warray-bounds warning finds about a dozen distinct instances of memcpy calls in the Linux kernel that cross the subobject boundary. Most of them involve struct ieee80211_hdr and its adjacent unsigned char array members. I think it would be worthwhile to do a more careful review and, if this turns out to be the only (or predominant) use case, consider tightening up GCC checkers to allow only it but not cases involving members of any other types (especially non-arrays). The most troublesome of those being variations on: struct S { ... char array[N]; void (*pfunc)(void); ... }; extern struct S *ps; memcpy (ps->array, data, N + X);
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On 12/15/2017 09:17 AM, Richard Biener wrote: On December 15, 2017 4:58:14 PM GMT+01:00, Martin Seborwrote: On 12/15/2017 01:48 AM, Richard Biener wrote: On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor wrote: On 12/14/2017 03:43 AM, Richard Biener wrote: On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor wrote: On 12/12/2017 05:35 PM, Jeff Law wrote: On 12/12/2017 01:15 PM, Martin Sebor wrote: Bug 83373 - False positive reported by -Wstringop-overflow, is another example of warning triggered by a missed optimization opportunity, this time in the strlen pass. The optimization is discussed in pr78450 - strlen(s) return value can be assumed to be less than the size of s. The gist of it is that the result of strlen(array) can be assumed to be less than the size of the array (except in the corner case of last struct members). To avoid the false positive the attached patch adds this optimization to the strlen pass. Although the patch passes bootstrap and regression tests for all front-ends I'm not sure the way it determines the upper bound of the range is 100% correct for languages with arrays with a non-zero lower bound. Maybe it's just not as tight as it could be. What about something hideous like struct fu { char x1[10]; char x2[10]; int avoid_trailing_array; } Where objects stored in x1 are not null terminated. Are we in the realm of undefined behavior at that point (I hope so)? Yes, this is undefined. Pointer arithmetic (either direct or via standard library functions) is only defined for pointers to the same object or subobject. So even something like memcpy (pfu->x1, pfu->x1 + 10, 10); is undefined. There's nothing undefined here - computing the pointer pointing to one-after-the-last element of an array is valid (you are just not allowed to dereference it). Right, and memcpy dereferences it, so it's undefined. That's interpretation of the standard that I don't share. It's not an interpretation. It's a basic rule of the languages that the standards are explicit about. In C11 you will find this specified in detail in 6.5.6, paragraph 7 and 8 (of particular relevance to your question below is p7: "a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one.") I know. Also, if I have struct f { int i; int j; }; and a int * that points to the j member you say I have no standard conforming way to get at a pointer to the i member from this, right? Correct. See above. Because the pointer points to an 'int' object. But it also points within a struct f object! So at least maybe (int *)((char *)p - offsetof (struct f, j)) should be valid? No, not really. It works in practice but it's not well-defined. It doesn't matter how you get at the result. What matters is what you start with. As Jeff said, to derive a pointer to distinct suobjects of a larger object you need to start with a pointer to the larger object and treat it as an array of chars. That's obviously not constraints people use C and C++ with so I see no way to enforce this within gimple. There's code out there that relies on all sorts of undefined behavior. It's a judgment call in each instance as to how much of such code exists and how important it is. In this case, I'd expect it to be confined to low-level software like OS kernels and such whose authors use C as a more convenient assembly language to talk directly to the hardware. Programmers in other domains are usually more conscious of the requirements and limited guarantees of the language and less willing to make assumptions based on what this or that processor lets them get away with. That being said, it certainly is possible to enforce this constraint within GIMPLE. My -Wrestrict patch does it to an extent for the memory and string built-ins. The -Warray-bounds patch I submitted for offsets does it for all other expressions. Neither patch exposed any such code in the Linux kernel, so it doesn't look like abuses of this sort are common even in low-level code. Martin
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On December 15, 2017 4:58:14 PM GMT+01:00, Martin Seborwrote: >On 12/15/2017 01:48 AM, Richard Biener wrote: >> On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor >wrote: >>> On 12/14/2017 03:43 AM, Richard Biener wrote: On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor >wrote: > > On 12/12/2017 05:35 PM, Jeff Law wrote: >> >> >> On 12/12/2017 01:15 PM, Martin Sebor wrote: >>> >>> >>> Bug 83373 - False positive reported by -Wstringop-overflow, is >>> another example of warning triggered by a missed optimization >>> opportunity, this time in the strlen pass. The optimization >>> is discussed in pr78450 - strlen(s) return value can be assumed >>> to be less than the size of s. The gist of it is that the >result >>> of strlen(array) can be assumed to be less than the size of >>> the array (except in the corner case of last struct members). >>> >>> To avoid the false positive the attached patch adds this >>> optimization to the strlen pass. Although the patch passes >>> bootstrap and regression tests for all front-ends I'm not sure >>> the way it determines the upper bound of the range is 100% >>> correct for languages with arrays with a non-zero lower bound. >>> Maybe it's just not as tight as it could be. >> >> >> What about something hideous like >> >> struct fu { >> char x1[10]; >> char x2[10]; >> int avoid_trailing_array; >> } >> >> Where objects stored in x1 are not null terminated. Are we in >the realm >> of undefined behavior at that point (I hope so)? > > > > Yes, this is undefined. Pointer arithmetic (either direct or > via standard library functions) is only defined for pointers > to the same object or subobject. So even something like > > memcpy (pfu->x1, pfu->x1 + 10, 10); > > is undefined. There's nothing undefined here - computing the pointer pointing to one-after-the-last element of an array is valid (you are just not allowed to dereference it). >>> >>> >>> Right, and memcpy dereferences it, so it's undefined. >> >> That's interpretation of the standard that I don't share. > >It's not an interpretation. It's a basic rule of the languages >that the standards are explicit about. In C11 you will find >this specified in detail in 6.5.6, paragraph 7 and 8 (of >particular relevance to your question below is p7: "a pointer >to an object that is not an element of an array behaves the same >as a pointer to the first element of an array of length one.") I know. >> Also, if I have struct f { int i; int j; }; and a int * that points >> to the j member you say I have no standard conforming way >> to get at a pointer to the i member from this, right? > >Correct. See above. > >> Because >> the pointer points to an 'int' object. But it also points within >> a struct f object! So at least maybe (int *)((char *)p - offsetof >> (struct f, j)) >> should be valid? > >No, not really. It works in practice but it's not well-defined. >It doesn't matter how you get at the result. What matters is >what you start with. As Jeff said, to derive a pointer to >distinct suobjects of a larger object you need to start with >a pointer to the larger object and treat it as an array of >chars. That's obviously not constraints people use C and C++ with so I see no way to enforce this within gimple. >> This means that pfu->x1 + 10 is a valid pointer >> into *pfu no matter what you say and you can dereference it. > >No. > >As another hopefully more convincing example consider a multi- >dimensional array A[2][2]. The value of the offset of A[i][j] >is sizeof A[i] + j. With that, the offset of A[1][0] is >sizeof A[1] + 0, and so would be the offset of A[0][2]. But >that doesn't make A[0][2] a valid reference to an element of >A (because A[0] has only two elements, A[0][0] and A[0][1]), >or [0] + 2 a derefernceable pointer. It's a pointer that >points just past the last element of the array A[0]. That >there's another array right after A[0] (namely A[1]) is >immaterial, same as in the struct f example above. I know. Dependence analysis relies on this. We've had bugs in the past with gcc itself introducing such bogus references. Richard. > >Martin
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On 12/15/2017 01:48 AM, Richard Biener wrote: On Thu, Dec 14, 2017 at 5:01 PM, Martin Seborwrote: On 12/14/2017 03:43 AM, Richard Biener wrote: On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor wrote: On 12/12/2017 05:35 PM, Jeff Law wrote: On 12/12/2017 01:15 PM, Martin Sebor wrote: Bug 83373 - False positive reported by -Wstringop-overflow, is another example of warning triggered by a missed optimization opportunity, this time in the strlen pass. The optimization is discussed in pr78450 - strlen(s) return value can be assumed to be less than the size of s. The gist of it is that the result of strlen(array) can be assumed to be less than the size of the array (except in the corner case of last struct members). To avoid the false positive the attached patch adds this optimization to the strlen pass. Although the patch passes bootstrap and regression tests for all front-ends I'm not sure the way it determines the upper bound of the range is 100% correct for languages with arrays with a non-zero lower bound. Maybe it's just not as tight as it could be. What about something hideous like struct fu { char x1[10]; char x2[10]; int avoid_trailing_array; } Where objects stored in x1 are not null terminated. Are we in the realm of undefined behavior at that point (I hope so)? Yes, this is undefined. Pointer arithmetic (either direct or via standard library functions) is only defined for pointers to the same object or subobject. So even something like memcpy (pfu->x1, pfu->x1 + 10, 10); is undefined. There's nothing undefined here - computing the pointer pointing to one-after-the-last element of an array is valid (you are just not allowed to dereference it). Right, and memcpy dereferences it, so it's undefined. That's interpretation of the standard that I don't share. It's not an interpretation. It's a basic rule of the languages that the standards are explicit about. In C11 you will find this specified in detail in 6.5.6, paragraph 7 and 8 (of particular relevance to your question below is p7: "a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one.") Also, if I have struct f { int i; int j; }; and a int * that points to the j member you say I have no standard conforming way to get at a pointer to the i member from this, right? Correct. See above. Because the pointer points to an 'int' object. But it also points within a struct f object! So at least maybe (int *)((char *)p - offsetof (struct f, j)) should be valid? No, not really. It works in practice but it's not well-defined. It doesn't matter how you get at the result. What matters is what you start with. As Jeff said, to derive a pointer to distinct suobjects of a larger object you need to start with a pointer to the larger object and treat it as an array of chars. This means that pfu->x1 + 10 is a valid pointer into *pfu no matter what you say and you can dereference it. No. As another hopefully more convincing example consider a multi- dimensional array A[2][2]. The value of the offset of A[i][j] is sizeof A[i] + j. With that, the offset of A[1][0] is sizeof A[1] + 0, and so would be the offset of A[0][2]. But that doesn't make A[0][2] a valid reference to an element of A (because A[0] has only two elements, A[0][0] and A[0][1]), or [0] + 2 a derefernceable pointer. It's a pointer that points just past the last element of the array A[0]. That there's another array right after A[0] (namely A[1]) is immaterial, same as in the struct f example above. Martin
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On Thu, Dec 14, 2017 at 5:01 PM, Martin Seborwrote: > On 12/14/2017 03:43 AM, Richard Biener wrote: >> >> On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor wrote: >>> >>> On 12/12/2017 05:35 PM, Jeff Law wrote: On 12/12/2017 01:15 PM, Martin Sebor wrote: > > > Bug 83373 - False positive reported by -Wstringop-overflow, is > another example of warning triggered by a missed optimization > opportunity, this time in the strlen pass. The optimization > is discussed in pr78450 - strlen(s) return value can be assumed > to be less than the size of s. The gist of it is that the result > of strlen(array) can be assumed to be less than the size of > the array (except in the corner case of last struct members). > > To avoid the false positive the attached patch adds this > optimization to the strlen pass. Although the patch passes > bootstrap and regression tests for all front-ends I'm not sure > the way it determines the upper bound of the range is 100% > correct for languages with arrays with a non-zero lower bound. > Maybe it's just not as tight as it could be. What about something hideous like struct fu { char x1[10]; char x2[10]; int avoid_trailing_array; } Where objects stored in x1 are not null terminated. Are we in the realm of undefined behavior at that point (I hope so)? >>> >>> >>> >>> Yes, this is undefined. Pointer arithmetic (either direct or >>> via standard library functions) is only defined for pointers >>> to the same object or subobject. So even something like >>> >>> memcpy (pfu->x1, pfu->x1 + 10, 10); >>> >>> is undefined. >> >> >> There's nothing undefined here - computing the pointer pointing >> to one-after-the-last element of an array is valid (you are just >> not allowed to dereference it). > > > Right, and memcpy dereferences it, so it's undefined. That's interpretation of the standard that I don't share. Also, if I have struct f { int i; int j; }; and a int * that points to the j member you say I have no standard conforming way to get at a pointer to the i member from this, right? Because the pointer points to an 'int' object. But it also points within a struct f object! So at least maybe (int *)((char *)p - offsetof (struct f, j)) should be valid? This means that pfu->x1 + 10 is a valid pointer into *pfu no matter what you say and you can dereference it. Richard. > Martin >
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On 12/14/2017 11:55 AM, Jakub Jelinek wrote: > On Thu, Dec 14, 2017 at 11:51:26AM -0700, Martin Sebor wrote: >>> Well, it would be nice to get sanitizers diagnose this at runtime. If we >>> know the array length at compile time, simply compare after the strlen >>> call the result and fail if it returns something above it. Or replace >>> the strlen call with strnlen for the compile time known size and add >>> instrumentation if strnlen returns the second argument. >> >> Sure, that sounds like a useful enhancement. I'll look into >> adding it as a follow-on patch unless you feel that it needs >> to be part of the same package. > > The problem is if we'll need changes to libubsan for that (which we'll > likely do), then those need to be upstreamed, and e.g. my attempts > to upstream simple patch to diagnose noreturn function returns is suspended > upstream because clang doesn't have that support (and I have no interest > in adding to to clang). > > In theory we could have some GCC only file in there, but then we'd be ABI > incompatible with them. So defer the sanitization side until Clang catches up? jeff
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On Thu, Dec 14, 2017 at 11:51:26AM -0700, Martin Sebor wrote: > > Well, it would be nice to get sanitizers diagnose this at runtime. If we > > know the array length at compile time, simply compare after the strlen > > call the result and fail if it returns something above it. Or replace > > the strlen call with strnlen for the compile time known size and add > > instrumentation if strnlen returns the second argument. > > Sure, that sounds like a useful enhancement. I'll look into > adding it as a follow-on patch unless you feel that it needs > to be part of the same package. The problem is if we'll need changes to libubsan for that (which we'll likely do), then those need to be upstreamed, and e.g. my attempts to upstream simple patch to diagnose noreturn function returns is suspended upstream because clang doesn't have that support (and I have no interest in adding to to clang). In theory we could have some GCC only file in there, but then we'd be ABI incompatible with them. Jakub
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On 12/14/2017 09:18 AM, Jakub Jelinek wrote: On Thu, Dec 14, 2017 at 09:13:21AM -0700, Jeff Law wrote: Although I would prefer not to, I suppose if letting strlen cross the boundaries of subobjects was considered an important use to accommodate in limited cases the optimization could be disabled for member arrays declared with the new nonstring attribute (while still issuing a warning for it as GCC does today). Another alternative (if the above use case is considered important enough) might be to suppress the optimization for member character arrays that are immediately followed by another such array. History tells us that there will be someone out there that does this kind of thing -- the question is how pervasive is it. My suspicion is that it is not common. Given that I don't expect those uses to be common, the only thing that should break is non-conforming code and we have a (new) warning for such code my inclination is to go forward. So I'm OK with the patch. I'd give folks till Monday to chime in with dissenting opinions. Well, it would be nice to get sanitizers diagnose this at runtime. If we know the array length at compile time, simply compare after the strlen call the result and fail if it returns something above it. Or replace the strlen call with strnlen for the compile time known size and add instrumentation if strnlen returns the second argument. Sure, that sounds like a useful enhancement. I'll look into adding it as a follow-on patch unless you feel that it needs to be part of the same package. Martin
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On Thu, Dec 14, 2017 at 09:13:21AM -0700, Jeff Law wrote: > > Although I would prefer not to, I suppose if letting strlen cross > > the boundaries of subobjects was considered an important use to > > accommodate in limited cases the optimization could be disabled > > for member arrays declared with the new nonstring attribute (while > > still issuing a warning for it as GCC does today). > > > > Another alternative (if the above use case is considered important > > enough) might be to suppress the optimization for member character > > arrays that are immediately followed by another such array. > History tells us that there will be someone out there that does this > kind of thing -- the question is how pervasive is it. My suspicion is > that it is not common. > > Given that I don't expect those uses to be common, the only thing that > should break is non-conforming code and we have a (new) warning for such > code my inclination is to go forward. > > So I'm OK with the patch. I'd give folks till Monday to chime in with > dissenting opinions. Well, it would be nice to get sanitizers diagnose this at runtime. If we know the array length at compile time, simply compare after the strlen call the result and fail if it returns something above it. Or replace the strlen call with strnlen for the compile time known size and add instrumentation if strnlen returns the second argument. Jakub
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On 12/12/2017 08:47 PM, Martin Sebor wrote: >> >> struct fu { >> char x1[10]; >> char x2[10]; >> int avoid_trailing_array; >> } >> >> Where objects stored in x1 are not null terminated. Are we in the realm >> of undefined behavior at that point (I hope so)? > > Yes, this is undefined. Pointer arithmetic (either direct or > via standard library functions) is only defined for pointers > to the same object or subobject. So even something like That's what I expected -- I just wanted to be sure there wasn't some kind of escape clause that would allow one to compose an object of that nature, then use pfu->x1 to read/write pfu->x2. > > memcpy (pfu->x1, pfu->x1 + 10, 10); > > is undefined. _FORTIFY_SOURCE allows it for raw memory functions > because some low level code copies regions of the same object that > span two or more members (I've seen an example or two in the Linux > kernel but, IIRC, nowhere else). With the patch, using memchr > would be the only way to get away with it. Presumably the right way to go about things in this situation is to use a pointer to the outer object. > > Other than that, the new -Wstringop-truncation warning is designed > to prevent creating unterminated character arrays and the manual > suggests using attribute nonstring when it's necessary. The > -Wrestrict warning I'm about to check in also complains about > forming invalid pointers by built-ins with restrict-qualified > arguments (although it won't diagnose the above). Right. The concern is that it's an new option and folks likely haven't cleaned up their codebases for these kinds of issues. > > Although I would prefer not to, I suppose if letting strlen cross > the boundaries of subobjects was considered an important use to > accommodate in limited cases the optimization could be disabled > for member arrays declared with the new nonstring attribute (while > still issuing a warning for it as GCC does today). > > Another alternative (if the above use case is considered important > enough) might be to suppress the optimization for member character > arrays that are immediately followed by another such array. History tells us that there will be someone out there that does this kind of thing -- the question is how pervasive is it. My suspicion is that it is not common. Given that I don't expect those uses to be common, the only thing that should break is non-conforming code and we have a (new) warning for such code my inclination is to go forward. So I'm OK with the patch. I'd give folks till Monday to chime in with dissenting opinions. Jeff
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On 12/14/2017 03:43 AM, Richard Biener wrote: On Wed, Dec 13, 2017 at 4:47 AM, Martin Seborwrote: On 12/12/2017 05:35 PM, Jeff Law wrote: On 12/12/2017 01:15 PM, Martin Sebor wrote: Bug 83373 - False positive reported by -Wstringop-overflow, is another example of warning triggered by a missed optimization opportunity, this time in the strlen pass. The optimization is discussed in pr78450 - strlen(s) return value can be assumed to be less than the size of s. The gist of it is that the result of strlen(array) can be assumed to be less than the size of the array (except in the corner case of last struct members). To avoid the false positive the attached patch adds this optimization to the strlen pass. Although the patch passes bootstrap and regression tests for all front-ends I'm not sure the way it determines the upper bound of the range is 100% correct for languages with arrays with a non-zero lower bound. Maybe it's just not as tight as it could be. What about something hideous like struct fu { char x1[10]; char x2[10]; int avoid_trailing_array; } Where objects stored in x1 are not null terminated. Are we in the realm of undefined behavior at that point (I hope so)? Yes, this is undefined. Pointer arithmetic (either direct or via standard library functions) is only defined for pointers to the same object or subobject. So even something like memcpy (pfu->x1, pfu->x1 + 10, 10); is undefined. There's nothing undefined here - computing the pointer pointing to one-after-the-last element of an array is valid (you are just not allowed to dereference it). Right, and memcpy dereferences it, so it's undefined. Martin
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On Wed, Dec 13, 2017 at 4:47 AM, Martin Seborwrote: > On 12/12/2017 05:35 PM, Jeff Law wrote: >> >> On 12/12/2017 01:15 PM, Martin Sebor wrote: >>> >>> Bug 83373 - False positive reported by -Wstringop-overflow, is >>> another example of warning triggered by a missed optimization >>> opportunity, this time in the strlen pass. The optimization >>> is discussed in pr78450 - strlen(s) return value can be assumed >>> to be less than the size of s. The gist of it is that the result >>> of strlen(array) can be assumed to be less than the size of >>> the array (except in the corner case of last struct members). >>> >>> To avoid the false positive the attached patch adds this >>> optimization to the strlen pass. Although the patch passes >>> bootstrap and regression tests for all front-ends I'm not sure >>> the way it determines the upper bound of the range is 100% >>> correct for languages with arrays with a non-zero lower bound. >>> Maybe it's just not as tight as it could be. >> >> What about something hideous like >> >> struct fu { >> char x1[10]; >> char x2[10]; >> int avoid_trailing_array; >> } >> >> Where objects stored in x1 are not null terminated. Are we in the realm >> of undefined behavior at that point (I hope so)? > > > Yes, this is undefined. Pointer arithmetic (either direct or > via standard library functions) is only defined for pointers > to the same object or subobject. So even something like > > memcpy (pfu->x1, pfu->x1 + 10, 10); > > is undefined. There's nothing undefined here - computing the pointer pointing to one-after-the-last element of an array is valid (you are just not allowed to dereference it). > _FORTIFY_SOURCE allows it for raw memory functions > because some low level code copies regions of the same object that > span two or more members (I've seen an example or two in the Linux > kernel but, IIRC, nowhere else). With the patch, using memchr > would be the only way to get away with it. > > Other than that, the new -Wstringop-truncation warning is designed > to prevent creating unterminated character arrays and the manual > suggests using attribute nonstring when it's necessary. The > -Wrestrict warning I'm about to check in also complains about > forming invalid pointers by built-ins with restrict-qualified > arguments (although it won't diagnose the above). > > Although I would prefer not to, I suppose if letting strlen cross > the boundaries of subobjects was considered an important use to > accommodate in limited cases the optimization could be disabled > for member arrays declared with the new nonstring attribute (while > still issuing a warning for it as GCC does today). > > Another alternative (if the above use case is considered important > enough) might be to suppress the optimization for member character > arrays that are immediately followed by another such array. > > Martin
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On 12/13/2017 12:25 AM, Bernhard Reutner-Fischer wrote: On 12 December 2017 21:15:25 CET, Martin Seborwrote: Tested on x86_64-linux. I assume this test worked even before this patch. Of the tests added by the patch, strlenopt-37.c passes without the compiler changes and strlenopt-36.c fails. Both are expected. -37.c verifies the optimization doesn't happen when it shouldn't happen and -36.c verifies it does take place when it's safe. The regression test for the warning fails without the patch and passes with the patch applied. Thus: s/oveflow/overflow/ Thanks. I'll fix that if/when the patch is approved. Martin thanks,
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On 12 December 2017 21:15:25 CET, Martin Seborwrote: > >Tested on x86_64-linux. I assume this test worked even before this patch. Thus: s/oveflow/overflow/ thanks,
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On 12/12/2017 05:35 PM, Jeff Law wrote: On 12/12/2017 01:15 PM, Martin Sebor wrote: Bug 83373 - False positive reported by -Wstringop-overflow, is another example of warning triggered by a missed optimization opportunity, this time in the strlen pass. The optimization is discussed in pr78450 - strlen(s) return value can be assumed to be less than the size of s. The gist of it is that the result of strlen(array) can be assumed to be less than the size of the array (except in the corner case of last struct members). To avoid the false positive the attached patch adds this optimization to the strlen pass. Although the patch passes bootstrap and regression tests for all front-ends I'm not sure the way it determines the upper bound of the range is 100% correct for languages with arrays with a non-zero lower bound. Maybe it's just not as tight as it could be. What about something hideous like struct fu { char x1[10]; char x2[10]; int avoid_trailing_array; } Where objects stored in x1 are not null terminated. Are we in the realm of undefined behavior at that point (I hope so)? Yes, this is undefined. Pointer arithmetic (either direct or via standard library functions) is only defined for pointers to the same object or subobject. So even something like memcpy (pfu->x1, pfu->x1 + 10, 10); is undefined. _FORTIFY_SOURCE allows it for raw memory functions because some low level code copies regions of the same object that span two or more members (I've seen an example or two in the Linux kernel but, IIRC, nowhere else). With the patch, using memchr would be the only way to get away with it. Other than that, the new -Wstringop-truncation warning is designed to prevent creating unterminated character arrays and the manual suggests using attribute nonstring when it's necessary. The -Wrestrict warning I'm about to check in also complains about forming invalid pointers by built-ins with restrict-qualified arguments (although it won't diagnose the above). Although I would prefer not to, I suppose if letting strlen cross the boundaries of subobjects was considered an important use to accommodate in limited cases the optimization could be disabled for member arrays declared with the new nonstring attribute (while still issuing a warning for it as GCC does today). Another alternative (if the above use case is considered important enough) might be to suppress the optimization for member character arrays that are immediately followed by another such array. Martin
Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)
On 12/12/2017 01:15 PM, Martin Sebor wrote: > Bug 83373 - False positive reported by -Wstringop-overflow, is > another example of warning triggered by a missed optimization > opportunity, this time in the strlen pass. The optimization > is discussed in pr78450 - strlen(s) return value can be assumed > to be less than the size of s. The gist of it is that the result > of strlen(array) can be assumed to be less than the size of > the array (except in the corner case of last struct members). > > To avoid the false positive the attached patch adds this > optimization to the strlen pass. Although the patch passes > bootstrap and regression tests for all front-ends I'm not sure > the way it determines the upper bound of the range is 100% > correct for languages with arrays with a non-zero lower bound. > Maybe it's just not as tight as it could be. What about something hideous like struct fu { char x1[10]; char x2[10]; int avoid_trailing_array; } Where objects stored in x1 are not null terminated. Are we in the realm of undefined behavior at that point (I hope so)? jeff