Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset (pr 77608)

2016-12-02 Thread Richard Biener
On Thu, Dec 1, 2016 at 6:58 PM, Martin Sebor  wrote:
>> Sure - but then you maybe instead want to check for op being in
>> range [0, max-of-signed-type-of-op] instead?  So similar to
>> expr_not_equal_to add a expr_in_range helper?
>>
>> Your function returns true for sizetype vars even if it might be
>> effectively signed, like for
>>
>>  sizetype i_1 = -4;
>>  i_2 = i_1 + 1;
>>
>> operand_unsigned_p (i) returns true.  I suppose you may have

(*)

>> meant
>>
>> +static bool
>> +operand_unsigned_p (tree op)
>> +{
>> +  if (TREE_CODE (op) == SSA_NAME)
>> +{
>> +  gimple *def = SSA_NAME_DEF_STMT (op);
>> +  if (is_gimple_assign (def))
>> +   {
>> + tree_code code = gimple_assign_rhs_code (def);
>> + if (code == NOP_EXPR
>>&& TYPE_PRECISION (TREE_TYPE (op)) > TYPE_PRECISION
>> (TREE_TYPE (gimple_assign_rhs1 (def
>>   return tree_expr_nonnegative_p (gimple_assign_rhs1 (def)));
>> +   }
>> +}
>> +
>> +  return false;
>> +}
>>
>> ?  because only if you do see a cast and that cast is widening from an
>> nonnegative number
>> the final one will be unsigned (if interpreted as signed number).
>
>
> I don't think this is what I want.  Here's a test case that works
> with my function but not with the suggested modification:
>
>char d[4];
>long f (unsigned long i)
>{
>  return __builtin_object_size (d + i + 1, 0);
>}
>
> Here, the size I'm looking for is (at most) 3 no matter what value
> i has.  Am I missing a case where my function will do the wrong
> thing?

See above (*)

I know what you are trying to do (retro-actively make value-ranges have
a split variable / constant part).  But I don't think that doing it the way you
do it is a reasonable maintainable way.  Others may disagree.

>> You might want to use offset_ints here (see mem_ref_offset for example)
>
>
> Okay, I'll see if I can switch to offset_int.
>
>>

 + gimple *def = SSA_NAME_DEF_STMT (off);
 + if (is_gimple_assign (def))
 +   {
 + tree_code code = gimple_assign_rhs_code (def);
 + if (code == PLUS_EXPR)
 +   {
 + /* Handle offset in the form VAR + CST where VAR's
 type
 +is unsigned so the offset must be the greater of
 +OFFRANGE[0] and CST.  This assumes the PLUS_EXPR
 +is in a canonical form with CST second.  */
 + tree rhs2 = gimple_assign_rhs2 (def);

 err, what?  What about overflow?  Aren't you just trying to decompose
 'off' into a variable and a constant part here and somehow extracting a
 range for the variable part?  So why not just do that?
>>>
>>>
>>>
>>> Sorry, what about overflow?
>>>
>>> The purpose of this code is to handle cases of the form
>>>
>>>& PTR [range (MIN, MAX)] + CST
>>
>>
>> what if MAX + CST overflows?
>
>
> The code doesn't look at MAX, only MIN is considered.  It extracts
> both but only actually uses MAX to see if it's dealing with a range
> or a constant.  Does that resolve your concern?

But if MAX overflows then it might be smaller than MIN and thus you
cannot conclude that [min, max] + CST is [min+CST, UNKNWON]
but rather it's [UNKNOWN, UNKNOWN] (if you disregard the actual
value of MAX).

>>>char d[7];
>>>
>>>#define bos(p, t) __builtin_object_size (p, t)
>>>
>>>long f (unsigned i)
>>>{
>>>  if (2 < i) i = 2;
>>>
>>>  char *p = [i] + 3;
>>>
>>>  return bos (p, 0);
>>>}
>>
>>
>> I'm sure that doesn't work as you match for PLUS_EXPR.
>
>
> Sorry, I'm not sure what you mean.

I mean that p = [i] + 3; is not represented as a PLUS_EXPR
but as a POINTER_PLUS_EXPR.  All pointer arithmetic is using
POINTER_PLUS_EXPR.

>  The above evaluates to 4 with
> the patch because i cannot be less than zero (otherwise [i] would
> be invalid/undefined) so the type-0 size we want (the maximum) is
> [7] - ([0] + 3) or 4.
>
>> Maybe simply ignore VR_ANTI_RANGEs for now then?
>
>
> Yes, that makes sense.
>
>>> The code above is based on the observation that an anti-range is
>>> often used to represent the full subrange of a narrower signed type
>>> like signed char (as ~[128, -129]).  I haven't been able to create
>>> an anti-range like ~[5, 9]. When/how would a range like that come
>>> about (so I can test it and implement the above correctly)?
>>
>>
>> if (a < 4)
>>   if (a > 8)
>> b = a;
>>
>> then b should have ~[5, 9]
>
>
> Right :)  I have figured out by know by know how to create an anti
> range in general.  What I meant is that I haven't had luck creating
> them in a way that the tree-object-size pass could see (I'm guessing
> because EVRP doesn't understand relational expressions).  So given
> this modified example from above:
>
> char d[9];
>
> #define bos(p, t) __builtin_object_size (p, t)
>
> long f (unsigned a)
> {
>unsigned b = 0;
>
>if (a < 4)
>  

Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset (pr 77608)

2016-12-01 Thread Martin Sebor

Sure - but then you maybe instead want to check for op being in
range [0, max-of-signed-type-of-op] instead?  So similar to
expr_not_equal_to add a expr_in_range helper?

Your function returns true for sizetype vars even if it might be
effectively signed, like for

 sizetype i_1 = -4;
 i_2 = i_1 + 1;

operand_unsigned_p (i) returns true.  I suppose you may have
meant

+static bool
+operand_unsigned_p (tree op)
+{
+  if (TREE_CODE (op) == SSA_NAME)
+{
+  gimple *def = SSA_NAME_DEF_STMT (op);
+  if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == NOP_EXPR
   && TYPE_PRECISION (TREE_TYPE (op)) > TYPE_PRECISION
(TREE_TYPE (gimple_assign_rhs1 (def
  return tree_expr_nonnegative_p (gimple_assign_rhs1 (def)));
+   }
+}
+
+  return false;
+}

?  because only if you do see a cast and that cast is widening from an
nonnegative number
the final one will be unsigned (if interpreted as signed number).


I don't think this is what I want.  Here's a test case that works
with my function but not with the suggested modification:

   char d[4];
   long f (unsigned long i)
   {
 return __builtin_object_size (d + i + 1, 0);
   }

Here, the size I'm looking for is (at most) 3 no matter what value
i has.  Am I missing a case where my function will do the wrong
thing?


You might want to use offset_ints here (see mem_ref_offset for example)


Okay, I'll see if I can switch to offset_int.





+ gimple *def = SSA_NAME_DEF_STMT (off);
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == PLUS_EXPR)
+   {
+ /* Handle offset in the form VAR + CST where VAR's type
+is unsigned so the offset must be the greater of
+OFFRANGE[0] and CST.  This assumes the PLUS_EXPR
+is in a canonical form with CST second.  */
+ tree rhs2 = gimple_assign_rhs2 (def);

err, what?  What about overflow?  Aren't you just trying to decompose
'off' into a variable and a constant part here and somehow extracting a
range for the variable part?  So why not just do that?



Sorry, what about overflow?

The purpose of this code is to handle cases of the form

   & PTR [range (MIN, MAX)] + CST


what if MAX + CST overflows?


The code doesn't look at MAX, only MIN is considered.  It extracts
both but only actually uses MAX to see if it's dealing with a range
or a constant.  Does that resolve your concern?


   char d[7];

   #define bos(p, t) __builtin_object_size (p, t)

   long f (unsigned i)
   {
 if (2 < i) i = 2;

 char *p = [i] + 3;

 return bos (p, 0);
   }


I'm sure that doesn't work as you match for PLUS_EXPR.


Sorry, I'm not sure what you mean.  The above evaluates to 4 with
the patch because i cannot be less than zero (otherwise [i] would
be invalid/undefined) so the type-0 size we want (the maximum) is
[7] - ([0] + 3) or 4.


Maybe simply ignore VR_ANTI_RANGEs for now then?


Yes, that makes sense.


The code above is based on the observation that an anti-range is
often used to represent the full subrange of a narrower signed type
like signed char (as ~[128, -129]).  I haven't been able to create
an anti-range like ~[5, 9]. When/how would a range like that come
about (so I can test it and implement the above correctly)?


if (a < 4)
  if (a > 8)
b = a;

then b should have ~[5, 9]


Right :)  I have figured out by know by know how to create an anti
range in general.  What I meant is that I haven't had luck creating
them in a way that the tree-object-size pass could see (I'm guessing
because EVRP doesn't understand relational expressions).  So given
this modified example from above:

char d[9];

#define bos(p, t) __builtin_object_size (p, t)

long f (unsigned a)
{
   unsigned b = 0;

   if (a < 4)
 if (a > 8)
   b = a;

   char *p = [b];
   return bos (p, 0);
}

The value ranges after Early VRP are:

_1: VARYING
b_2: VARYING
a_4(D): VARYING
p_6: ~[0B, 0B]
_8: VARYING

But with the removal of the anti-range code this will be a moot
point.


Maybe the poor range info i a consequence of the pass only benefiting
from EVRP and not VRP?


The range of 'p' is indeed not known (we only represent integer bound ranges).
You seem to want the range of p - [0] here, something that is not present
in the IL.


Yes, that's effectively what this patch does.  Approximate pointer
ranges.


It's just something I haven't had time to work on yet and with the
close of stage 1 approaching I wanted to put out this version for
review.  Do you view this enhancement as prerequisite for approving
the patch or is it something that you'd be fine with adding later?


I find the patch adds quite some ad-hoc ugliness to a pass that is
already complex and nearly impossible to understand.


I'm sorry it looks ugly to you.  I'm afraid I'm not yet familiar

Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset (pr 77608)

2016-11-28 Thread Jeff Law

On 11/24/2016 05:11 AM, Richard Biener wrote:




where CST is unsigned implying that the lower bound of the offset
is the greater of CST and MIN.  For instance, in the following it
determines that bos(p, 0) is 4 (and if the 3 were greater than 7
and overflowed the addition the result would be zero).  I'm not
sure I understand what you suggest I do differently to make this
work.

   char d[7];

   #define bos(p, t) __builtin_object_size (p, t)

   long f (unsigned i)
   {
 if (2 < i) i = 2;

 char *p = [i] + 3;

 return bos (p, 0);
   }


I'm sure that doesn't work as you match for PLUS_EXPR.




+  else if (range_type == VR_ANTI_RANGE)
+   {
+ offrange[0] = max.to_uhwi () + 1;
+ offrange[1] = min.to_uhwi () - 1;
+ return true;
+   }

first of all, how do you know it fits uhwi?  Second, from ~[5, 9] you get
[10, 4] !?  That looks bogus (and contrary to the function comment
postcondition)



I admit I have some trouble working with anti-ranges.  It's also
difficult to fully exercise them in this pass because it only runs
after EVRP but not after VRP1 (except with -g), so only limited
range information is available. (I'm hoping to eventually change
it but moving the passes broke a test in a way that seemed too
complex to fix for this project).


Maybe simply ignore VR_ANTI_RANGEs for now then?
That's where I'd start.  I can see how handling an anti-range might be 
useful, but I don't think it's as important as the rest of the stuff 
we're trying to catch here.


So I'd echo Richi's suggestion, punt VR_ANTI_RANGE for now.




The code above is based on the observation that an anti-range is
often used to represent the full subrange of a narrower signed type
like signed char (as ~[128, -129]).  I haven't been able to create
an anti-range like ~[5, 9]. When/how would a range like that come
about (so I can test it and implement the above correctly)?


if (a < 4)
  if (a > 8)
b = a;

then b should have ~[5, 9]

Right.



Note a trick VRP uses internally is to treat an anti-range as
the union of two ranges.  ~[X, Y] == [MIN, X-1] u [Y+1, MAX].
That essentially removes anti-range handling from VRPs
propagation - not sure if it would help your case.

It certainly helps Andrew, but he's working on a totally different problem.






+  else if (range_type == VR_VARYING)
+   {
+ gimple *def = SSA_NAME_DEF_STMT (off);
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == NOP_EXPR)
+   {

please trust range-info instead of doing your own little VRP here.
VR_VARYING -> return false.



I would prefer to rely on the range information and not have to work
around it like I do here but, unfortunately, it doesn't always appear
to be available.  For example, in the following test case:

I find myself in agreement with Richi on this.

I *would* suggest taking those cases where you are enhancing the results 
you get from VRP and turning those into xfailed's testcases.  Those 
tests represent future work :-)




get_range_info doesn't accept pointers at all (perhaps that's what
you meant by "VRP on pointers is limited").  In Gimple, [i] + 5
is represented as just that (i.e., _1 = [i_3]; p_6 = _1 + 5;) and
so to get the right answer for bos([i] + 5) the patch jumps though
some hoops.  But again, I could very well be missing something that's
obvious to you so if you think that's the case I'd be happy to simplify
the code if you point me in the right direction.


Yes, get_range_info is limited.  IMHO of you want to enhance object-size
to cover ranges by implicitely working on a different IL then it probably
should be re-written with that in mind.  The EVRP pass provides a
good template for how to re-use the VRP machinery and decomposing
the lattice a bit further into BASE + range shouldn't be difficult.

I'm leaving it to others if we have to have this improvement for GCC 7
(bos has its own share of know issues besides missing features).
It falls into "it'd be nice, but it's not critical".  Incrementally 
improving b_o_s is seen as a high value target because doing so improves 
our ability to statically detect buffer overflows.


The question is can we improve b_o_s without making the pass even more 
incomprehensible.


Seems like Martin has a follow-up patch, so I'll wait to see that.
jeff


Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset (pr 77608)

2016-11-24 Thread Richard Biener
On Fri, Nov 11, 2016 at 4:56 PM, Martin Sebor  wrote:
> Thanks for the review and comments!

First of all sorry for the late response.

>>
>> @@ -158,14 +170,149 @@ compute_object_offset (const_tree expr, const_tree
>> var)
>>return size_binop (code, base, off);
>>  }
>>
>> +static bool
>> +operand_unsigned_p (tree op)
>> +{
>> +  if (TREE_CODE (op) == SSA_NAME)
>>
>> new functions need a comment.  But maybe you want to use
>> tree_expr_nonnegative_p
>> to also allow signed but known positive ones?
>
>
> Let me add a comment.
>
> operand_unsigned_p returns true if the type of the original offset
> before conversion to sizetype is unsigned.  tree_expr_nonnegative_p
> returns true if the argument's type is unsigned, which is always
> the case here.

Sure - but then you maybe instead want to check for op being in
range [0, max-of-signed-type-of-op] instead?  So similar to
expr_not_equal_to add a expr_in_range helper?

Your function returns true for sizetype vars even if it might be
effectively signed, like for

 sizetype i_1 = -4;
 i_2 = i_1 + 1;

operand_unsigned_p (i) returns true.  I suppose you may have
meant

+static bool
+operand_unsigned_p (tree op)
+{
+  if (TREE_CODE (op) == SSA_NAME)
+{
+  gimple *def = SSA_NAME_DEF_STMT (op);
+  if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == NOP_EXPR
   && TYPE_PRECISION (TREE_TYPE (op)) > TYPE_PRECISION
(TREE_TYPE (gimple_assign_rhs1 (def
  return tree_expr_nonnegative_p (gimple_assign_rhs1 (def)));
+   }
+}
+
+  return false;
+}

?  because only if you do see a cast and that cast is widening from an
nonnegative number
the final one will be unsigned (if interpreted as signed number).

>>
>> +/* Fill the 2-element OFFRANGE array with the range of values OFF
>> +   is known to be in.  Postcodition: OFFRANGE[0] <= OFFRANGE[1].  */
>> +
>> +static bool
>> +get_offset_range (tree off, HOST_WIDE_INT offrange[2])
>> +{
>> +  STRIP_NOPS (off);
>>
>> why strip nops (even sign changes!) here?
>
>
> That might be a leftover from an earlier/failed attempt to simplify
> things that I forgot to remove.  Let me do that in a followup patch.
> Unless I misunderstand your comment there are no sign changes (AFAIK)
> because the offset is always represented as sizetype.
>
>> Why below convert things
>> via to_uhwi when offrange is of type signed HOST_WIDE_INT[2]?
>
>
> The offset is always represented as sizetype but the code treats
> it as signed because in reality it can be negative.  That said,
> I don't find dealing with ranges very intuitive so I could very
> well be missing something and there may be a better way to code
> this.  I welcome suggestions.

You might want to use offset_ints here (see mem_ref_offset for example)

>>
>> + gimple *def = SSA_NAME_DEF_STMT (off);
>> + if (is_gimple_assign (def))
>> +   {
>> + tree_code code = gimple_assign_rhs_code (def);
>> + if (code == PLUS_EXPR)
>> +   {
>> + /* Handle offset in the form VAR + CST where VAR's type
>> +is unsigned so the offset must be the greater of
>> +OFFRANGE[0] and CST.  This assumes the PLUS_EXPR
>> +is in a canonical form with CST second.  */
>> + tree rhs2 = gimple_assign_rhs2 (def);
>>
>> err, what?  What about overflow?  Aren't you just trying to decompose
>> 'off' into a variable and a constant part here and somehow extracting a
>> range for the variable part?  So why not just do that?
>
>
> Sorry, what about overflow?
>
> The purpose of this code is to handle cases of the form
>
>& PTR [range (MIN, MAX)] + CST

what if MAX + CST overflows?

Note pointer plus int is a POINTER_PLUS_EXPR, not a PLUS_EXPR.
Only for a POINTER_PLUS_EXPR you might argue that overflow is
undefined.

> where CST is unsigned implying that the lower bound of the offset
> is the greater of CST and MIN.  For instance, in the following it
> determines that bos(p, 0) is 4 (and if the 3 were greater than 7
> and overflowed the addition the result would be zero).  I'm not
> sure I understand what you suggest I do differently to make this
> work.
>
>char d[7];
>
>#define bos(p, t) __builtin_object_size (p, t)
>
>long f (unsigned i)
>{
>  if (2 < i) i = 2;
>
>  char *p = [i] + 3;
>
>  return bos (p, 0);
>}

I'm sure that doesn't work as you match for PLUS_EXPR.

>>
>>
>> +  else if (range_type == VR_ANTI_RANGE)
>> +   {
>> + offrange[0] = max.to_uhwi () + 1;
>> + offrange[1] = min.to_uhwi () - 1;
>> + return true;
>> +   }
>>
>> first of all, how do you know it fits uhwi?  Second, from ~[5, 9] you get
>> [10, 4] !?  That looks bogus (and contrary to the function comment
>> postcondition)
>
>
> I admit I have some trouble working with anti-ranges.  It's also
> difficult to fully 

Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset (pr 77608)

2016-11-11 Thread Martin Sebor

Thanks for the review and comments!



@@ -158,14 +170,149 @@ compute_object_offset (const_tree expr, const_tree var)
   return size_binop (code, base, off);
 }

+static bool
+operand_unsigned_p (tree op)
+{
+  if (TREE_CODE (op) == SSA_NAME)

new functions need a comment.  But maybe you want to use tree_expr_nonnegative_p
to also allow signed but known positive ones?


Let me add a comment.

operand_unsigned_p returns true if the type of the original offset
before conversion to sizetype is unsigned.  tree_expr_nonnegative_p
returns true if the argument's type is unsigned, which is always
the case here.



+/* Fill the 2-element OFFRANGE array with the range of values OFF
+   is known to be in.  Postcodition: OFFRANGE[0] <= OFFRANGE[1].  */
+
+static bool
+get_offset_range (tree off, HOST_WIDE_INT offrange[2])
+{
+  STRIP_NOPS (off);

why strip nops (even sign changes!) here?


That might be a leftover from an earlier/failed attempt to simplify
things that I forgot to remove.  Let me do that in a followup patch.
Unless I misunderstand your comment there are no sign changes (AFAIK)
because the offset is always represented as sizetype.


Why below convert things
via to_uhwi when offrange is of type signed HOST_WIDE_INT[2]?


The offset is always represented as sizetype but the code treats
it as signed because in reality it can be negative.  That said,
I don't find dealing with ranges very intuitive so I could very
well be missing something and there may be a better way to code
this.  I welcome suggestions.



+ gimple *def = SSA_NAME_DEF_STMT (off);
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == PLUS_EXPR)
+   {
+ /* Handle offset in the form VAR + CST where VAR's type
+is unsigned so the offset must be the greater of
+OFFRANGE[0] and CST.  This assumes the PLUS_EXPR
+is in a canonical form with CST second.  */
+ tree rhs2 = gimple_assign_rhs2 (def);

err, what?  What about overflow?  Aren't you just trying to decompose
'off' into a variable and a constant part here and somehow extracting a
range for the variable part?  So why not just do that?


Sorry, what about overflow?

The purpose of this code is to handle cases of the form

   & PTR [range (MIN, MAX)] + CST

where CST is unsigned implying that the lower bound of the offset
is the greater of CST and MIN.  For instance, in the following it
determines that bos(p, 0) is 4 (and if the 3 were greater than 7
and overflowed the addition the result would be zero).  I'm not
sure I understand what you suggest I do differently to make this
work.

   char d[7];

   #define bos(p, t) __builtin_object_size (p, t)

   long f (unsigned i)
   {
 if (2 < i) i = 2;

 char *p = [i] + 3;

 return bos (p, 0);
   }


+  else if (range_type == VR_ANTI_RANGE)
+   {
+ offrange[0] = max.to_uhwi () + 1;
+ offrange[1] = min.to_uhwi () - 1;
+ return true;
+   }

first of all, how do you know it fits uhwi?  Second, from ~[5, 9] you get
[10, 4] !?  That looks bogus (and contrary to the function comment
postcondition)


I admit I have some trouble working with anti-ranges.  It's also
difficult to fully exercise them in this pass because it only runs
after EVRP but not after VRP1 (except with -g), so only limited
range information is available. (I'm hoping to eventually change
it but moving the passes broke a test in a way that seemed too
complex to fix for this project).

The code above is based on the observation that an anti-range is
often used to represent the full subrange of a narrower signed type
like signed char (as ~[128, -129]).  I haven't been able to create
an anti-range like ~[5, 9]. When/how would a range like that come
about (so I can test it and implement the above correctly)?



+  else if (range_type == VR_VARYING)
+   {
+ gimple *def = SSA_NAME_DEF_STMT (off);
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == NOP_EXPR)
+   {

please trust range-info instead of doing your own little VRP here.
VR_VARYING -> return false.


I would prefer to rely on the range information and not have to work
around it like I do here but, unfortunately, it doesn't always appear
to be available.  For example, in the following test case:

   char d[130];

   #define bos(p, t) __builtin_object_size (p, t)

   void f (void*);

   void g (signed char i)
   {
  char *p =  [i] + 128;
 f(__builtin___memset_chk (p, '*', 2, bos (p, 0)));   // okay

 f(__builtin___memset_chk (p, '*', 3, bos (p, 0)));   // overflow
   }

the range type is VR_VARYING but the code makes it possible to diagnose
the overflow in the second call to memset.

Maybe the poor range info i a consequence of the pass only benefiting
from EVRP and not 

Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset (pr 77608)

2016-11-10 Thread Richard Biener
On Tue, Nov 8, 2016 at 5:03 AM, Martin Sebor  wrote:
> It's taken me longer than I expected to finally get back to this
> project.  Sorry about the delay.
>
>   https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01110.html
>
> Attached is an updated patch with this enhancement and reflecting
> you previous comment.
>
> Besides running the GCC test suite I tested the patch by building
> Binutils and the Linux kernel.  It found one stpcpy-related overflow
> in Binutils that I'm looking into and reduced by one the number of
> problems reported by the -Wformat-length option in the kernel (I
> haven't yet checked which one it eliminated).
>
> Although I'm not done investigating the Binutils problem I'm posting
> the patch for review now to allow for comments before stage 1 ends.

@@ -158,14 +170,149 @@ compute_object_offset (const_tree expr, const_tree var)
   return size_binop (code, base, off);
 }

+static bool
+operand_unsigned_p (tree op)
+{
+  if (TREE_CODE (op) == SSA_NAME)

new functions need a comment.  But maybe you want to use tree_expr_nonnegative_p
to also allow signed but known positive ones?

+/* Fill the 2-element OFFRANGE array with the range of values OFF
+   is known to be in.  Postcodition: OFFRANGE[0] <= OFFRANGE[1].  */
+
+static bool
+get_offset_range (tree off, HOST_WIDE_INT offrange[2])
+{
+  STRIP_NOPS (off);

why strip nops (even sign changes!) here?  Why below convert things
via to_uhwi when offrange is of type signed HOST_WIDE_INT[2]?

+ gimple *def = SSA_NAME_DEF_STMT (off);
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == PLUS_EXPR)
+   {
+ /* Handle offset in the form VAR + CST where VAR's type
+is unsigned so the offset must be the greater of
+OFFRANGE[0] and CST.  This assumes the PLUS_EXPR
+is in a canonical form with CST second.  */
+ tree rhs2 = gimple_assign_rhs2 (def);

err, what?  What about overflow?  Aren't you just trying to decompose
'off' into a variable and a constant part here and somehow extracting a
range for the variable part?  So why not just do that?

+  else if (range_type == VR_ANTI_RANGE)
+   {
+ offrange[0] = max.to_uhwi () + 1;
+ offrange[1] = min.to_uhwi () - 1;
+ return true;
+   }

first of all, how do you know it fits uhwi?  Second, from ~[5, 9] you get
[10, 4] !?  That looks bogus (and contrary to the function comment
postcondition)

+  else if (range_type == VR_VARYING)
+   {
+ gimple *def = SSA_NAME_DEF_STMT (off);
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == NOP_EXPR)
+   {

please trust range-info instead of doing your own little VRP here.
VR_VARYING -> return false.

stopping review here noting that other parts of the compiler
split constant parts from variable parts and it looks like this is
what you want to do here?  That is, enhance

static vec object_sizes[4];

and cache a SSA-NAME, constant offset pair in addition?  Or just a range
(range of that SSA name + offset), if that's good enough -- wait, that's
what you get from get_range_info!

So I'm not sure where the whole complication in the patch comes from...

Possibly from the fact that VRP on pointers is limited and thus [i] + 5
doesn't get you a range for i + 5?

Richard.

> Martin
>
> PS The tests added in the patch (but nothing else) depend on
> the changes in the patch for c/53562:
>
>   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00483.html
>


Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset (pr 77608)

2016-11-07 Thread Martin Sebor

It's taken me longer than I expected to finally get back to this
project.  Sorry about the delay.

  https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01110.html

Attached is an updated patch with this enhancement and reflecting
you previous comment.

Besides running the GCC test suite I tested the patch by building
Binutils and the Linux kernel.  It found one stpcpy-related overflow
in Binutils that I'm looking into and reduced by one the number of
problems reported by the -Wformat-length option in the kernel (I
haven't yet checked which one it eliminated).

Although I'm not done investigating the Binutils problem I'm posting
the patch for review now to allow for comments before stage 1 ends.

Martin

PS The tests added in the patch (but nothing else) depend on
the changes in the patch for c/53562:

  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00483.html

PR middle-end/77608 - missing protection on trivially detectable runtime buffer overflow

gcc/ChangeLog:
2016-11-07  Martin Sebor  

	PR middle-end/77608
	* tree-object-size.c (nonconst_offsets): New global.
	(compute_object_offset): New function.
	(addr_object_size): Add an argument.
	(compute_builtin_object_size): Rename...
	(internal_object_size): to this.
	(expr_object_size): Adjust.
	(merge_object_sizes): Handle non-constant offset.
	(plus_stmt_object_size): Same.
	(collect_object_sizes_for): Change to return bool.
	(init_object_sizes): Initialize nonconst_offsets.
	(fini_object_sizes): Release nonconst_offsets.

gcc/testsuite/ChangeLog:
2016-11-07  Martin Sebor  

	PR middle-end/77608
	* gcc.c-torture/execute/builtins/lib/chk.c: Add debugging output.
	* gcc.c-torture/execute/builtins/mempcpy-chk.c: Add debugging output.
	(test5): Allow test to emit __mempcpy_chk.
	* gcc.dg/builtin-object-size-18.c: New test.
	* gcc.dg/builtin-stringop-chk-3.c: New test.
	* gcc.dg/pr77608.c: New test.

diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c b/gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c
index b19d7bf..6966b41 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c
@@ -5,6 +5,15 @@
 
 extern void abort (void);
 
+const char *testfunc;
+int testline;
+
+#define abort()\
+  (__builtin_printf ("%s:%i: %s: test failure in %s on line %i\n",	\
+		 __FILE__, __LINE__, __FUNCTION__,			\
+		 testfunc, testline),\
+   __builtin_abort ())
+
 extern int inside_main;
 void *chk_fail_buf[256] __attribute__((aligned (16)));
 volatile int chk_fail_allowed, chk_calls;
diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy-chk.c b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy-chk.c
index 7a1737c..b014aff 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy-chk.c
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy-chk.c
@@ -9,6 +9,15 @@ extern void *memcpy (void *, const void *, size_t);
 extern void *mempcpy (void *, const void *, size_t);
 extern int memcmp (const void *, const void *, size_t);
 
+extern const char *testfunc;
+extern int testline;
+
+#define memcpy(d, s, n)\
+  (testfunc = __func__, testline = __LINE__, memcpy (d, s, n))
+
+#define mempcpy(d, s, n)			\
+  (testfunc = __func__, testline = __LINE__, mempcpy (d, s, n))
+
 #include "chk.h"
 
 const char s1[] = "123";
@@ -17,6 +26,8 @@ volatile char *s2 = "defg"; /* prevent constant propagation to happen when whole
 volatile char *s3 = "FGH"; /* prevent constant propagation to happen when whole program assumptions are made.  */
 volatile size_t l1 = 1; /* prevent constant propagation to happen when whole program assumptions are made.  */
 
+#define abort() (__builtin_printf ("failure on line %i\n", __LINE__), __builtin_abort ())
+
 void
 __attribute__((noinline))
 test1 (void)
@@ -326,6 +337,8 @@ test4 (void)
   char buf3[20];
 
   chk_fail_allowed = 1;
+  mempcpy_disallowed = 0;
+
   /* Runtime checks.  */
   if (__builtin_setjmp (chk_fail_buf) == 0)
 {
@@ -343,7 +356,9 @@ test4 (void)
   vx = mempcpy ([19], "ab", 2);
   abort ();
 }
+
   chk_fail_allowed = 0;
+  mempcpy_disallowed = 1;
 }
 
 #ifndef MAX_OFFSET
@@ -377,6 +392,12 @@ test5 (void)
   int off1, off2, len, i;
   char *p, *q, c;
 
+  /* The call to mempcpy below, even though it's safe, may result in
+ a call to __mempcpy_chk when the (maximum) size of the destination
+ is determined.  */
+  int mempcpy_disallowed_save = mempcpy_disallowed;
+  mempcpy_disallowed = 0;
+
   for (off1 = 0; off1 < MAX_OFFSET; off1++)
 for (off2 = 0; off2 < MAX_OFFSET; off2++)
   for (len = 1; len < MAX_COPY; len++)
@@ -410,6 +431,8 @@ test5 (void)
 	if (*q != 'a')
 	  abort ();
 	}
+
+  mempcpy_disallowed = mempcpy_disallowed_save;
 }
 
 #define TESTSIZE 80
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-18.c b/gcc/testsuite/gcc.dg/builtin-object-size-18.c
new file mode 100644
index 000..b1a5666
--- /dev/null
+++ 

Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset

2016-09-16 Thread Martin Sebor

On 09/16/2016 04:29 AM, Richard Biener wrote:

On Fri, Sep 16, 2016 at 5:29 AM, Martin Sebor  wrote:

__builtin_object_size fails for POINTER_PLUS expressions involving
non-constant offsets into objects of known size, causing GCC to fail
to detect (and add instrumentation to prevent) buffer overflow in
some basic cases such as the following:

  void f (unsigned i)
  {
char d [3];
memcpy (d + i, "abcdef", 5);
  }

Since the size of the destination object is known, the call to memcpy
is guaranteed to write past the end of it regardless of the value of
the offset.

The attached patch enhances __builtin_object_size to handle this case
by returning the size of the whole object as the maximum and the size
of the object minus T_MAX for the type of the offset T as the minimum.

The patch also adds handling of ranges even though only very few cases
benefit from it because the VRP pass runs after the object size pass.
The one case that does appear to profit is when the value of the offset
is constrained by its type, as in

  char a [1000];

  unsigned g (unsigned char i)
  {
char *p = a + i;
return __builtin_object_size (p, 2);
  }

Here get_range_info () lets __builtin_object_size determine that
the minimum number of bytes between (a + i) and (a + sizeof s) is
sizeof a - UCHAR_MAX.

The patch results in 64 more checking calls in a Binutils build than
before.


I believe that you can't simply use the min/max of the ranges the way you
do nor can you assume zero for the minimum size as op1 might be
negative (for POINTER_PLUS_EXPR you need to interpret the offset
as signed).

Unless I completely misremember how tree-object-size.c works, of course.

Say,

char a[1000];

unsigned g (signed char i)
{
  char *p = [10] + i;
  return __builtin_object_size (p, 1);
}

range for i will be [-128, 127] but we'd like to return 1000 here I think.


Thanks for the example.  I agree that 1000 is the correct result
in type < 2.  Curiously, the range reported for this case is
actually the anti-range ~[128, -129], and because the patch
didn't handle those the result was 990 (i.e., the same as
__builtin_object_size([10], type) for type < 2).

I think the way to handle cases like this in (type < 2) might be
by computing the size of the whole object first (i.e., ignoring
the array subscript) and then adjusting the result down according
to the bounds of the range.  Let me work on that.

Thanks
Martin



Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset

2016-09-16 Thread Jakub Jelinek
On Fri, Sep 16, 2016 at 12:29:49PM +0200, Richard Biener wrote:
> > PS What would be a good way to arrange for the VRP pass to run before
> > the object size pass so that the latter can benefit more from range
> > information?  As an experiment I added another instance of the VRP
> > pass before the object size pass in passes.def and that worked, but
> > I suspect that running VRP a third time isn't optimal.

As I said in PR77606, I have strong doubts about desirability of using VRP
info for __builtin_object_size computation.  VRP is an optimization that
assumes undefined behavior does not happen, __builtin_object_size is
typically used to catch undefined behavior, so pretty much assumes undefined
behavior can happen.  So, the __builtin_object_size computations should
prove no other value can appear in any program invocation, rather than
just any valid program invocation.  Sure, it is a best effort, with
possibility to return "unknown" or conservatively correct answers, but using
VRP info is IMHO already too much.

Jakub


Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset

2016-09-16 Thread Richard Biener
On Fri, Sep 16, 2016 at 5:29 AM, Martin Sebor  wrote:
> __builtin_object_size fails for POINTER_PLUS expressions involving
> non-constant offsets into objects of known size, causing GCC to fail
> to detect (and add instrumentation to prevent) buffer overflow in
> some basic cases such as the following:
>
>   void f (unsigned i)
>   {
> char d [3];
> memcpy (d + i, "abcdef", 5);
>   }
>
> Since the size of the destination object is known, the call to memcpy
> is guaranteed to write past the end of it regardless of the value of
> the offset.
>
> The attached patch enhances __builtin_object_size to handle this case
> by returning the size of the whole object as the maximum and the size
> of the object minus T_MAX for the type of the offset T as the minimum.
>
> The patch also adds handling of ranges even though only very few cases
> benefit from it because the VRP pass runs after the object size pass.
> The one case that does appear to profit is when the value of the offset
> is constrained by its type, as in
>
>   char a [1000];
>
>   unsigned g (unsigned char i)
>   {
> char *p = a + i;
> return __builtin_object_size (p, 2);
>   }
>
> Here get_range_info () lets __builtin_object_size determine that
> the minimum number of bytes between (a + i) and (a + sizeof s) is
> sizeof a - UCHAR_MAX.
>
> The patch results in 64 more checking calls in a Binutils build than
> before.

I believe that you can't simply use the min/max of the ranges the way you
do nor can you assume zero for the minimum size as op1 might be
negative (for POINTER_PLUS_EXPR you need to interpret the offset
as signed).

Unless I completely misremember how tree-object-size.c works, of course.

Say,

char a[1000];

unsigned g (signed char i)
{
  char *p = [10] + i;
  return __builtin_object_size (p, 1);
}

range for i will be [-128, 127] but we'd like to return 1000 here I think.

Richard.

> Martin
>
> PS What would be a good way to arrange for the VRP pass to run before
> the object size pass so that the latter can benefit more from range
> information?  As an experiment I added another instance of the VRP
> pass before the object size pass in passes.def and that worked, but
> I suspect that running VRP a third time isn't optimal.


Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset

2016-09-16 Thread kugan

Hi,

On 16/09/16 13:29, Martin Sebor wrote:

__builtin_object_size fails for POINTER_PLUS expressions involving
non-constant offsets into objects of known size, causing GCC to fail
to detect (and add instrumentation to prevent) buffer overflow in
some basic cases such as the following:

   void f (unsigned i)
   {
 char d [3];
 memcpy (d + i, "abcdef", 5);
   }

Since the size of the destination object is known, the call to memcpy
is guaranteed to write past the end of it regardless of the value of
the offset.

The attached patch enhances __builtin_object_size to handle this case
by returning the size of the whole object as the maximum and the size
of the object minus T_MAX for the type of the offset T as the minimum.

The patch also adds handling of ranges even though only very few cases
benefit from it because the VRP pass runs after the object size pass.
The one case that does appear to profit is when the value of the offset
is constrained by its type, as in

   char a [1000];

   unsigned g (unsigned char i)
   {
 char *p = a + i;
 return __builtin_object_size (p, 2);
   }

Here get_range_info () lets __builtin_object_size determine that
the minimum number of bytes between (a + i) and (a + sizeof s) is
sizeof a - UCHAR_MAX.

The patch results in 64 more checking calls in a Binutils build than
before.

Martin

PS What would be a good way to arrange for the VRP pass to run before
the object size pass so that the latter can benefit more from range
information?  As an experiment I added another instance of the VRP
pass before the object size pass in passes.def and that worked, but
I suspect that running VRP a third time isn't optimal.


I am working on early-vrp and that might be of interest to you.
https://gcc.gnu.org/ml/gcc-patches/2016-09/msg00993.html

Thanks,
Kugan