Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-08-27 Thread Martin Sebor

On 8/27/19 3:09 AM, Christophe Lyon wrote:

On Fri, 23 Aug 2019 at 04:14, Jeff Law  wrote:


On 8/12/19 4:09 PM, Martin Sebor wrote:



gcc-83431.diff

PR tree-optimization/83431 - -Wformat-truncation may incorrectly report 
truncation

gcc/ChangeLog:

   PR c++/83431
   * gimple-ssa-sprintf.c (pass_data_sprintf_length): Remove object.
   (sprintf_dom_walker): Remove class.
   (get_int_range): Make argument const.
   (directive::fmtfunc, directive::set_precision): Same.
   (format_none): Same.
   (build_intmax_type_nodes): Same.
   (adjust_range_for_overflow): Same.
   (format_floating): Same.
   (format_character): Same.
   (format_string): Same.
   (format_plain): Same.
   (get_int_range): Cast away constness.
   (format_integer): Same.
   (get_string_length): Call get_range_strlen_dynamic.  Handle
   null lendata.maxbound.
   (should_warn_p): Adjust argument scope qualifier.
   (maybe_warn): Same.
   (format_directive): Same.
   (parse_directive): Same.
   (is_call_safe): Same.
   (try_substitute_return_value): Same.
   (sprintf_dom_walker::handle_printf_call): Rename...
   (handle_printf_call): ...to this.  Initialize target to host charmap
   here instead of in pass_sprintf_length::execute.
   (struct call_info): Make global.
   (sprintf_dom_walker::compute_format_length): Make global.
   (sprintf_dom_walker::handle_gimple_call): Same.
   * passes.def (pass_sprintf_length): Replace with pass_strlen.
   * print-rtl.c (print_pattern): Reduce the number of spaces to
   avoid -Wformat-truncation.
   * tree-pass.h (make_pass_warn_printf): New function.
   * tree-ssa-strlen.c (strlen_optimize): New variable.
   (get_string_length): Add comments.
   (get_range_strlen_dynamic): New function.
   (check_and_optimize_call): New function.
   (handle_integral_assign): New function.
   (strlen_check_and_optimize_stmt): Factor code out into
   strlen_check_and_optimize_call and handle_integral_assign.
   (strlen_dom_walker::evrp): New member.
   (strlen_dom_walker::before_dom_children): Use evrp member.
   (strlen_dom_walker::after_dom_children): Use evrp member.
   (printf_strlen_execute): New function.
   (pass_strlen::gate): Update to handle printf calls.
   (dump_strlen_info): New function.
   (pass_data_warn_printf): New variable.
   (pass_warn_printf): New class.
   * tree-ssa-strlen.h (get_range_strlen_dynamic): Declare.
   (handle_printf_call): Same.

gcc/testsuite/ChangeLog:

   PR c++/83431
   * gcc.dg/strlenopt-63.c: New test.
   * gcc.dg/pr79538.c: Adjust text of expected warning.
   * gcc.dg/pr81292-1.c: Adjust pass name.
   * gcc.dg/pr81292-2.c: Same.
   * gcc.dg/pr81703.c: Same.
   * gcc.dg/strcmpopt_2.c: Same.
   * gcc.dg/strcmpopt_3.c: Same.
   * gcc.dg/strcmpopt_4.c: Same.
   * gcc.dg/strlenopt-1.c: Same.
   * gcc.dg/strlenopt-10.c: Same.
   * gcc.dg/strlenopt-11.c: Same.
   * gcc.dg/strlenopt-13.c: Same.
   * gcc.dg/strlenopt-14g.c: Same.
   * gcc.dg/strlenopt-14gf.c: Same.
   * gcc.dg/strlenopt-15.c: Same.
   * gcc.dg/strlenopt-16g.c: Same.
   * gcc.dg/strlenopt-17g.c: Same.
   * gcc.dg/strlenopt-18g.c: Same.
   * gcc.dg/strlenopt-19.c: Same.
   * gcc.dg/strlenopt-1f.c: Same.
   * gcc.dg/strlenopt-2.c: Same.
   * gcc.dg/strlenopt-20.c: Same.
   * gcc.dg/strlenopt-21.c: Same.
   * gcc.dg/strlenopt-22.c: Same.
   * gcc.dg/strlenopt-22g.c: Same.
   * gcc.dg/strlenopt-24.c: Same.
   * gcc.dg/strlenopt-25.c: Same.
   * gcc.dg/strlenopt-26.c: Same.
   * gcc.dg/strlenopt-27.c: Same.
   * gcc.dg/strlenopt-28.c: Same.
   * gcc.dg/strlenopt-29.c: Same.
   * gcc.dg/strlenopt-2f.c: Same.
   * gcc.dg/strlenopt-3.c: Same.
   * gcc.dg/strlenopt-30.c: Same.
   * gcc.dg/strlenopt-31g.c: Same.
   * gcc.dg/strlenopt-32.c: Same.
   * gcc.dg/strlenopt-33.c: Same.
   * gcc.dg/strlenopt-33g.c: Same.
   * gcc.dg/strlenopt-34.c: Same.
   * gcc.dg/strlenopt-35.c: Same.
   * gcc.dg/strlenopt-4.c: Same.
   * gcc.dg/strlenopt-48.c: Same.
   * gcc.dg/strlenopt-49.c: Same.
   * gcc.dg/strlenopt-4g.c: Same.
   * gcc.dg/strlenopt-4gf.c: Same.
   * gcc.dg/strlenopt-5.c: Same.
   * gcc.dg/strlenopt-50.c: Same.
   * gcc.dg/strlenopt-51.c: Same.
   * gcc.dg/strlenopt-52.c: Same.
   * gcc.dg/strlenopt-53.c: Same.
   * gcc.dg/strlenopt-54.c: Same.
   * gcc.dg/strlenopt-55.c: Same.
   * gcc.dg/strlenopt-56.c: Same.
   * gcc.dg/strlenopt-6.c: Same.
   * gcc.dg/strlenopt-61.c: Same.
   * gcc.dg/strlenopt-7.c: Same.
   * gcc.dg/strlenopt-8.c: Same.
   * gcc.dg/strlenopt-9.c: Same.
   * gcc.dg/strlenopt.h (snprintf, snprintf): Declare.
   * gcc.dg/tree-ssa/builtin-snprintf-6.c: New test.
   * gcc.dg/tre

Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-08-27 Thread Christophe Lyon
On Fri, 23 Aug 2019 at 04:14, Jeff Law  wrote:
>
> On 8/12/19 4:09 PM, Martin Sebor wrote:
>
> >
> > gcc-83431.diff
> >
> > PR tree-optimization/83431 - -Wformat-truncation may incorrectly report 
> > truncation
> >
> > gcc/ChangeLog:
> >
> >   PR c++/83431
> >   * gimple-ssa-sprintf.c (pass_data_sprintf_length): Remove object.
> >   (sprintf_dom_walker): Remove class.
> >   (get_int_range): Make argument const.
> >   (directive::fmtfunc, directive::set_precision): Same.
> >   (format_none): Same.
> >   (build_intmax_type_nodes): Same.
> >   (adjust_range_for_overflow): Same.
> >   (format_floating): Same.
> >   (format_character): Same.
> >   (format_string): Same.
> >   (format_plain): Same.
> >   (get_int_range): Cast away constness.
> >   (format_integer): Same.
> >   (get_string_length): Call get_range_strlen_dynamic.  Handle
> >   null lendata.maxbound.
> >   (should_warn_p): Adjust argument scope qualifier.
> >   (maybe_warn): Same.
> >   (format_directive): Same.
> >   (parse_directive): Same.
> >   (is_call_safe): Same.
> >   (try_substitute_return_value): Same.
> >   (sprintf_dom_walker::handle_printf_call): Rename...
> >   (handle_printf_call): ...to this.  Initialize target to host charmap
> >   here instead of in pass_sprintf_length::execute.
> >   (struct call_info): Make global.
> >   (sprintf_dom_walker::compute_format_length): Make global.
> >   (sprintf_dom_walker::handle_gimple_call): Same.
> >   * passes.def (pass_sprintf_length): Replace with pass_strlen.
> >   * print-rtl.c (print_pattern): Reduce the number of spaces to
> >   avoid -Wformat-truncation.
> >   * tree-pass.h (make_pass_warn_printf): New function.
> >   * tree-ssa-strlen.c (strlen_optimize): New variable.
> >   (get_string_length): Add comments.
> >   (get_range_strlen_dynamic): New function.
> >   (check_and_optimize_call): New function.
> >   (handle_integral_assign): New function.
> >   (strlen_check_and_optimize_stmt): Factor code out into
> >   strlen_check_and_optimize_call and handle_integral_assign.
> >   (strlen_dom_walker::evrp): New member.
> >   (strlen_dom_walker::before_dom_children): Use evrp member.
> >   (strlen_dom_walker::after_dom_children): Use evrp member.
> >   (printf_strlen_execute): New function.
> >   (pass_strlen::gate): Update to handle printf calls.
> >   (dump_strlen_info): New function.
> >   (pass_data_warn_printf): New variable.
> >   (pass_warn_printf): New class.
> >   * tree-ssa-strlen.h (get_range_strlen_dynamic): Declare.
> >   (handle_printf_call): Same.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR c++/83431
> >   * gcc.dg/strlenopt-63.c: New test.
> >   * gcc.dg/pr79538.c: Adjust text of expected warning.
> >   * gcc.dg/pr81292-1.c: Adjust pass name.
> >   * gcc.dg/pr81292-2.c: Same.
> >   * gcc.dg/pr81703.c: Same.
> >   * gcc.dg/strcmpopt_2.c: Same.
> >   * gcc.dg/strcmpopt_3.c: Same.
> >   * gcc.dg/strcmpopt_4.c: Same.
> >   * gcc.dg/strlenopt-1.c: Same.
> >   * gcc.dg/strlenopt-10.c: Same.
> >   * gcc.dg/strlenopt-11.c: Same.
> >   * gcc.dg/strlenopt-13.c: Same.
> >   * gcc.dg/strlenopt-14g.c: Same.
> >   * gcc.dg/strlenopt-14gf.c: Same.
> >   * gcc.dg/strlenopt-15.c: Same.
> >   * gcc.dg/strlenopt-16g.c: Same.
> >   * gcc.dg/strlenopt-17g.c: Same.
> >   * gcc.dg/strlenopt-18g.c: Same.
> >   * gcc.dg/strlenopt-19.c: Same.
> >   * gcc.dg/strlenopt-1f.c: Same.
> >   * gcc.dg/strlenopt-2.c: Same.
> >   * gcc.dg/strlenopt-20.c: Same.
> >   * gcc.dg/strlenopt-21.c: Same.
> >   * gcc.dg/strlenopt-22.c: Same.
> >   * gcc.dg/strlenopt-22g.c: Same.
> >   * gcc.dg/strlenopt-24.c: Same.
> >   * gcc.dg/strlenopt-25.c: Same.
> >   * gcc.dg/strlenopt-26.c: Same.
> >   * gcc.dg/strlenopt-27.c: Same.
> >   * gcc.dg/strlenopt-28.c: Same.
> >   * gcc.dg/strlenopt-29.c: Same.
> >   * gcc.dg/strlenopt-2f.c: Same.
> >   * gcc.dg/strlenopt-3.c: Same.
> >   * gcc.dg/strlenopt-30.c: Same.
> >   * gcc.dg/strlenopt-31g.c: Same.
> >   * gcc.dg/strlenopt-32.c: Same.
> >   * gcc.dg/strlenopt-33.c: Same.
> >   * gcc.dg/strlenopt-33g.c: Same.
> >   * gcc.dg/strlenopt-34.c: Same.
> >   * gcc.dg/strlenopt-35.c: Same.
> >   * gcc.dg/strlenopt-4.c: Same.
> >   * gcc.dg/strlenopt-48.c: Same.
> >   * gcc.dg/strlenopt-49.c: Same.
> >   * gcc.dg/strlenopt-4g.c: Same.
> >   * gcc.dg/strlenopt-4gf.c: Same.
> >   * gcc.dg/strlenopt-5.c: Same.
> >   * gcc.dg/strlenopt-50.c: Same.
> >   * gcc.dg/strlenopt-51.c: Same.
> >   * gcc.dg/strlenopt-52.c: Same.
> >   * gcc.dg/strlenopt-53.c: Same.
> >   * gcc.dg/strlenopt-54.c: Same.
> >   * gcc.dg/strlenopt-55.c: Same.
> >   * gcc.dg/strlenopt-56.c: Same.
> >   * gcc.d

Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-08-22 Thread Jeff Law
On 8/12/19 4:09 PM, Martin Sebor wrote:

> 
> gcc-83431.diff
> 
> PR tree-optimization/83431 - -Wformat-truncation may incorrectly report 
> truncation
> 
> gcc/ChangeLog:
> 
>   PR c++/83431
>   * gimple-ssa-sprintf.c (pass_data_sprintf_length): Remove object.
>   (sprintf_dom_walker): Remove class.
>   (get_int_range): Make argument const.
>   (directive::fmtfunc, directive::set_precision): Same.
>   (format_none): Same.
>   (build_intmax_type_nodes): Same.
>   (adjust_range_for_overflow): Same.
>   (format_floating): Same.
>   (format_character): Same.
>   (format_string): Same.
>   (format_plain): Same.
>   (get_int_range): Cast away constness.
>   (format_integer): Same.
>   (get_string_length): Call get_range_strlen_dynamic.  Handle
>   null lendata.maxbound.
>   (should_warn_p): Adjust argument scope qualifier.
>   (maybe_warn): Same.
>   (format_directive): Same.
>   (parse_directive): Same.
>   (is_call_safe): Same.
>   (try_substitute_return_value): Same.
>   (sprintf_dom_walker::handle_printf_call): Rename...
>   (handle_printf_call): ...to this.  Initialize target to host charmap
>   here instead of in pass_sprintf_length::execute.
>   (struct call_info): Make global.
>   (sprintf_dom_walker::compute_format_length): Make global.
>   (sprintf_dom_walker::handle_gimple_call): Same.
>   * passes.def (pass_sprintf_length): Replace with pass_strlen.
>   * print-rtl.c (print_pattern): Reduce the number of spaces to
>   avoid -Wformat-truncation.
>   * tree-pass.h (make_pass_warn_printf): New function.
>   * tree-ssa-strlen.c (strlen_optimize): New variable.
>   (get_string_length): Add comments.
>   (get_range_strlen_dynamic): New function.
>   (check_and_optimize_call): New function.
>   (handle_integral_assign): New function.
>   (strlen_check_and_optimize_stmt): Factor code out into
>   strlen_check_and_optimize_call and handle_integral_assign.
>   (strlen_dom_walker::evrp): New member.
>   (strlen_dom_walker::before_dom_children): Use evrp member.
>   (strlen_dom_walker::after_dom_children): Use evrp member.
>   (printf_strlen_execute): New function.
>   (pass_strlen::gate): Update to handle printf calls.
>   (dump_strlen_info): New function.
>   (pass_data_warn_printf): New variable.
>   (pass_warn_printf): New class.
>   * tree-ssa-strlen.h (get_range_strlen_dynamic): Declare.
>   (handle_printf_call): Same.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/83431
>   * gcc.dg/strlenopt-63.c: New test.
>   * gcc.dg/pr79538.c: Adjust text of expected warning.
>   * gcc.dg/pr81292-1.c: Adjust pass name.
>   * gcc.dg/pr81292-2.c: Same.
>   * gcc.dg/pr81703.c: Same.
>   * gcc.dg/strcmpopt_2.c: Same.
>   * gcc.dg/strcmpopt_3.c: Same.
>   * gcc.dg/strcmpopt_4.c: Same.
>   * gcc.dg/strlenopt-1.c: Same.
>   * gcc.dg/strlenopt-10.c: Same.
>   * gcc.dg/strlenopt-11.c: Same.
>   * gcc.dg/strlenopt-13.c: Same.
>   * gcc.dg/strlenopt-14g.c: Same.
>   * gcc.dg/strlenopt-14gf.c: Same.
>   * gcc.dg/strlenopt-15.c: Same.
>   * gcc.dg/strlenopt-16g.c: Same.
>   * gcc.dg/strlenopt-17g.c: Same.
>   * gcc.dg/strlenopt-18g.c: Same.
>   * gcc.dg/strlenopt-19.c: Same.
>   * gcc.dg/strlenopt-1f.c: Same.
>   * gcc.dg/strlenopt-2.c: Same.
>   * gcc.dg/strlenopt-20.c: Same.
>   * gcc.dg/strlenopt-21.c: Same.
>   * gcc.dg/strlenopt-22.c: Same.
>   * gcc.dg/strlenopt-22g.c: Same.
>   * gcc.dg/strlenopt-24.c: Same.
>   * gcc.dg/strlenopt-25.c: Same.
>   * gcc.dg/strlenopt-26.c: Same.
>   * gcc.dg/strlenopt-27.c: Same.
>   * gcc.dg/strlenopt-28.c: Same.
>   * gcc.dg/strlenopt-29.c: Same.
>   * gcc.dg/strlenopt-2f.c: Same.
>   * gcc.dg/strlenopt-3.c: Same.
>   * gcc.dg/strlenopt-30.c: Same.
>   * gcc.dg/strlenopt-31g.c: Same.
>   * gcc.dg/strlenopt-32.c: Same.
>   * gcc.dg/strlenopt-33.c: Same.
>   * gcc.dg/strlenopt-33g.c: Same.
>   * gcc.dg/strlenopt-34.c: Same.
>   * gcc.dg/strlenopt-35.c: Same.
>   * gcc.dg/strlenopt-4.c: Same.
>   * gcc.dg/strlenopt-48.c: Same.
>   * gcc.dg/strlenopt-49.c: Same.
>   * gcc.dg/strlenopt-4g.c: Same.
>   * gcc.dg/strlenopt-4gf.c: Same.
>   * gcc.dg/strlenopt-5.c: Same.
>   * gcc.dg/strlenopt-50.c: Same.
>   * gcc.dg/strlenopt-51.c: Same.
>   * gcc.dg/strlenopt-52.c: Same.
>   * gcc.dg/strlenopt-53.c: Same.
>   * gcc.dg/strlenopt-54.c: Same.
>   * gcc.dg/strlenopt-55.c: Same.
>   * gcc.dg/strlenopt-56.c: Same.
>   * gcc.dg/strlenopt-6.c: Same.
>   * gcc.dg/strlenopt-61.c: Same.
>   * gcc.dg/strlenopt-7.c: Same.
>   * gcc.dg/strlenopt-8.c: Same.
>   * gcc.dg/strlenopt-9.c: Same.
>   * gcc.dg/strlenopt.h (snprintf, snprintf): Declare.
>   * gcc.dg/tree-ssa/builtin-snprin

Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-08-09 Thread Jeff Law
On 7/10/19 5:54 PM, Martin Sebor wrote:
>> So if I'm reading things correctly, it appears gimple-ssa-sprintf.c is
>> no longer a distinct pass.  Instead it co-exists with the strlen pass.
>> Right?
> 
> Yes.  strlen just calls into sprintf to handle the calls.
OK.  Just wanted to make sure I understood it's structure at the highest
level.


> 
>>> diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
>>> index a0934bcaf87..b05e4050f1d 100644
>>> --- a/gcc/gimple-ssa-sprintf.c
>>> +++ b/gcc/gimple-ssa-sprintf.c
>>> @@ -683,7 +618,7 @@ fmtresult::type_max_digits (tree type, int base)
>>>     static bool
>>>   get_int_range (tree, HOST_WIDE_INT *, HOST_WIDE_INT *, bool,
>>> HOST_WIDE_INT,
>>> -   class vr_values *vr_values);
>>> +   const class vr_values *vr_values);
>> FWIW, I think this is something *I* could do a lot better at.
>> Specifically I think we're not supposed to be writing the "class" here
>> and I'm not as good as I should be with marking things const.  Thanks
>> for cleaning up my lack of consts.
> 
> I think you did the best you could given the APIs you had to work
> with There's still plenty of room to improve const-correctness but
> it involves changing other APIs outsid strlen/sprintf.
I wasn't necessarily referring to any of the strlen/sprintf code when I
made that comment.  I was thinking more about DOM and jump threading
where I think I've got extraneous "class" all over the place.  And I
can't recall ever auditing for const-correctness.  Both are probably
worth fixing.


> 
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index 9a5b0cd554a..637e228f988 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -42,7 +42,7 @@ along with GCC; see the file COPYING3.  If not see
>>>     NEXT_PASS (pass_build_cfg);
>>>     NEXT_PASS (pass_warn_function_return);
>>>     NEXT_PASS (pass_expand_omp);
>>> -  NEXT_PASS (pass_sprintf_length, false);
>>> +  NEXT_PASS (pass_strlen, false);
>> So this is something we discussed a bit on the phone.  This is very
>> early in the pipeline -- before we've gone into SSA form.
>>
>> I'm a bit concerned that we're running strlen that early without some
>> kind of auditing of whether or not the strlen pass can safely run that
>> early.  Similarly have we done any review for issues that might arise
>> from running strlen more than once?  I guess in some small way
>> encapsulating the state into a class like my unsubmitted patch does
>> would help there.
> 
> The strlen optimization machinery only runs once.  The code avoids
> running it when the pass is invoked early and only calls into sprintf
> to do format checking.
OK.  Thanks for clarifying.  That's probably why we have the someone
unusual gating tests.


> 
>>
>> More generally I think we concluded that the placement of sprintf this
>> early was driven by the early placement of walloca.  I don't want to
>> open a huge can of worms here, but do we really need to run this early
>> in the pipeline?
> 
> We decided to run it early when optimization is disabled because
> there's a good amount of code that can be checked even without
> ranges and string lengths (especially at the conservative level
> 2 setting when we consider the largest integers and array sizes
> instead of values or string lengths).
> 
> For example, this is diagnosed for the potential buffer overflow
> at -Wformat-overflow=2 even without optimization:
> 
>   char a[8], s[4];
> 
>   void f (int i)
>   {
>     __builtin_sprintf (a, "%s = %i", s, i);
>   }
> 
>   warning: ‘%i’ directive writing between 1 and 11 bytes into a region
> of size between 2 and 5 [-Wformat-overflow=]
That does sound familiar.  But ISTM in a non-optimized case we could
still just run the late one and get the warning.  It would seem the
problem with that is the late pass is inside the pass_all_optimizations
in passes.def.

We'd probably have to close the pass_all_optimizations, do the sprintf
checking, then open a new pass_all_optimizations_something for the rest
of the pipeline currently under pass_all_optimizations.

Seems out of scope for now, but worth remembering.


> 
>> Nit: Use NULL rather than null.  I think this happens in more than one
>> place in your patch.  Similarly I think we generally use NUL rather than
>> nul when referring to a single character.
> The term is a "null pointer."  NULL is a C macro that has in C++ 11
> been superseded by nullptr.  I don't mind using NUL character but
> I also don't think it matters.  No one will be confused about what
> either means.
It's more about existing conventions and consistency in the codebase.
We've used NULL to refer to "null pointer" for decades.

> It's easy enough to add here.  But I know I've introduced other
> algorithms that recurse on SSA_NAME_DEF_STMT, and I'm quite sure
> others predate those.  To make a difference I think we need to
> review at least the one most likely to be exposed to this problem
> and introduce the same limit there.  I could probably fix the

Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-08-09 Thread Jeff Law
On 7/1/19 7:47 PM, Martin Sebor wrote:
> 
> Jeff, I looked into your question/suggestion for me last week
> when we spoke, to introduce some sort of a recursion limit for
> get_range_strlen_dynamic.  It's easily doable but before we go
> down that path I did some testing to see how bad it can get and
> to compare it with get_range_strlen.  Here are the results for
> a few packages.  The dept is the maximum recursion depth, success
> and fail are the numbers of successful and failed calls to
> the function:
> 
>   binutils-gdb:
>   depth   success fail
>     get_range_strlen:   319  8302    21621
>     get_range_strlen_dynamic:    41  1503  161
>   gcc:
>     get_range_strlen:    46  7211    11365
>     get_range_strlen_dynamic:    23 10272   12
>   glibc:
>     get_range_strlen:    76  2840    11422
>     get_range_strlen_dynamic:    51  1186   46
>   elfutils:
>     get_range_strlen:    33  1198 2516
>     get_range_strlen_dynamic:    31   685   36
>   kernel:
>     get_range_strlen:    25  5299    11698
>     get_range_strlen_dynamic:    31  9911  122
> 
> Except the first case of get_range_strlen (I haven't looked into
> it yet), it doesn't seem too bad, and with just one exception it's
> better than get_range_strlen.  Let me know if you think it's worth
> adding a parameter (I assume we'd use it for both functions) and
> what to set it to.
So I know you've added a limiter, but just to close on this topic.  I
generally agree with Richi on this -- we've regularly seen instances
where these pathological cases occur in the wild.  For the most part I
wouldn't consider any of the codebases above to be good at finding those
pathological cases.  They are reasonably good for finding the inflection
point for diminishing gains though.

For codes which walk PHI nodes Bradly Lucier's testcases are often
useful (there's many in BZ in various states).  His testcases tend to
have both lots of PHI nodes and PHI nodes with many arguments.  I don't
recall if the depth of any given use-def chain in those tests are
terribly long though.

One could ponder going through the old compile-time hogs in the BZ
database and using that to build some kind of testsuite for these kinds
of issues.  I question if we could reliably run them from the dejagnu
suite without significant cost, but they might be useful when trying to
evaluate stuff like if we've got reasonable clamps on recursive walks,
absurd loop nests, etc.


jeff


Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-07-11 Thread Martin Sebor

...

We've already touched on the need to limit the backwards walk out of
this code.  Limiting based on the product of # phi args * number of phis
visited, recursion depth, or number of SSA_NAME_DEF_STMTs visited all
seem reasonable to me.

I do think Richi's suggestion for figuring out a suitable inflection
point is reasonable.


It's easy enough to add here.  But I know I've introduced other
algorithms that recurse on SSA_NAME_DEF_STMT, and I'm quite sure
others predate those.  To make a difference I think we need to
review at least the one most likely to be exposed to this problem
and introduce the same limit there.  I could probably fix the ones
I wrote reasonably quickly, but making the change to the others
would be much more of a project.  I looked to see how pervasive
this might be and here is just a small sampling of things that
jumped out at me in a quick grep search:

   *  compute_builtin_object_size (for _FORTIFY_SOURCE)
   *  compute_objsize (for -Wstringop-overflow)
   *  get_range_strlen
   *  maybe_fold_{and,or}_comparisons in gimple-fold.c
   *  -Warray-bounds (computing an offset into an object)
   *  -Wrestrict (computing an offset into an object)
   *  -Wreturn-local-addr (is_addr_local)
   *  -Wuninitialized (collect_phi_def_edges)


I don't see the recursion in maybe_fold_{and,or}_comparisons.


I didn't study any of these very carefully, I just looked for uses
of SSA_NAME_DEF_STMT around recursive calls.  same_bool_comparison_p
looks like it calls itself with that result.  I could be wrong.



The others all smell like they might be yours ;)  (besides object-size
maybe but that one is limited by having a cache - hopefully also
used when not used in the compute-everything mode).


It doesn't matter who introduced them.  What I'm saying is that
if the recursion is a problem it should be limited everywhere,
not just in one place.  A change with global scope like that would
best be made independently of other unrelated changes, to minimize
the risk of introducing bugs and then the difficulty of debugging
and fixing them.


Given how wide-spread this technique seems to be, if the recursion
is in fact a problem it's just as likely (if not more) to come up
in the folder or in BOS or some other place as it is here.  So if
it needs fixing it seems it should be done as its own project and
everywhere (or as close as we can get), and not as part of this
integration.


It's never an excuse to add new cases though and adding a limit
is _really_ simple.


An "excuse?"  I'm just explaining that I would prefer to introduce
the limit independently of the integration.

Sure, it is simple in that one place.  I said that above.  It's
also probably relatively simple in each of the other instances
(at least those I'm familiar with).  It's just cleaner and safer
to add it independently of other changes, that's all.

Anyway, I posted a simple patch to introduce the limit.  I'm also
changing the integrated pass to make use of it once it's committed.
If it's considered necessary I'm also willing to adjust the rest
of the algorithms I introduced to respect the limit.

Thanks
Martin


Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-07-11 Thread Jakub Jelinek
On Thu, Jul 11, 2019 at 11:11:58AM +0200, Richard Biener wrote:
> The others all smell like they might be yours ;)  (besides object-size
> maybe but that one is limited by having a cache - hopefully also
> used when not used in the compute-everything mode).

objsz doesn't really have a compute everything mode, it has a mode where the
caches are available and a mode where they aren't.  If they are available,
it will compute whatever it needs to and anything needed to compute that and
cache.

If they aren't available, for optimize > 0 it will just punt on
SSA_NAMEs (defers until it is done with the caches), for -O0 it looks
through SSA_NAME_DEF_STMTs of POINTER_PLUS_EXPR with constant as second
argument, and that one indeed is inbounded, so if one has a million of
POINTER_PLUS_EXPRs stmts, each adding 1 to the previous SSA_NAME, yes, it
will be a pain if every one of those SSA_NAMEs is then passed to some
__builtin_object_size too.  Guess we could add a limit there too, no reason
to look through more than a dozen or two at most.

Jakub


Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-07-11 Thread Richard Biener
On Thu, Jul 11, 2019 at 1:54 AM Martin Sebor  wrote:
>
> On 7/2/19 4:38 PM, Jeff Law wrote:
> > On 7/1/19 7:47 PM, Martin Sebor wrote:
> >> Attached is a more complete solution that fully resolves the bug
> >> report by avoiding a warning in cases like:
> >>
> >>char a[32], b[8];
> >>
> >>void f (void)
> >>{
> >>  if (strlen (a) < sizeof b - 2)
> >>snprintf (b, sizeof b, "b=%s", a); // no -Wformat-truncation
> >>}
> >>
> >> It does that by having get_range_strlen_dynamic use the EVRP
> >> information for non-constant strlen calls: EVRP has recorded
> >> that the result is less sizeof b - 2 and so the function
> >> returns this limited range of lengths to snprintf which can
> >> then avoid warning.  It also improves the checking and can
> >> find latent bugs it missed before (like the off-by-one in
> >> print-rtl.c).
> >>
> >> Besides improving the accuracy of the -Wformat-overflow and
> >> truncation warnings this can also result in better code.
> >> So far this only benefits snprintf but there may be other
> >> opportunities to string functions as well (e.g., strcmp or
> >> memcmp).
> >>
> >> Jeff, I looked into your question/suggestion for me last week
> >> when we spoke, to introduce some sort of a recursion limit for
> >> get_range_strlen_dynamic.  It's easily doable but before we go
> >> down that path I did some testing to see how bad it can get and
> >> to compare it with get_range_strlen.  Here are the results for
> >> a few packages.  The dept is the maximum recursion depth, success
> >> and fail are the numbers of successful and failed calls to
> >> the function:
> >>
> >>binutils-gdb:
> >>depth   success fail
> >>  get_range_strlen:   319  830221621
> >>  get_range_strlen_dynamic:41  1503  161
> >>gcc:
> >>  get_range_strlen:46  721111365
> >>  get_range_strlen_dynamic:23 10272   12
> >>glibc:
> >>  get_range_strlen:76  284011422
> >>  get_range_strlen_dynamic:51  1186   46
> >>elfutils:
> >>  get_range_strlen:33  1198 2516
> >>  get_range_strlen_dynamic:31   685   36
> >>kernel:
> >>  get_range_strlen:25  529911698
> >>  get_range_strlen_dynamic:31  9911  122
> >>
> >> Except the first case of get_range_strlen (I haven't looked into
> >> it yet), it doesn't seem too bad, and with just one exception it's
> >> better than get_range_strlen.  Let me know if you think it's worth
> >> adding a parameter (I assume we'd use it for both functions) and
> >> what to set it to.
> >>
> >> On 6/11/19 5:26 PM, Martin Sebor wrote:
> >>> The sprintf and strlen passes both work with strings but
> >>> run independently of one another and don't share state.  As
> >>> a result, lengths of strings dynamically created by functions
> >>> that are available to the strlen pass are not available to
> >>> sprintf.  Conversely, lengths of strings formatted by
> >>> the sprintf functions are not made available to the strlen
> >>> pass.  The result is less efficient code, poor diagnostics,
> >>> and ultimately less than optimal user experience.
> >>>
> >>> The attached patch is the first step toward rectifying this
> >>> design problem.  It integrates the two passes into one and
> >>> exposes the string length data managed by the strlen pass to
> >>> the sprintf "module."  (It does not expose any sprintf data
> >>> to the strlen pass yet.)
> >>>
> >>> The sprintf pass invocations in passes.def have been replaced
> >>> with those of strlen.  The first "early" invocation is only
> >>> effective for the sprintf module to enable warnings without
> >>> optimization.  The second invocation is "late" and enables
> >>> both warnings and the sprintf and strlen optimizations unless
> >>> explicitly disabled via -fno-optimize-strlen.
> >>>
> >>> Since the strlen optimization is the second invocation of
> >>> the pass tests that scan the strlen dump have been adjusted
> >>> to look for the "strlen2" dump file.
> >>>
> >>> The changes in the patch are mostly mechanical.  The one new
> >>> "feature" worth calling out is the get_range_strlen_dynamic
> >>> function.  It's analogous to get_range_strlen in gimple-fold.c
> >>> except that it makes use of the strlen "dynamically" obtained
> >>> string length info.  In cases when the info is not available
> >>> the former calls the latter.
> >>>
> >>> The other new functions in tree-ssa-strlen.c are
> >>> check_and_optimize_call and handle_integral_assign: I added
> >>> them only to keep the check_and_optimize_stmt function from
> >>> getting excessively long and hard to follow.  Otherwise,
> >>> the code in the functions is unchanged.
> >>>
> >>> There are a number of further enhancements to consider as
> >>> the next steps:
> >>>*  enhance the strlen module to determine string length range
> >>>   information f

Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-07-10 Thread Martin Sebor

On 7/2/19 4:38 PM, Jeff Law wrote:

On 7/1/19 7:47 PM, Martin Sebor wrote:

Attached is a more complete solution that fully resolves the bug
report by avoiding a warning in cases like:

   char a[32], b[8];

   void f (void)
   {
 if (strlen (a) < sizeof b - 2)
   snprintf (b, sizeof b, "b=%s", a); // no -Wformat-truncation
   }

It does that by having get_range_strlen_dynamic use the EVRP
information for non-constant strlen calls: EVRP has recorded
that the result is less sizeof b - 2 and so the function
returns this limited range of lengths to snprintf which can
then avoid warning.  It also improves the checking and can
find latent bugs it missed before (like the off-by-one in
print-rtl.c).

Besides improving the accuracy of the -Wformat-overflow and
truncation warnings this can also result in better code.
So far this only benefits snprintf but there may be other
opportunities to string functions as well (e.g., strcmp or
memcmp).

Jeff, I looked into your question/suggestion for me last week
when we spoke, to introduce some sort of a recursion limit for
get_range_strlen_dynamic.  It's easily doable but before we go
down that path I did some testing to see how bad it can get and
to compare it with get_range_strlen.  Here are the results for
a few packages.  The dept is the maximum recursion depth, success
and fail are the numbers of successful and failed calls to
the function:

   binutils-gdb:
   depth   success fail
 get_range_strlen:   319  830221621
 get_range_strlen_dynamic:41  1503  161
   gcc:
 get_range_strlen:46  721111365
 get_range_strlen_dynamic:23 10272   12
   glibc:
 get_range_strlen:76  284011422
 get_range_strlen_dynamic:51  1186   46
   elfutils:
 get_range_strlen:33  1198 2516
 get_range_strlen_dynamic:31   685   36
   kernel:
 get_range_strlen:25  529911698
 get_range_strlen_dynamic:31  9911  122

Except the first case of get_range_strlen (I haven't looked into
it yet), it doesn't seem too bad, and with just one exception it's
better than get_range_strlen.  Let me know if you think it's worth
adding a parameter (I assume we'd use it for both functions) and
what to set it to.

On 6/11/19 5:26 PM, Martin Sebor wrote:

The sprintf and strlen passes both work with strings but
run independently of one another and don't share state.  As
a result, lengths of strings dynamically created by functions
that are available to the strlen pass are not available to
sprintf.  Conversely, lengths of strings formatted by
the sprintf functions are not made available to the strlen
pass.  The result is less efficient code, poor diagnostics,
and ultimately less than optimal user experience.

The attached patch is the first step toward rectifying this
design problem.  It integrates the two passes into one and
exposes the string length data managed by the strlen pass to
the sprintf "module."  (It does not expose any sprintf data
to the strlen pass yet.)

The sprintf pass invocations in passes.def have been replaced
with those of strlen.  The first "early" invocation is only
effective for the sprintf module to enable warnings without
optimization.  The second invocation is "late" and enables
both warnings and the sprintf and strlen optimizations unless
explicitly disabled via -fno-optimize-strlen.

Since the strlen optimization is the second invocation of
the pass tests that scan the strlen dump have been adjusted
to look for the "strlen2" dump file.

The changes in the patch are mostly mechanical.  The one new
"feature" worth calling out is the get_range_strlen_dynamic
function.  It's analogous to get_range_strlen in gimple-fold.c
except that it makes use of the strlen "dynamically" obtained
string length info.  In cases when the info is not available
the former calls the latter.

The other new functions in tree-ssa-strlen.c are
check_and_optimize_call and handle_integral_assign: I added
them only to keep the check_and_optimize_stmt function from
getting excessively long and hard to follow.  Otherwise,
the code in the functions is unchanged.

There are a number of further enhancements to consider as
the next steps:
   *  enhance the strlen module to determine string length range
  information from integer variable to which it was previously
  assigned (necessary to fully resolve pr83431 and pr90625),
   *  make the sprintf/snprintf string length results directly
  available to the strlen module,
   *  enhance the sprintf module to optimize snprintf(0, 0, fmt)
  calls with fmt strings consisting solely of plain characters
  and %s directives to series of strlen calls on the arguments,
   *  and more.

Martin



gcc-83431.diff

PR tree-optimization/83431 - -Wformat-truncation may incorrectly report 
truncation

gcc/ChangeLog:

PR c++/83431
* gi

Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-07-10 Thread Martin Sebor

On 7/2/19 3:59 AM, Richard Biener wrote:

On Tue, Jul 2, 2019 at 3:48 AM Martin Sebor  wrote:


Attached is a more complete solution that fully resolves the bug
report by avoiding a warning in cases like:

char a[32], b[8];

void f (void)
{
  if (strlen (a) < sizeof b - 2)
snprintf (b, sizeof b, "b=%s", a); // no -Wformat-truncation
}

It does that by having get_range_strlen_dynamic use the EVRP
information for non-constant strlen calls: EVRP has recorded
that the result is less sizeof b - 2 and so the function
returns this limited range of lengths to snprintf which can
then avoid warning.  It also improves the checking and can
find latent bugs it missed before (like the off-by-one in
print-rtl.c).

Besides improving the accuracy of the -Wformat-overflow and
truncation warnings this can also result in better code.
So far this only benefits snprintf but there may be other
opportunities to string functions as well (e.g., strcmp or
memcmp).

Jeff, I looked into your question/suggestion for me last week
when we spoke, to introduce some sort of a recursion limit for
get_range_strlen_dynamic.  It's easily doable but before we go
down that path I did some testing to see how bad it can get and
to compare it with get_range_strlen.  Here are the results for
a few packages.  The dept is the maximum recursion depth, success
and fail are the numbers of successful and failed calls to
the function:

binutils-gdb:
depth   success fail
  get_range_strlen:   319  830221621
  get_range_strlen_dynamic:41  1503  161
gcc:
  get_range_strlen:46  721111365
  get_range_strlen_dynamic:23 10272   12
glibc:
  get_range_strlen:76  284011422
  get_range_strlen_dynamic:51  1186   46
elfutils:
  get_range_strlen:33  1198 2516
  get_range_strlen_dynamic:31   685   36
kernel:
  get_range_strlen:25  529911698
  get_range_strlen_dynamic:31  9911  122

Except the first case of get_range_strlen (I haven't looked into
it yet), it doesn't seem too bad, and with just one exception it's
better than get_range_strlen.  Let me know if you think it's worth
adding a parameter (I assume we'd use it for both functions) and
what to set it to.


I think we want to avoid adding code to GCC that might turn out
quadratic for artificial testcases. History tells us that eventually
somebody will find a real-world testcase that hits it.  I would
suggest to not place a limit on the depth but on the number
of SSA_NAME_DEF_STMTs visited.  Not sure to what this
would turn out but I guess using a limit around 500 would work?


None of my builds comes even close that number so setting the limit
wouldn't have any adverse impact on any of them.  But there are
quite a number of other algorithms in GCC that recursively chase
SSE_NAME_DEF_STMTs, including in gimple-fold.c.  It would be
interesting to see data for those.


For the data above what's the biggest depth / highest number
of stmts we visit when we are able to compute a useful limit
and what is it when including failure?  The individual number
of successes/fails are not too interesting.  Maybe even provide
a histogram for the successful cases depth/stmt count?


I changed the code to count SSA_NAMEs and PHI arguments (and do
other things).  The most I see for those are these in GDB/Binutils:

binutils/objcopy.c:5694:1: note: successful get_range_strlen 
(‘output_target’): depth:211, ssa_names:9, phi_args:207, 
result:{0,18446744073709551615,18446744073709551615}
binutils/objcopy.c:5694:1: note: successful get_range_strlen 
(‘input_target’): depth:104, ssa_names:4, phi_args:101, 
result:{0,18446744073709551615,18446744073709551615}
binutils/readelf.c:1152:1: note: successful get_range_strlen (‘rtype’): 
depth:189, ssa_names:73, phi_args:118, result:{10,18446744073709551615,25}
binutils/readelf.c:1152:1: note: successful get_range_strlen (‘rtype’): 
depth:201, ssa_names:77, phi_args:129, result:{10,18446744073709551615,25}


The first two work very hard only to find the string length and
array size are unbounded.  The latter two determine the string
is between 10 and 24 characters (i.e., it's stored in char[25]).

These are the only four with a three-digit depth.  There are
463 two-digit depths (ssa_names/phi args), and 9,642 single
digit ones in the GDB/Binutils build.

In GCC there are 92 two-digit depths and 17,448 single-digit.

Failures are usually very quick (with just one exception all
under 10 SSA_NAMEs and PHI args.)


Otherwise I like the merging - can you annotate the passes.def
entries with a comment indicating what the pass parameter
designates?  Like /* do optimize */?


Sure.



Since we have a parallelization GSoC project running not
adding more global state to passes would be nice as well,
strlen has a lot of it that could b

Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-07-02 Thread Jeff Law
On 7/1/19 7:47 PM, Martin Sebor wrote:
> Attached is a more complete solution that fully resolves the bug
> report by avoiding a warning in cases like:
> 
>   char a[32], b[8];
> 
>   void f (void)
>   {
> if (strlen (a) < sizeof b - 2)
>   snprintf (b, sizeof b, "b=%s", a); // no -Wformat-truncation
>   }
> 
> It does that by having get_range_strlen_dynamic use the EVRP
> information for non-constant strlen calls: EVRP has recorded
> that the result is less sizeof b - 2 and so the function
> returns this limited range of lengths to snprintf which can
> then avoid warning.  It also improves the checking and can
> find latent bugs it missed before (like the off-by-one in
> print-rtl.c).
> 
> Besides improving the accuracy of the -Wformat-overflow and
> truncation warnings this can also result in better code.
> So far this only benefits snprintf but there may be other
> opportunities to string functions as well (e.g., strcmp or
> memcmp).
> 
> Jeff, I looked into your question/suggestion for me last week
> when we spoke, to introduce some sort of a recursion limit for
> get_range_strlen_dynamic.  It's easily doable but before we go
> down that path I did some testing to see how bad it can get and
> to compare it with get_range_strlen.  Here are the results for
> a few packages.  The dept is the maximum recursion depth, success
> and fail are the numbers of successful and failed calls to
> the function:
> 
>   binutils-gdb:
>   depth   success fail
> get_range_strlen:   319  830221621
> get_range_strlen_dynamic:41  1503  161
>   gcc:
> get_range_strlen:46  721111365
> get_range_strlen_dynamic:23 10272   12
>   glibc:
> get_range_strlen:76  284011422
> get_range_strlen_dynamic:51  1186   46
>   elfutils:
> get_range_strlen:33  1198 2516
> get_range_strlen_dynamic:31   685   36
>   kernel:
> get_range_strlen:25  529911698
> get_range_strlen_dynamic:31  9911  122
> 
> Except the first case of get_range_strlen (I haven't looked into
> it yet), it doesn't seem too bad, and with just one exception it's
> better than get_range_strlen.  Let me know if you think it's worth
> adding a parameter (I assume we'd use it for both functions) and
> what to set it to.
> 
> On 6/11/19 5:26 PM, Martin Sebor wrote:
>> The sprintf and strlen passes both work with strings but
>> run independently of one another and don't share state.  As
>> a result, lengths of strings dynamically created by functions
>> that are available to the strlen pass are not available to
>> sprintf.  Conversely, lengths of strings formatted by
>> the sprintf functions are not made available to the strlen
>> pass.  The result is less efficient code, poor diagnostics,
>> and ultimately less than optimal user experience.
>>
>> The attached patch is the first step toward rectifying this
>> design problem.  It integrates the two passes into one and
>> exposes the string length data managed by the strlen pass to
>> the sprintf "module."  (It does not expose any sprintf data
>> to the strlen pass yet.)
>>
>> The sprintf pass invocations in passes.def have been replaced
>> with those of strlen.  The first "early" invocation is only
>> effective for the sprintf module to enable warnings without
>> optimization.  The second invocation is "late" and enables
>> both warnings and the sprintf and strlen optimizations unless
>> explicitly disabled via -fno-optimize-strlen.
>>
>> Since the strlen optimization is the second invocation of
>> the pass tests that scan the strlen dump have been adjusted
>> to look for the "strlen2" dump file.
>>
>> The changes in the patch are mostly mechanical.  The one new
>> "feature" worth calling out is the get_range_strlen_dynamic
>> function.  It's analogous to get_range_strlen in gimple-fold.c
>> except that it makes use of the strlen "dynamically" obtained
>> string length info.  In cases when the info is not available
>> the former calls the latter.
>>
>> The other new functions in tree-ssa-strlen.c are
>> check_and_optimize_call and handle_integral_assign: I added
>> them only to keep the check_and_optimize_stmt function from
>> getting excessively long and hard to follow.  Otherwise,
>> the code in the functions is unchanged.
>>
>> There are a number of further enhancements to consider as
>> the next steps:
>>   *  enhance the strlen module to determine string length range
>>  information from integer variable to which it was previously
>>  assigned (necessary to fully resolve pr83431 and pr90625),
>>   *  make the sprintf/snprintf string length results directly
>>  available to the strlen module,
>>   *  enhance the sprintf module to optimize snprintf(0, 0, fmt)
>>  calls with fmt strings consisting solely of plain characters
>>  and %s directives to series of strlen calls on the 

Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-07-02 Thread Jeff Law
On 7/2/19 3:59 AM, Richard Biener wrote:
> On Tue, Jul 2, 2019 at 3:48 AM Martin Sebor  wrote:
>>
>> Attached is a more complete solution that fully resolves the bug
>> report by avoiding a warning in cases like:
>>
>>char a[32], b[8];
>>
>>void f (void)
>>{
>>  if (strlen (a) < sizeof b - 2)
>>snprintf (b, sizeof b, "b=%s", a); // no -Wformat-truncation
>>}
>>
>> It does that by having get_range_strlen_dynamic use the EVRP
>> information for non-constant strlen calls: EVRP has recorded
>> that the result is less sizeof b - 2 and so the function
>> returns this limited range of lengths to snprintf which can
>> then avoid warning.  It also improves the checking and can
>> find latent bugs it missed before (like the off-by-one in
>> print-rtl.c).
>>
>> Besides improving the accuracy of the -Wformat-overflow and
>> truncation warnings this can also result in better code.
>> So far this only benefits snprintf but there may be other
>> opportunities to string functions as well (e.g., strcmp or
>> memcmp).
>>
>> Jeff, I looked into your question/suggestion for me last week
>> when we spoke, to introduce some sort of a recursion limit for
>> get_range_strlen_dynamic.  It's easily doable but before we go
>> down that path I did some testing to see how bad it can get and
>> to compare it with get_range_strlen.  Here are the results for
>> a few packages.  The dept is the maximum recursion depth, success
>> and fail are the numbers of successful and failed calls to
>> the function:
>>
>>binutils-gdb:
>>depth   success fail
>>  get_range_strlen:   319  830221621
>>  get_range_strlen_dynamic:41  1503  161
>>gcc:
>>  get_range_strlen:46  721111365
>>  get_range_strlen_dynamic:23 10272   12
>>glibc:
>>  get_range_strlen:76  284011422
>>  get_range_strlen_dynamic:51  1186   46
>>elfutils:
>>  get_range_strlen:33  1198 2516
>>  get_range_strlen_dynamic:31   685   36
>>kernel:
>>  get_range_strlen:25  529911698
>>  get_range_strlen_dynamic:31  9911  122
>>
>> Except the first case of get_range_strlen (I haven't looked into
>> it yet), it doesn't seem too bad, and with just one exception it's
>> better than get_range_strlen.  Let me know if you think it's worth
>> adding a parameter (I assume we'd use it for both functions) and
>> what to set it to.
> 
> I think we want to avoid adding code to GCC that might turn out
> quadratic for artificial testcases. History tells us that eventually
> somebody will find a real-world testcase that hits it.  I would
> suggest to not place a limit on the depth but on the number
> of SSA_NAME_DEF_STMTs visited.  Not sure to what this
> would turn out but I guess using a limit around 500 would work?
> For the data above what's the biggest depth / highest number
> of stmts we visit when we are able to compute a useful limit
> and what is it when including failure?  The individual number
> of successes/fails are not too interesting.  Maybe even provide
> a histogram for the successful cases depth/stmt count?
Martin and I were kicking this around last week based on some feedback
from Jakub.  I agree that we want to avoid quadratic, even for
artificial testcases.  Whatever the solution finally is, we probably
want to do something very similar if not the same in the patch to detect
returning addresses in the local stack.

In an ideal world we'd have some kind of generic walker with the
limiter; I suspect there's more instances of this backwards walk lurking.

> 
> Since we have a parallelization GSoC project running not
> adding more global state to passes would be nice as well,
> strlen has a lot of it that could be in a per invocation state
> class.  Not a requirement for this patch though.
I have an independent patch which does some of this.  It's from about
this time last year and had to be put on the back burner.  It pulls
ssa_ver_to_stridx, max_stridx, strinfo_pool, stridx_to_strinfo,
strlenloc, strlen_to_stridx and stridx_obstack into the class along with
a lot of the support routines.

I guess it's time to resurrect that work :-)


jeff


Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-07-02 Thread Richard Biener
On Tue, Jul 2, 2019 at 3:48 AM Martin Sebor  wrote:
>
> Attached is a more complete solution that fully resolves the bug
> report by avoiding a warning in cases like:
>
>char a[32], b[8];
>
>void f (void)
>{
>  if (strlen (a) < sizeof b - 2)
>snprintf (b, sizeof b, "b=%s", a); // no -Wformat-truncation
>}
>
> It does that by having get_range_strlen_dynamic use the EVRP
> information for non-constant strlen calls: EVRP has recorded
> that the result is less sizeof b - 2 and so the function
> returns this limited range of lengths to snprintf which can
> then avoid warning.  It also improves the checking and can
> find latent bugs it missed before (like the off-by-one in
> print-rtl.c).
>
> Besides improving the accuracy of the -Wformat-overflow and
> truncation warnings this can also result in better code.
> So far this only benefits snprintf but there may be other
> opportunities to string functions as well (e.g., strcmp or
> memcmp).
>
> Jeff, I looked into your question/suggestion for me last week
> when we spoke, to introduce some sort of a recursion limit for
> get_range_strlen_dynamic.  It's easily doable but before we go
> down that path I did some testing to see how bad it can get and
> to compare it with get_range_strlen.  Here are the results for
> a few packages.  The dept is the maximum recursion depth, success
> and fail are the numbers of successful and failed calls to
> the function:
>
>binutils-gdb:
>depth   success fail
>  get_range_strlen:   319  830221621
>  get_range_strlen_dynamic:41  1503  161
>gcc:
>  get_range_strlen:46  721111365
>  get_range_strlen_dynamic:23 10272   12
>glibc:
>  get_range_strlen:76  284011422
>  get_range_strlen_dynamic:51  1186   46
>elfutils:
>  get_range_strlen:33  1198 2516
>  get_range_strlen_dynamic:31   685   36
>kernel:
>  get_range_strlen:25  529911698
>  get_range_strlen_dynamic:31  9911  122
>
> Except the first case of get_range_strlen (I haven't looked into
> it yet), it doesn't seem too bad, and with just one exception it's
> better than get_range_strlen.  Let me know if you think it's worth
> adding a parameter (I assume we'd use it for both functions) and
> what to set it to.

I think we want to avoid adding code to GCC that might turn out
quadratic for artificial testcases. History tells us that eventually
somebody will find a real-world testcase that hits it.  I would
suggest to not place a limit on the depth but on the number
of SSA_NAME_DEF_STMTs visited.  Not sure to what this
would turn out but I guess using a limit around 500 would work?
For the data above what's the biggest depth / highest number
of stmts we visit when we are able to compute a useful limit
and what is it when including failure?  The individual number
of successes/fails are not too interesting.  Maybe even provide
a histogram for the successful cases depth/stmt count?

Otherwise I like the merging - can you annotate the passes.def
entries with a comment indicating what the pass parameter
designates?  Like /* do optimize */?

Since we have a parallelization GSoC project running not
adding more global state to passes would be nice as well,
strlen has a lot of it that could be in a per invocation state
class.  Not a requirement for this patch though.

Besides the limiting no further comments from me - I'll leave
the details to Jeff.

Thanks a lot for doing this!
Richard.

>
> On 6/11/19 5:26 PM, Martin Sebor wrote:
> > The sprintf and strlen passes both work with strings but
> > run independently of one another and don't share state.  As
> > a result, lengths of strings dynamically created by functions
> > that are available to the strlen pass are not available to
> > sprintf.  Conversely, lengths of strings formatted by
> > the sprintf functions are not made available to the strlen
> > pass.  The result is less efficient code, poor diagnostics,
> > and ultimately less than optimal user experience.
> >
> > The attached patch is the first step toward rectifying this
> > design problem.  It integrates the two passes into one and
> > exposes the string length data managed by the strlen pass to
> > the sprintf "module."  (It does not expose any sprintf data
> > to the strlen pass yet.)
> >
> > The sprintf pass invocations in passes.def have been replaced
> > with those of strlen.  The first "early" invocation is only
> > effective for the sprintf module to enable warnings without
> > optimization.  The second invocation is "late" and enables
> > both warnings and the sprintf and strlen optimizations unless
> > explicitly disabled via -fno-optimize-strlen.
> >
> > Since the strlen optimization is the second invocation of
> > the pass tests that scan the strlen dump have been adjusted
> > to look for the "str

Re: [PATCH] integrate sprintf pass into strlen (PR 83431)

2019-07-01 Thread Martin Sebor

Attached is a more complete solution that fully resolves the bug
report by avoiding a warning in cases like:

  char a[32], b[8];

  void f (void)
  {
if (strlen (a) < sizeof b - 2)
  snprintf (b, sizeof b, "b=%s", a); // no -Wformat-truncation
  }

It does that by having get_range_strlen_dynamic use the EVRP
information for non-constant strlen calls: EVRP has recorded
that the result is less sizeof b - 2 and so the function
returns this limited range of lengths to snprintf which can
then avoid warning.  It also improves the checking and can
find latent bugs it missed before (like the off-by-one in
print-rtl.c).

Besides improving the accuracy of the -Wformat-overflow and
truncation warnings this can also result in better code.
So far this only benefits snprintf but there may be other
opportunities to string functions as well (e.g., strcmp or
memcmp).

Jeff, I looked into your question/suggestion for me last week
when we spoke, to introduce some sort of a recursion limit for
get_range_strlen_dynamic.  It's easily doable but before we go
down that path I did some testing to see how bad it can get and
to compare it with get_range_strlen.  Here are the results for
a few packages.  The dept is the maximum recursion depth, success
and fail are the numbers of successful and failed calls to
the function:

  binutils-gdb:
  depth   success fail
get_range_strlen:   319  830221621
get_range_strlen_dynamic:41  1503  161
  gcc:
get_range_strlen:46  721111365
get_range_strlen_dynamic:23 10272   12
  glibc:
get_range_strlen:76  284011422
get_range_strlen_dynamic:51  1186   46
  elfutils:
get_range_strlen:33  1198 2516
get_range_strlen_dynamic:31   685   36
  kernel:
get_range_strlen:25  529911698
get_range_strlen_dynamic:31  9911  122

Except the first case of get_range_strlen (I haven't looked into
it yet), it doesn't seem too bad, and with just one exception it's
better than get_range_strlen.  Let me know if you think it's worth
adding a parameter (I assume we'd use it for both functions) and
what to set it to.

On 6/11/19 5:26 PM, Martin Sebor wrote:

The sprintf and strlen passes both work with strings but
run independently of one another and don't share state.  As
a result, lengths of strings dynamically created by functions
that are available to the strlen pass are not available to
sprintf.  Conversely, lengths of strings formatted by
the sprintf functions are not made available to the strlen
pass.  The result is less efficient code, poor diagnostics,
and ultimately less than optimal user experience.

The attached patch is the first step toward rectifying this
design problem.  It integrates the two passes into one and
exposes the string length data managed by the strlen pass to
the sprintf "module."  (It does not expose any sprintf data
to the strlen pass yet.)

The sprintf pass invocations in passes.def have been replaced
with those of strlen.  The first "early" invocation is only
effective for the sprintf module to enable warnings without
optimization.  The second invocation is "late" and enables
both warnings and the sprintf and strlen optimizations unless
explicitly disabled via -fno-optimize-strlen.

Since the strlen optimization is the second invocation of
the pass tests that scan the strlen dump have been adjusted
to look for the "strlen2" dump file.

The changes in the patch are mostly mechanical.  The one new
"feature" worth calling out is the get_range_strlen_dynamic
function.  It's analogous to get_range_strlen in gimple-fold.c
except that it makes use of the strlen "dynamically" obtained
string length info.  In cases when the info is not available
the former calls the latter.

The other new functions in tree-ssa-strlen.c are
check_and_optimize_call and handle_integral_assign: I added
them only to keep the check_and_optimize_stmt function from
getting excessively long and hard to follow.  Otherwise,
the code in the functions is unchanged.

There are a number of further enhancements to consider as
the next steps:
  *  enhance the strlen module to determine string length range
     information from integer variable to which it was previously
     assigned (necessary to fully resolve pr83431 and pr90625),
  *  make the sprintf/snprintf string length results directly
     available to the strlen module,
  *  enhance the sprintf module to optimize snprintf(0, 0, fmt)
     calls with fmt strings consisting solely of plain characters
     and %s directives to series of strlen calls on the arguments,
  *  and more.

Martin


PR tree-optimization/83431 - -Wformat-truncation may incorrectly report truncation

gcc/ChangeLog:

	PR c++/83431
	* gimple-ssa-sprintf.c (pass_data_sprintf_length): Remove object.
	(sprintf_dom_walker): Remove class.
	(get_int_range): Make argument const.
	

[PATCH] integrate sprintf pass into strlen (PR 83431)

2019-06-11 Thread Martin Sebor

The sprintf and strlen passes both work with strings but
run independently of one another and don't share state.  As
a result, lengths of strings dynamically created by functions
that are available to the strlen pass are not available to
sprintf.  Conversely, lengths of strings formatted by
the sprintf functions are not made available to the strlen
pass.  The result is less efficient code, poor diagnostics,
and ultimately less than optimal user experience.

The attached patch is the first step toward rectifying this
design problem.  It integrates the two passes into one and
exposes the string length data managed by the strlen pass to
the sprintf "module."  (It does not expose any sprintf data
to the strlen pass yet.)

The sprintf pass invocations in passes.def have been replaced
with those of strlen.  The first "early" invocation is only
effective for the sprintf module to enable warnings without
optimization.  The second invocation is "late" and enables
both warnings and the sprintf and strlen optimizations unless
explicitly disabled via -fno-optimize-strlen.

Since the strlen optimization is the second invocation of
the pass tests that scan the strlen dump have been adjusted
to look for the "strlen2" dump file.

The changes in the patch are mostly mechanical.  The one new
"feature" worth calling out is the get_range_strlen_dynamic
function.  It's analogous to get_range_strlen in gimple-fold.c
except that it makes use of the strlen "dynamically" obtained
string length info.  In cases when the info is not available
the former calls the latter.

The other new functions in tree-ssa-strlen.c are
check_and_optimize_call and handle_integral_assign: I added
them only to keep the check_and_optimize_stmt function from
getting excessively long and hard to follow.  Otherwise,
the code in the functions is unchanged.

There are a number of further enhancements to consider as
the next steps:
 *  enhance the strlen module to determine string length range
information from integer variable to which it was previously
assigned (necessary to fully resolve pr83431 and pr90625),
 *  make the sprintf/snprintf string length results directly
available to the strlen module,
 *  enhance the sprintf module to optimize snprintf(0, 0, fmt)
calls with fmt strings consisting solely of plain characters
and %s directives to series of strlen calls on the arguments,
 *  and more.

Martin
PR tree-optimization/83431 - -Wformat-truncation may incorrectly report truncation

gcc/ChangeLog:

	PR c++/83431
	* gimple-ssa-sprintf.c (pass_data_sprintf_length): Remove object.
	(sprintf_dom_walker): Remove class.
	(get_int_range): Make argument const.
	(directive::fmtfunc, directive::set_precision): Same.
	(format_none): Same.
	(build_intmax_type_nodes): Same.
	(adjust_range_for_overflow): Same.
	(format_floating): Same.
	(format_character): Same.
	(format_string): Same.
	(format_plain): Same.
	(get_int_range): Cast away constness..
	(format_integer): Same.
	(get_string_length): Call get_range_strlen_dynamic.  Handle
	null lendata.maxbound.
	(should_warn_p): Adjust argument scope qualifier.
	(maybe_warn): Same.
	(format_directive): Same.
	(parse_directive): Same.
	(is_call_safe): Same.
	(try_substitute_return_value): Same.
	(sprintf_dom_walker::handle_printf_call): Rename...
	(handle_printf_call): to this.  Initialize target to host charmap
	here instead of in pass_sprintf_length::execute.
	(struct call_info): Make global.
	(sprintf_dom_walker::compute_format_length): Make global.
	(sprintf_dom_walker::handle_gimple_call): Same.
	* passes.def (pass_sprintf_length): Replace with pass_strlen.
	* tree-ssa-strlen.c (strlen_optimize): New variable.
	(get_string_length): Add comments.
	(get_range_strlen_dynamic): New functions.
	(check_and_optimize_call): New function.
	(handle_integral_assign): New function.
	(strlen_check_and_optimize_stmt): Rename...
	(check_and_optimize_stmt): ...to this.  Factor code out into
	check_and_optimize_call and handle_integral_assign.
	(strlen_dom_walker::evrp): New member.
	(strlen_dom_walker::before_dom_children): Use evrp member.
	(strlen_dom_walker::after_dom_children): Use evrp member.
	(pass_data_strlen): Remove property not satisfied during an early run.
	(pass_strlen::do_optimize): New data member.
	(pass_strlen::set_pass_param): New member function.
	(pass_strlen::gate): Update to handle printf calls.
	(pass_strlen::execute): Initialize loop and scev optimizers.
	* tree-ssa-strlen.h (get_range_strlen_dynamic): Declare.
	(handle_printf_call): Same.

gcc/testsuite/ChangeLog:

	PR c++/83431
	* gcc.dg/strlenopt-63.c: New test.
	* gcc.dg/strlenopt-1.c: Adjust pass name.
	* gcc.dg/strlenopt-10.c: Same.
	* gcc.dg/strlenopt-11.c: Same.
	* gcc.dg/strlenopt-13.c: Same.
	* gcc.dg/strlenopt-14g.c: Same.
	* gcc.dg/strlenopt-14gf.c: Same.
	* gcc.dg/strlenopt-15.c: Same.
	* gcc.dg/strlenopt-16g.c: Same.
	* gcc.dg/strlenopt-17g.c: Same.
	* gcc.dg/strlenopt-18g.c: Same.
	* gcc.dg/strlenopt-19.c: Same.
	* gcc.dg/st