Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-12-13 Thread Richard Biener
On Thu, Dec 8, 2016 at 1:51 PM, Martin Liška  wrote:
> On 12/02/2016 01:29 PM, Richard Biener wrote:
>> On Thu, Dec 1, 2016 at 5:30 PM, Martin Liška  wrote:
>>> On 11/23/2016 03:13 PM, Jakub Jelinek wrote:
 On Wed, Nov 23, 2016 at 02:57:07PM +0100, Martin Liška wrote:
> I started review process in libsanitizer: https://reviews.llvm.org/D26965
> And I have a question that was asked in the review: can we distinguish 
> between load and store
> in case of having usage of ASAN_POISON?

 I think with ASAN_POISON it is indeed just loads from after scope that can
 be caught, a store overwrites the variable with a new value and when 
 turning
 the store after we make the var no longer addressable into SSA form, we
 loose information about the out of scope store.  Furthermore, if there is
 first a store and then a read, like:
   if (argc != 12312)
 {
   char my_char;
   ptr = _char;
 }
   *ptr = i + 26;
   return *ptr;
 we don't notice even the read.  Not sure what could be done against that
 though.  I think we'd need to hook into the into-ssa framework, there it
 should know the current value of the variable at the point of the store is
 result of ASAN_POISON and be able to instead of turning that
   my_char = _23;
 into
   my_char_35 = _23;
 turn it into:
   my_char_35 = ASAN_POISON (_23);
 which would represent after scope store into my_char.

 Not really familiar with into-ssa though to know where to do it.

   Jakub

>>>
>>> Richi, may I ask you for help with this question?
>>
>> Probably where we handle the CLOBBER case (rewrite_stmt, maybe_register_def),
>> we do this for -Wuninitialized.
>>
>> Richard.
>
> Thanks for the tip, however as the optimization of memory address store + 
> load happens
> before we rewrite my_char into SSA, it would be probably hard to guess which 
> memory
> stores and loads should be preserved:
>
> use-after-scope-20.c.032t.ccp1:
> main (int argc, char * * argv)
> {
>   int my_char;
>   int * ptr;
>   int _1;
>   int _11;
>
>[0.0%]:
>   if (argc_4(D) != 12312)
> goto ; [0.0%]
>   else
> goto ; [0.0%]
>
>[0.0%]:
>   ASAN_MARK (2, _char, 4);
>   ptr_8 = _char;
>   ASAN_MARK (1, _char, 4);
>
>[0.0%]:
>   # ptr_2 = PHI 
>   _1 = argc_4(D) + 26;
>   *ptr_2 = _1;
>   _11 = *ptr_2;
>   return _11;
>
> }

The SSA renamer sees

   my_char = ASAN_MARK;
   ptr_8 = _char;
   my_char = ASAN_MARK;

?

It does perform a DOM walk when updating the stmts so simply registering the
appropriate current def should do the trick?

> I sent updated version of patch to LLVM phabricator:
> https://reviews.llvm.org/D26965
>
> Hopefully we can cherry pick the patch very soon to our trunk.
>
> M.
>
>>
>>> Thanks,
>>> Martin
>>>
>


Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-12-08 Thread Martin Liška
On 12/02/2016 01:29 PM, Richard Biener wrote:
> On Thu, Dec 1, 2016 at 5:30 PM, Martin Liška  wrote:
>> On 11/23/2016 03:13 PM, Jakub Jelinek wrote:
>>> On Wed, Nov 23, 2016 at 02:57:07PM +0100, Martin Liška wrote:
 I started review process in libsanitizer: https://reviews.llvm.org/D26965
 And I have a question that was asked in the review: can we distinguish 
 between load and store
 in case of having usage of ASAN_POISON?
>>>
>>> I think with ASAN_POISON it is indeed just loads from after scope that can
>>> be caught, a store overwrites the variable with a new value and when turning
>>> the store after we make the var no longer addressable into SSA form, we
>>> loose information about the out of scope store.  Furthermore, if there is
>>> first a store and then a read, like:
>>>   if (argc != 12312)
>>> {
>>>   char my_char;
>>>   ptr = _char;
>>> }
>>>   *ptr = i + 26;
>>>   return *ptr;
>>> we don't notice even the read.  Not sure what could be done against that
>>> though.  I think we'd need to hook into the into-ssa framework, there it
>>> should know the current value of the variable at the point of the store is
>>> result of ASAN_POISON and be able to instead of turning that
>>>   my_char = _23;
>>> into
>>>   my_char_35 = _23;
>>> turn it into:
>>>   my_char_35 = ASAN_POISON (_23);
>>> which would represent after scope store into my_char.
>>>
>>> Not really familiar with into-ssa though to know where to do it.
>>>
>>>   Jakub
>>>
>>
>> Richi, may I ask you for help with this question?
> 
> Probably where we handle the CLOBBER case (rewrite_stmt, maybe_register_def),
> we do this for -Wuninitialized.
> 
> Richard.

Thanks for the tip, however as the optimization of memory address store + load 
happens
before we rewrite my_char into SSA, it would be probably hard to guess which 
memory
stores and loads should be preserved:

use-after-scope-20.c.032t.ccp1:
main (int argc, char * * argv)
{
  int my_char;
  int * ptr;
  int _1;
  int _11;

   [0.0%]:
  if (argc_4(D) != 12312)
goto ; [0.0%]
  else
goto ; [0.0%]

   [0.0%]:
  ASAN_MARK (2, _char, 4);
  ptr_8 = _char;
  ASAN_MARK (1, _char, 4);

   [0.0%]:
  # ptr_2 = PHI 
  _1 = argc_4(D) + 26;
  *ptr_2 = _1;
  _11 = *ptr_2;
  return _11;

}

I sent updated version of patch to LLVM phabricator:
https://reviews.llvm.org/D26965

Hopefully we can cherry pick the patch very soon to our trunk.

M.

> 
>> Thanks,
>> Martin
>>



Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-12-02 Thread Richard Biener
On Thu, Dec 1, 2016 at 5:30 PM, Martin Liška  wrote:
> On 11/23/2016 03:13 PM, Jakub Jelinek wrote:
>> On Wed, Nov 23, 2016 at 02:57:07PM +0100, Martin Liška wrote:
>>> I started review process in libsanitizer: https://reviews.llvm.org/D26965
>>> And I have a question that was asked in the review: can we distinguish 
>>> between load and store
>>> in case of having usage of ASAN_POISON?
>>
>> I think with ASAN_POISON it is indeed just loads from after scope that can
>> be caught, a store overwrites the variable with a new value and when turning
>> the store after we make the var no longer addressable into SSA form, we
>> loose information about the out of scope store.  Furthermore, if there is
>> first a store and then a read, like:
>>   if (argc != 12312)
>> {
>>   char my_char;
>>   ptr = _char;
>> }
>>   *ptr = i + 26;
>>   return *ptr;
>> we don't notice even the read.  Not sure what could be done against that
>> though.  I think we'd need to hook into the into-ssa framework, there it
>> should know the current value of the variable at the point of the store is
>> result of ASAN_POISON and be able to instead of turning that
>>   my_char = _23;
>> into
>>   my_char_35 = _23;
>> turn it into:
>>   my_char_35 = ASAN_POISON (_23);
>> which would represent after scope store into my_char.
>>
>> Not really familiar with into-ssa though to know where to do it.
>>
>>   Jakub
>>
>
> Richi, may I ask you for help with this question?

Probably where we handle the CLOBBER case (rewrite_stmt, maybe_register_def),
we do this for -Wuninitialized.

Richard.

> Thanks,
> Martin
>


Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-12-01 Thread Martin Liška
On 11/23/2016 03:13 PM, Jakub Jelinek wrote:
> On Wed, Nov 23, 2016 at 02:57:07PM +0100, Martin Liška wrote:
>> I started review process in libsanitizer: https://reviews.llvm.org/D26965
>> And I have a question that was asked in the review: can we distinguish 
>> between load and store
>> in case of having usage of ASAN_POISON?
> 
> I think with ASAN_POISON it is indeed just loads from after scope that can
> be caught, a store overwrites the variable with a new value and when turning
> the store after we make the var no longer addressable into SSA form, we
> loose information about the out of scope store.  Furthermore, if there is
> first a store and then a read, like:
>   if (argc != 12312)
> {
>   char my_char;
>   ptr = _char;
> }
>   *ptr = i + 26;
>   return *ptr;
> we don't notice even the read.  Not sure what could be done against that
> though.  I think we'd need to hook into the into-ssa framework, there it
> should know the current value of the variable at the point of the store is
> result of ASAN_POISON and be able to instead of turning that
>   my_char = _23;
> into
>   my_char_35 = _23;
> turn it into:
>   my_char_35 = ASAN_POISON (_23);
> which would represent after scope store into my_char.
> 
> Not really familiar with into-ssa though to know where to do it.
> 
>   Jakub
> 

Richi, may I ask you for help with this question?

Thanks,
Martin



Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-23 Thread Jakub Jelinek
On Wed, Nov 23, 2016 at 02:57:07PM +0100, Martin Liška wrote:
> I started review process in libsanitizer: https://reviews.llvm.org/D26965
> And I have a question that was asked in the review: can we distinguish 
> between load and store
> in case of having usage of ASAN_POISON?

I think with ASAN_POISON it is indeed just loads from after scope that can
be caught, a store overwrites the variable with a new value and when turning
the store after we make the var no longer addressable into SSA form, we
loose information about the out of scope store.  Furthermore, if there is
first a store and then a read, like:
  if (argc != 12312)
{
  char my_char;
  ptr = _char;
}
  *ptr = i + 26;
  return *ptr;
we don't notice even the read.  Not sure what could be done against that
though.  I think we'd need to hook into the into-ssa framework, there it
should know the current value of the variable at the point of the store is
result of ASAN_POISON and be able to instead of turning that
  my_char = _23;
into
  my_char_35 = _23;
turn it into:
  my_char_35 = ASAN_POISON (_23);
which would represent after scope store into my_char.

Not really familiar with into-ssa though to know where to do it.

Jakub


Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-23 Thread Martin Liška
I started review process in libsanitizer: https://reviews.llvm.org/D26965
And I have a question that was asked in the review: can we distinguish between 
load and store
in case of having usage of ASAN_POISON?

Load looks as follows:

int
main (int argc, char **argv)
{
  char *ptr;

  if (argc != 12312)
  {
char my_char;
ptr = _char;
  }

  return *ptr;
}

main (int argc, char * * argv)
{
  char my_char;
  int _5;

  :
  if (argc_1(D) != 12312)
goto ;
  else
goto ;

  :
  goto ;

  :
  my_char_8 = ASAN_POISON ();

  :
  # my_char_6 = PHI 
  _5 = (int) my_char_6;
  return _5;

}

however doing a store:
int
main (int argc, char **argv)
{
  char *ptr;

  if (argc != 12312)
  {
char my_char;
ptr = _char;
  }

  *ptr = 0;
  return 0;
}

main (int argc, char * * argv)
{
  :
  if (argc_1(D) != 12312)
goto ;
  else
goto ;

  :
  goto ;

  :
  ASAN_POISON ();

  :
  return 0;

}

leads to a situation, where LHS of ASAN_POISON assignment is identified as 
overwritten and eventually
we see just ASAN_POISON call. This is currently removed in sanopt pass, but I'm 
wondering whether it's
valid optimization or not in this context?

Thanks,
Martin


Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-22 Thread Martin Liška
On 11/16/2016 05:28 PM, Jakub Jelinek wrote:
> On Wed, Nov 16, 2016 at 05:01:31PM +0100, Martin Liška wrote:
>> +  use_operand_p use_p;
>> +  imm_use_iterator imm_iter;
>> +  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, poisoned_var)
>> +{
>> +  gimple *use = USE_STMT (use_p);
>> +  if (is_gimple_debug (use))
>> +continue;
>> +
>> +  built_in_function b = (recover_p
>> + ? BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE_NOABORT
>> + : BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE);
>> +  tree fun = builtin_decl_implicit (b);
>> +  pretty_printer pp;
>> +  pp_tree_identifier (, DECL_NAME (var_decl));
>> +
>> +  gcall *call = gimple_build_call (fun, 2, asan_pp_string (),
>> +   DECL_SIZE_UNIT (var_decl));
>> +  gimple_set_location (call, gimple_location (use));
>> +
>> +  /* The USE can be a gimple PHI node.  If so, insert the call on
>> + all edges leading to the PHI node.  */
>> +  if (is_a  (use))
>> +{
>> +  gphi * phi = dyn_cast (use);
> 
> No space after *.

Done.

> 
>> +  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
>> +if (gimple_phi_arg_def (phi, i) == poisoned_var)
>> +  {
>> +edge e = gimple_phi_arg_edge (phi, i);
>> +gsi_insert_seq_on_edge (e, call);
>> +*need_commit_edge_insert = true;
> 
> You clearly don't have a sufficient testsuite coverage for this,
> because this won't really work if you have more than one phi
> argument equal to poisoned_var.  Inserting the same gimple stmt
> into multiple places can't really work.  I bet you want to set
> call to NULL after the gsi_insert_seq_on_edge and before that
> call if (call == NULL) { call = gimple_build_call (...); gimple_set_location 
> (...); }
> Or maybe gimple_copy for the 2nd etc. would work too, dunno.

I see, fixed by using gimple_copy functionality.

> 
>> +  }
>> +}
>> +  else
>> +{
>> +  gimple_stmt_iterator gsi = gsi_for_stmt (use);
>> +  gsi_insert_before (, call, GSI_NEW_STMT);
>> +}
>> +}
>> +
>> +  gimple *nop = gimple_build_nop ();
>> +  SSA_NAME_IS_DEFAULT_DEF (poisoned_var) = true;
>> +  SSA_NAME_DEF_STMT (poisoned_var) = nop;
>> +  gsi_replace (iter, nop, GSI_NEW_STMT);
> 
> The last argument of gsi_replace is a bool, not GSI_*.
> But not sure how this will work anyway, I think SSA_NAME_IS_DEFAULT_DEF
> are supposed to have SSA_NAME_DEF_STMT a GIMPLE_NOP that doesn't
> have bb set, while you are putting it into the stmt sequence.
> Shouldn't you just gsi_remove iter instead?

gsi_remove does not work as a SSA name would lost a defining statement. However
setting SSA_NAME_DEF_STMT (poisoned_var) = gimple_build_nop () and removing the 
stmt
works fine. I haven't known that it can't belong to a BB.

Maybe we can add a verifier for that?

diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c
index 2d9c62d..8fd4e91 100644
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -767,6 +767,15 @@ verify_ssa_name (tree ssa_name, bool is_virtual)
   return true;
 }
 
+  if (SSA_NAME_IS_DEFAULT_DEF (ssa_name)
+  && gimple_nop_p (SSA_NAME_DEF_STMT (ssa_name))
+  && gimple_bb (SSA_NAME_DEF_STMT (ssa_name)) != NULL)
+{
+  error ("defining statement of a default name can't belong to a basic "
+"block");
+  return true;
+}
+
   return false;
 }

That can be eventually done independently.

> 
> Otherwise LGTM, but please post the asan patch to llvm-commits
> or through their web review interface.

Good, I'm going to insert the patch to the tool.

Thanks,
Martin

> 
>   Jakub
> 

>From da190584d091eaaa509067918de4f1f77e887484 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 14 Nov 2016 16:49:05 +0100
Subject: [PATCH] use-after-scope: introduce ASAN_POISON internal fn

gcc/ChangeLog:

2016-11-16  Martin Liska  

	* asan.c (asan_expand_poison_ifn): New function.
	* asan.h (asan_expand_poison_ifn): Declare the function.
	* internal-fn.c (expand_ASAN_POISON): New function.
	* internal-fn.def (ASAN_POISON): New internal fn.
	* sanitizer.def (BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE_NOABORT):
	New built-in.
	(BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE): Likewise.
	* sanopt.c (pass_sanopt::execute): Expand IFN_ASAN_POISON.
	* tree-ssa.c (is_asan_mark_p): New function.
	(execute_update_addresses_taken): Make local variables as not
	addressable if address of these varibles is just taken by
	ASAN_MARK.

gcc/testsuite/ChangeLog:

2016-11-16  Martin Liska  

	* gcc.dg/asan/use-after-scope-3.c: Run just with -O0.
	* gcc.dg/asan/use-after-scope-9.c: Run just with -O2 and
	change expected output.
---
 gcc/asan.c| 77 ++-
 gcc/asan.h|  1 +
 gcc/internal-fn.c |  7 +++
 gcc/internal-fn.def   |  1 +
 gcc/sanitizer.def   

Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-16 Thread Jakub Jelinek
On Wed, Nov 16, 2016 at 05:01:31PM +0100, Martin Liška wrote:
> +  use_operand_p use_p;
> +  imm_use_iterator imm_iter;
> +  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, poisoned_var)
> +{
> +  gimple *use = USE_STMT (use_p);
> +  if (is_gimple_debug (use))
> + continue;
> +
> +  built_in_function b = (recover_p
> +  ? BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE_NOABORT
> +  : BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE);
> +  tree fun = builtin_decl_implicit (b);
> +  pretty_printer pp;
> +  pp_tree_identifier (, DECL_NAME (var_decl));
> +
> +  gcall *call = gimple_build_call (fun, 2, asan_pp_string (),
> +DECL_SIZE_UNIT (var_decl));
> +  gimple_set_location (call, gimple_location (use));
> +
> +  /* The USE can be a gimple PHI node.  If so, insert the call on
> +  all edges leading to the PHI node.  */
> +  if (is_a  (use))
> + {
> +   gphi * phi = dyn_cast (use);

No space after *.

> +   for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> + if (gimple_phi_arg_def (phi, i) == poisoned_var)
> +   {
> + edge e = gimple_phi_arg_edge (phi, i);
> + gsi_insert_seq_on_edge (e, call);
> + *need_commit_edge_insert = true;

You clearly don't have a sufficient testsuite coverage for this,
because this won't really work if you have more than one phi
argument equal to poisoned_var.  Inserting the same gimple stmt
into multiple places can't really work.  I bet you want to set
call to NULL after the gsi_insert_seq_on_edge and before that
call if (call == NULL) { call = gimple_build_call (...); gimple_set_location 
(...); }
Or maybe gimple_copy for the 2nd etc. would work too, dunno.

> +   }
> + }
> +  else
> + {
> +   gimple_stmt_iterator gsi = gsi_for_stmt (use);
> +   gsi_insert_before (, call, GSI_NEW_STMT);
> + }
> +}
> +
> +  gimple *nop = gimple_build_nop ();
> +  SSA_NAME_IS_DEFAULT_DEF (poisoned_var) = true;
> +  SSA_NAME_DEF_STMT (poisoned_var) = nop;
> +  gsi_replace (iter, nop, GSI_NEW_STMT);

The last argument of gsi_replace is a bool, not GSI_*.
But not sure how this will work anyway, I think SSA_NAME_IS_DEFAULT_DEF
are supposed to have SSA_NAME_DEF_STMT a GIMPLE_NOP that doesn't
have bb set, while you are putting it into the stmt sequence.
Shouldn't you just gsi_remove iter instead?

Otherwise LGTM, but please post the asan patch to llvm-commits
or through their web review interface.

Jakub


Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-16 Thread Martin Liška
As the patch quite significantly slowed down tramp3d, there's analysis
of # of variables which are poisoned by the sanitizer:

== normal variables ==
   24 B:  348x (5.80%)
   16 B:  273x (4.55%)
8 B:  237x (3.95%)
1 B:  177x (2.95%)
4 B:  119x (1.98%)
   40 B:   89x (1.48%)
  144 B:   83x (1.38%)

== C++ artifical variables ==
1 B: 1325x (22.08%)
8 B:  983x (16.38%)
   24 B:  586x (9.77%)
  144 B:  415x (6.92%)
4 B:  310x (5.17%)
   12 B:  274x (4.57%)
   16 B:  119x (1.98%)

Where sample of C++ artificial can be seen here:

  struct iterator D.608813;
  struct iterator D.369241;

  try
{
  ASAN_MARK (2, , 8);
  _1 = >D.110510._M_impl._M_start;
  __gnu_cxx::__normal_iterator >::__normal_iterator (, _1);
  try
{
  D.608813 = D.369241;
  return D.608813;
}
  finally
{
  ASAN_MARK (1, , 8);
}
}
  catch
{
  <<>>
}

Problem is that these artificial variables (>70% of all in tramp3d) are often 
passed by reference and many functions in tramp3d either mark the argument
as unused, or just dereference. In situations where a reference is not saved, 
these variables should not live in memory. However,
do we have a machinery that can help with that?

My next step would be to adapt sanopt algorithm to catch use-after-scope 
{un}poisoning, however this is a different story that has significant impact
on # of poisoned variables.

Thoughts?
Martin


Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-16 Thread Martin Liška
On 11/16/2016 02:07 PM, Jakub Jelinek wrote:
> On Wed, Nov 16, 2016 at 01:25:04PM +0100, Martin Liška wrote:
>>  
>> +
>> +/* Expand the ASAN_{LOAD,STORE} builtins.  */
> 
> Stale comment.

Fixed.

> 
>> +
>> +bool
>> +asan_expand_poison_ifn (gimple_stmt_iterator *iter,
>> +bool *need_commit_edge_insert)
>> +{
> ...
>> +  use_operand_p use_p;
>> +  imm_use_iterator imm_iter;
>> +  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, poisoned_var)
>> +{
>> +  gimple *use = USE_STMT (use_p);
>> +
> 
> You want to ignore debug stmts uses here (or reset them).

Likewise.

> 
>> +  built_in_function b = (recover_p
>> + ? BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE_NOABORT
>> + : BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE);
>> +  tree fun = builtin_decl_implicit (b);
>> +  pretty_printer pp;
>> +  pp_tree_identifier (, DECL_NAME (var_decl));
>> +
>> +  gcall *call = gimple_build_call (fun, 2, asan_pp_string (),
>> +   DECL_SIZE_UNIT (var_decl));
>> +  gimple_set_location (call, gimple_location (g));
> 
> Is that the location you want?  I mean shouldn't it use gimple_location (use)
> instead?  The bug is on the use, not on the spot where it went out of scope.
> Though the question is what to use if gimple_location (use) is
> UNKNOWN_LOCATION.

I changed the location to gimple_location(use).

> 
>> +
>> +  /* If ASAN_POISON is used in a PHI node, let's insert the call on
>> + the leading to the PHI node BB.  */
> 
> The comment doesn't make sense gramatically to me.

Modified.

> 
>> +  if (is_a  (use))
>> +{
>> +  gphi * phi = dyn_cast (use);
>> +  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
>> +if (gimple_phi_arg_def (phi, i) == poisoned_var)
>> +  {
>> +edge e = gimple_phi_arg_edge (phi, i);
>> +gsi_insert_seq_on_edge (e, call);
>> +*need_commit_edge_insert = true;
> 
> What if there are multiple PHI args with that use?
> Shouldn't you use just FOR_EACH_USE_ON_STMT or what macros we have?

Well, as I read the macro, I still have to iterate over gphi arguments
to find the proper edge.

> 
>> --- a/libsanitizer/asan/asan_errors.cc
>> +++ b/libsanitizer/asan/asan_errors.cc
>> @@ -279,6 +279,27 @@ void ErrorInvalidPointerPair::Print() {
>>ReportErrorSummary(bug_type, );
>>  }
> 
> As I wrote on IRC, we have to submit this to compiler-rt and only
> if it is accepted, cherry-pick it together with the gcc changes.

Sure, if we are fine with the GCC part, I can suggest the sanitizer changes.

> 
>> --- a/libsanitizer/asan/asan_errors.h
>> +++ b/libsanitizer/asan/asan_errors.h
>> @@ -294,6 +294,24 @@ struct ErrorInvalidPointerPair : ErrorBase {
>>void Print();
>>  };
>>  
>> +struct ErrorUseAfterScope : ErrorBase {
>> +  uptr pc, bp, sp;
>> +  const char *variable_name;
>> +  u32 variable_size;
> 
> Shouldn't this be uptr?

Yep, changed on all places.
I'm attaching second version of the patch. I've tested the patch on linux 
kernel and I can see
>10K places where ASAN_POISON is removed (apparently there's not place where we 
>would expand to
the new API entry point).

Martin

> 
>> +  ErrorUseAfterScope(u32 tid, uptr pc_, uptr bp_, uptr sp_,
>> + const char *variable_name_, u32 variable_size_)
> 
> And here.
> 
>> +// --- ReportUseAfterScope --- {{{1
>> +void ReportUseAfterScope(const char *variable_name, u32 variable_size,
> 
> And here?
> 
>> +void ReportUseAfterScope(const char *variable_name, u32 variable_size,
>> + bool fatal);
> 
> And here?
> 
>   Jakub
> 

>From 9505c31813f224b855c5b2fab6c157e99ce54e59 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 14 Nov 2016 16:49:05 +0100
Subject: [PATCH] use-after-scope: introduce ASAN_POISON internal fn

gcc/ChangeLog:

2016-11-16  Martin Liska  

	* asan.c (asan_expand_poison_ifn): New function.
	* asan.h (asan_expand_poison_ifn): Declare the function.
	* internal-fn.c (expand_ASAN_POISON): New function.
	* internal-fn.def (ASAN_POISON): New internal fn.
	* sanitizer.def (BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE_NOABORT):
	New built-in.
	(BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE): Likewise.
	* sanopt.c (pass_sanopt::execute): Expand IFN_ASAN_POISON.
	* tree-ssa.c (is_asan_mark_p): New function.
	(execute_update_addresses_taken): Make local variables as not
	addressable if address of these varibles is just taken by
	ASAN_MARK.

gcc/testsuite/ChangeLog:

2016-11-16  Martin Liska  

	* gcc.dg/asan/use-after-scope-3.c: Run just with -O0.
	* gcc.dg/asan/use-after-scope-9.c: Run just with -O2 and
	change expected output.
---
 gcc/asan.c| 72 ++-
 gcc/asan.h|  1 +
 gcc/internal-fn.c |  7 +++
 gcc/internal-fn.def   |  1 

Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-16 Thread Jakub Jelinek
On Wed, Nov 16, 2016 at 01:25:04PM +0100, Martin Liška wrote:
>  
> +
> +/* Expand the ASAN_{LOAD,STORE} builtins.  */

Stale comment.

> +
> +bool
> +asan_expand_poison_ifn (gimple_stmt_iterator *iter,
> + bool *need_commit_edge_insert)
> +{
...
> +  use_operand_p use_p;
> +  imm_use_iterator imm_iter;
> +  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, poisoned_var)
> +{
> +  gimple *use = USE_STMT (use_p);
> +

You want to ignore debug stmts uses here (or reset them).

> +  built_in_function b = (recover_p
> +  ? BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE_NOABORT
> +  : BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE);
> +  tree fun = builtin_decl_implicit (b);
> +  pretty_printer pp;
> +  pp_tree_identifier (, DECL_NAME (var_decl));
> +
> +  gcall *call = gimple_build_call (fun, 2, asan_pp_string (),
> +DECL_SIZE_UNIT (var_decl));
> +  gimple_set_location (call, gimple_location (g));

Is that the location you want?  I mean shouldn't it use gimple_location (use)
instead?  The bug is on the use, not on the spot where it went out of scope.
Though the question is what to use if gimple_location (use) is
UNKNOWN_LOCATION.

> +
> +  /* If ASAN_POISON is used in a PHI node, let's insert the call on
> +  the leading to the PHI node BB.  */

The comment doesn't make sense gramatically to me.

> +  if (is_a  (use))
> + {
> +   gphi * phi = dyn_cast (use);
> +   for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> + if (gimple_phi_arg_def (phi, i) == poisoned_var)
> +   {
> + edge e = gimple_phi_arg_edge (phi, i);
> + gsi_insert_seq_on_edge (e, call);
> + *need_commit_edge_insert = true;

What if there are multiple PHI args with that use?
Shouldn't you use just FOR_EACH_USE_ON_STMT or what macros we have?

> --- a/libsanitizer/asan/asan_errors.cc
> +++ b/libsanitizer/asan/asan_errors.cc
> @@ -279,6 +279,27 @@ void ErrorInvalidPointerPair::Print() {
>ReportErrorSummary(bug_type, );
>  }

As I wrote on IRC, we have to submit this to compiler-rt and only
if it is accepted, cherry-pick it together with the gcc changes.

> --- a/libsanitizer/asan/asan_errors.h
> +++ b/libsanitizer/asan/asan_errors.h
> @@ -294,6 +294,24 @@ struct ErrorInvalidPointerPair : ErrorBase {
>void Print();
>  };
>  
> +struct ErrorUseAfterScope : ErrorBase {
> +  uptr pc, bp, sp;
> +  const char *variable_name;
> +  u32 variable_size;

Shouldn't this be uptr?

> +  ErrorUseAfterScope(u32 tid, uptr pc_, uptr bp_, uptr sp_,
> + const char *variable_name_, u32 variable_size_)

And here.

> +// --- ReportUseAfterScope --- {{{1
> +void ReportUseAfterScope(const char *variable_name, u32 variable_size,

And here?

> +void ReportUseAfterScope(const char *variable_name, u32 variable_size,
> + bool fatal);

And here?

Jakub


Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-16 Thread Martin Liška
On 11/16/2016 01:25 PM, Martin Liška wrote:
> Hello
> 
> Following patch is a candidate that re-writes VAR_DECLs that are
> is_gimple_reg_type with:
> my_char_25 = ASAN_POISON ();
> 
> that is eventually transformed to:
> __builtin___asan_report_use_after_scope_noabort ("my_char", 1);
> 
> at places where my_char_25 is used. That introduces a new entry point
> to ASAN runtime, reporting:
> 
> ==18378==ERROR: AddressSanitizer: stack-use-after-scope at pc 0x004007b4 
> bp 0x0001 sp 0x00400603
> ACCESS of size 1 for variable 'my_char' thread T0
> #0 0x400602 in main (/tmp/a.out+0x400602)
> #1 0x7fa6e572d290 in __libc_start_main (/lib64/libc.so.6+0x20290)
> #2 0x400669 in _start (/tmp/a.out+0x400669)
> 
> SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x400602) in main
> 
> I'm still not sure where exactly do the expansion of ASAN_POISON as some 
> cleanup
> after the transformation would be desired.
> 
> Thoughts?
> Thanks,
> Martin 
> 
> 
> 
> 

There's an example:

int
main (void)
{
  char *ptr;
  {
char my_char;
ptr = _char;
  }

  return *ptr;
}

$ g++ /tmp/use-after-scope-1.c -fsanitize=address -O0 && ./a.out 
=
==16035==ERROR: AddressSanitizer: stack-use-after-scope on address 
0x7ffe76322240 at pc 0x00400848 bp 0x7ffe76322200 sp 0x7ffe763221f8
READ of size 1 at 0x7ffe76322240 thread T0
#0 0x400847 in main (/tmp/a.out+0x400847)
#1 0x7f0005739290 in __libc_start_main (/lib64/libc.so.6+0x20290)
#2 0x4006b9 in _start (/tmp/a.out+0x4006b9)

Address 0x7ffe76322240 is located in stack of thread T0 at offset 32 in frame
#0 0x400786 in main (/tmp/a.out+0x400786)

  This frame has 1 object(s):
[32, 33) 'my_char' <== Memory access at offset 32 is inside this variable
HINT: this may be a false positive if your program uses some custom stack 
unwind mechanism or swapcontext
  (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x400847) in main
Shadow bytes around the buggy address:
  0x10004ec5c3f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10004ec5c440: 00 00 00 00 f1 f1 f1 f1[f8]f2 f2 f2 f3 f3 f3 f3
  0x10004ec5c450: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c470: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:   fa
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init order:   f6
  Poisoned by user:f7
  Container overflow:  fc
  Array cookie:ac
  Intra object redzone:bb
  ASan internal:   fe
  Left alloca redzone: ca
  Right alloca redzone:cb
==16035==ABORTING

$ g++ /tmp/use-after-scope-1.c -fsanitize=address -O2 && ./a.out 
=
==16049==ERROR: AddressSanitizer: stack-use-after-scope at pc 0x00400794 bp 
0x0001 sp 0x004005f3
ACCESS of size 1 for variable 'my_char' thread T0
#0 0x4005f2 in main (/tmp/a.out+0x4005f2)
#1 0x7f883337e290 in __libc_start_main (/lib64/libc.so.6+0x20290)
#2 0x400649 in _start (/tmp/a.out+0x400649)

SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x4005f2) in main
==16049==ABORTING

Martin


[RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-16 Thread Martin Liška
Hello

Following patch is a candidate that re-writes VAR_DECLs that are
is_gimple_reg_type with:
my_char_25 = ASAN_POISON ();

that is eventually transformed to:
__builtin___asan_report_use_after_scope_noabort ("my_char", 1);

at places where my_char_25 is used. That introduces a new entry point
to ASAN runtime, reporting:

==18378==ERROR: AddressSanitizer: stack-use-after-scope at pc 0x004007b4 bp 
0x0001 sp 0x00400603
ACCESS of size 1 for variable 'my_char' thread T0
#0 0x400602 in main (/tmp/a.out+0x400602)
#1 0x7fa6e572d290 in __libc_start_main (/lib64/libc.so.6+0x20290)
#2 0x400669 in _start (/tmp/a.out+0x400669)

SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x400602) in main

I'm still not sure where exactly do the expansion of ASAN_POISON as some cleanup
after the transformation would be desired.

Thoughts?
Thanks,
Martin 




>From c115207230a5be979119b6ac6572ae6af2a0ccd7 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 14 Nov 2016 16:49:05 +0100
Subject: [PATCH] use-after-scope: introduce ASAN_POISON internal fn

---
 gcc/asan.c   | 72 +++-
 gcc/asan.h   |  1 +
 gcc/internal-fn.c|  7 
 gcc/internal-fn.def  |  1 +
 gcc/sanitizer.def|  8 +
 gcc/sanopt.c |  9 +
 gcc/tree-ssa.c   | 65 ++--
 libsanitizer/asan/asan_errors.cc | 21 
 libsanitizer/asan/asan_errors.h  | 19 +++
 libsanitizer/asan/asan_report.cc | 10 ++
 libsanitizer/asan/asan_report.h  |  3 ++
 libsanitizer/asan/asan_rtl.cc| 16 +
 12 files changed, 221 insertions(+), 11 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index 6e93ea3..d7d4267 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -32,8 +32,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "memmodel.h"
 #include "tm_p.h"
+#include "ssa.h"
 #include "stringpool.h"
-#include "tree-vrp.h"
 #include "tree-ssanames.h"
 #include "optabs.h"
 #include "emit-rtl.h"
@@ -2979,6 +2979,76 @@ asan_expand_check_ifn (gimple_stmt_iterator *iter, bool use_calls)
   return true;
 }
 
+
+/* Expand the ASAN_{LOAD,STORE} builtins.  */
+
+bool
+asan_expand_poison_ifn (gimple_stmt_iterator *iter,
+			bool *need_commit_edge_insert)
+{
+  gimple *g = gsi_stmt (*iter);
+  tree poisoned_var = gimple_call_lhs (g);
+  if (!poisoned_var)
+{
+  gsi_remove (iter, true);
+  return true;
+}
+
+  tree var_decl = SSA_NAME_VAR (poisoned_var);
+
+  bool recover_p;
+  if (flag_sanitize & SANITIZE_USER_ADDRESS)
+recover_p = (flag_sanitize_recover & SANITIZE_USER_ADDRESS) != 0;
+  else
+recover_p = (flag_sanitize_recover & SANITIZE_KERNEL_ADDRESS) != 0;
+
+  use_operand_p use_p;
+  imm_use_iterator imm_iter;
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, poisoned_var)
+{
+  gimple *use = USE_STMT (use_p);
+
+  built_in_function b = (recover_p
+			 ? BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE_NOABORT
+			 : BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE);
+  tree fun = builtin_decl_implicit (b);
+  pretty_printer pp;
+  pp_tree_identifier (, DECL_NAME (var_decl));
+
+  gcall *call = gimple_build_call (fun, 2, asan_pp_string (),
+   DECL_SIZE_UNIT (var_decl));
+  gimple_set_location (call, gimple_location (g));
+
+  /* If ASAN_POISON is used in a PHI node, let's insert the call on
+	 the leading to the PHI node BB.  */
+  if (is_a  (use))
+	{
+	  gphi * phi = dyn_cast (use);
+	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	if (gimple_phi_arg_def (phi, i) == poisoned_var)
+	  {
+		edge e = gimple_phi_arg_edge (phi, i);
+		gsi_insert_seq_on_edge (e, call);
+		*need_commit_edge_insert = true;
+
+		break;
+	  }
+	}
+  else
+	{
+	  gimple_stmt_iterator gsi = gsi_for_stmt (use);
+	  gsi_insert_before (, call, GSI_NEW_STMT);
+	}
+}
+
+  gimple *nop = gimple_build_nop ();
+  SSA_NAME_IS_DEFAULT_DEF (poisoned_var) = true;
+  SSA_NAME_DEF_STMT (poisoned_var) = nop;
+  gsi_replace (iter, nop, GSI_NEW_STMT);
+
+  return false;
+}
+
 /* Instrument the current function.  */
 
 static unsigned int
diff --git a/gcc/asan.h b/gcc/asan.h
index 9cf5904..6c25955 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -30,6 +30,7 @@ extern void initialize_sanitizer_builtins (void);
 extern tree asan_dynamic_init_call (bool);
 extern bool asan_expand_check_ifn (gimple_stmt_iterator *, bool);
 extern bool asan_expand_mark_ifn (gimple_stmt_iterator *);
+extern bool asan_expand_poison_ifn (gimple_stmt_iterator *, bool *);
 
 extern gimple_stmt_iterator create_cond_insert_point
  (gimple_stmt_iterator *, bool, bool, bool, basic_block *, basic_block *);
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index ca347c5..17624e8 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -246,6 +246,13 @@ expand_ASAN_MARK (internal_fn, gcall *)