Re: Fix build of jit (was Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v3))

2016-11-08 Thread Jakub Jelinek
On Tue, Nov 08, 2016 at 10:38:23AM +0100, Martin Liška wrote:
> > I believe the 0 here is a bug, I'd think we should be using something like
> > ATTR_TMPURE_NOTHROW_LEAF_LIST that we are using __asan_load* - the functions
> > aren't going to throw, nor call anything in the current TU.  Not 100% sure
> > about the TMPURE, after all they do write/read memory (the shadow one).
> > So maybe ATTR_NOTHROW_LEAF_LIST instead for now?  Martin?
> 
> Yes, 0 is bug. I'm inclining to ATTR_NOTHROW_LEAF_LIST as 
> __asan_{un}poison_stack_memory
> modifies global memory. It would be more safe. I'm also going to change it 
> for ASAN_MARK
> internal function (where ECF_TM_PURE is currently selected).

The TM stuff needs to be eventually resolved with the TM maintainers
(Richard Henderson and Torvald Riegel), the thing is that we can annotate
stuff even in TM regions, tm_pure functions etc.  I believe we have lots of
other TM issues (internal calls and the like) that haven't been addressed.

Jakub


Re: Fix build of jit (was Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v3))

2016-11-08 Thread Martin Liška
On 11/07/2016 05:17 PM, Jakub Jelinek wrote:
> On Mon, Nov 07, 2016 at 11:07:13AM -0500, David Malcolm wrote:
>> The patch (r241896) introduced an error in the build of the jit:
>>
>> ../../src/gcc/jit/jit-builtins.c:62:1: error: invalid conversion from
>> ‘int’ to ‘gcc::jit::built_in_attribute’ [-fpermissive]
>>  };
>>  ^
>>
>> which seems to be due to the "0" for ATTRS in:
>>
>> --- a/gcc/sanitizer.def
>> +++ b/gcc/sanitizer.def
>> @@ -165,6 +165,10 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_BEFORE_DYNAMIC_INIT,
>>  DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_AFTER_DYNAMIC_INIT,
>>"__asan_after_dynamic_init",
>>BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
>> +DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_CLOBBER_N, "__asan_poison_stack_memory",
>> +  BT_FN_VOID_PTR_PTRMODE, 0)
>> +DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_UNCLOBBER_N, 
>> "__asan_unpoison_stack_memory",
>> +  BT_FN_VOID_PTR_PTRMODE, 0)
> 
> I believe the 0 here is a bug, I'd think we should be using something like
> ATTR_TMPURE_NOTHROW_LEAF_LIST that we are using __asan_load* - the functions
> aren't going to throw, nor call anything in the current TU.  Not 100% sure
> about the TMPURE, after all they do write/read memory (the shadow one).
> So maybe ATTR_NOTHROW_LEAF_LIST instead for now?  Martin?

Yes, 0 is bug. I'm inclining to ATTR_NOTHROW_LEAF_LIST as 
__asan_{un}poison_stack_memory
modifies global memory. It would be more safe. I'm also going to change it for 
ASAN_MARK
internal function (where ECF_TM_PURE is currently selected).

I'm testing patch for that.
Martin

> 
>> Is the attached patch OK as a fix? (assuming testing passes)  Or should
>> these builtins have other attrs?  (sorry, am not very familiar with the
>> sanitizer code).
> 
>   Jakub
> 



Re: Fix build of jit (was Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v3))

2016-11-07 Thread Jakub Jelinek
On Mon, Nov 07, 2016 at 11:07:13AM -0500, David Malcolm wrote:
> The patch (r241896) introduced an error in the build of the jit:
> 
> ../../src/gcc/jit/jit-builtins.c:62:1: error: invalid conversion from
> ‘int’ to ‘gcc::jit::built_in_attribute’ [-fpermissive]
>  };
>  ^
> 
> which seems to be due to the "0" for ATTRS in:
> 
> --- a/gcc/sanitizer.def
> +++ b/gcc/sanitizer.def
> @@ -165,6 +165,10 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_BEFORE_DYNAMIC_INIT,
>  DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_AFTER_DYNAMIC_INIT,
> "__asan_after_dynamic_init",
> BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_CLOBBER_N, "__asan_poison_stack_memory",
> +   BT_FN_VOID_PTR_PTRMODE, 0)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_UNCLOBBER_N, 
> "__asan_unpoison_stack_memory",
> +   BT_FN_VOID_PTR_PTRMODE, 0)

I believe the 0 here is a bug, I'd think we should be using something like
ATTR_TMPURE_NOTHROW_LEAF_LIST that we are using __asan_load* - the functions
aren't going to throw, nor call anything in the current TU.  Not 100% sure
about the TMPURE, after all they do write/read memory (the shadow one).
So maybe ATTR_NOTHROW_LEAF_LIST instead for now?  Martin?

> Is the attached patch OK as a fix? (assuming testing passes)  Or should
> these builtins have other attrs?  (sorry, am not very familiar with the
> sanitizer code).

Jakub


Fix build of jit (was Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v3))

2016-11-07 Thread David Malcolm
On Mon, 2016-11-07 at 11:03 +0100, Martin Liška wrote:
> Hello.
> 
> After discussion with Jakub, I'm resending new version of the patch,
> where I changed following:
> 1) gimplify_ctxp->live_switch_vars is used to track variables
> introduced in switch_expr. Every time
>a case_label_expr is seen, these are unpoisoned. It's quite
> conservative, however it covers all
>corner cases on can come up with. Compared to clang, we are much
> more precise in switch statements
>where a variable liveness crosses label boundary.
> 2) I found a bug where ASAN_CHECK was optimized out due to missing
> check of IFN_ASAN_MARK internal fn.
>Test was added for that.
> 3) Multiple switch tests have been added, which is going to be sent
> in upcoming email.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression
> tests (+ asan bootstrap finishes
> successfully).

The patch (r241896) introduced an error in the build of the jit:

../../src/gcc/jit/jit-builtins.c:62:1: error: invalid conversion from
‘int’ to ‘gcc::jit::built_in_attribute’ [-fpermissive]
 };
 ^

which seems to be due to the "0" for ATTRS in:

--- a/gcc/sanitizer.def
+++ b/gcc/sanitizer.def
@@ -165,6 +165,10 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_BEFORE_DYNAMIC_INIT,
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_AFTER_DYNAMIC_INIT,
  "__asan_after_dynamic_init",
  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_CLOBBER_N, "__asan_poison_stack_memory",
+ BT_FN_VOID_PTR_PTRMODE, 0)
+DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_UNCLOBBER_N, 
"__asan_unpoison_stack_memory",
+ BT_FN_VOID_PTR_PTRMODE, 0)

Is the attached patch OK as a fix? (assuming testing passes)  Or should
these builtins have other attrs?  (sorry, am not very familiar with the
sanitizer code).

Dave
From 6db5f9e50dc95f504d33970ee553172bbf400ae7 Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Mon, 7 Nov 2016 11:21:20 -0500
Subject: [PATCH] Fix build of jit

gcc/ChangeLog:
	* asan.c (ATTR_NULL): Define.
	* sanitizer.def (BUILT_IN_ASAN_CLOBBER_N): Use ATTR_NULL rather
	than 0.
	(BUILT_IN_ASAN_UNCLOBBER_N): Likewise.
---
 gcc/asan.c| 2 ++
 gcc/sanitizer.def | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index 1e0ce8d..4a124cb 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -2463,6 +2463,8 @@ initialize_sanitizer_builtins (void)
 #define BT_FN_I16_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[4]
 #define BT_FN_I16_VPTR_I16_INT BT_FN_IX_VPTR_IX_INT[4]
 #define BT_FN_VOID_VPTR_I16_INT BT_FN_VOID_VPTR_IX_INT[4]
+#undef ATTR_NULL
+#define ATTR_NULL 0
 #undef ATTR_NOTHROW_LEAF_LIST
 #define ATTR_NOTHROW_LEAF_LIST ECF_NOTHROW | ECF_LEAF
 #undef ATTR_TMPURE_NOTHROW_LEAF_LIST
diff --git a/gcc/sanitizer.def b/gcc/sanitizer.def
index 1c142e9..596b8b0 100644
--- a/gcc/sanitizer.def
+++ b/gcc/sanitizer.def
@@ -166,9 +166,9 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_AFTER_DYNAMIC_INIT,
 		  "__asan_after_dynamic_init",
 		  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_CLOBBER_N, "__asan_poison_stack_memory",
-		  BT_FN_VOID_PTR_PTRMODE, 0)
+		  BT_FN_VOID_PTR_PTRMODE, ATTR_NULL)
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_UNCLOBBER_N, "__asan_unpoison_stack_memory",
-		  BT_FN_VOID_PTR_PTRMODE, 0)
+		  BT_FN_VOID_PTR_PTRMODE, ATTR_NULL)
 
 /* Thread Sanitizer */
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_INIT, "__tsan_init", 
-- 
1.8.5.3



Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v3)

2016-11-07 Thread Jakub Jelinek
On Mon, Nov 07, 2016 at 11:03:11AM +0100, Martin Liška wrote:
> Hello.
> 
> After discussion with Jakub, I'm resending new version of the patch, where I 
> changed following:
> 1) gimplify_ctxp->live_switch_vars is used to track variables introduced in 
> switch_expr. Every time
>a case_label_expr is seen, these are unpoisoned. It's quite conservative, 
> however it covers all
>corner cases on can come up with. Compared to clang, we are much more 
> precise in switch statements
>where a variable liveness crosses label boundary.
> 2) I found a bug where ASAN_CHECK was optimized out due to missing check of 
> IFN_ASAN_MARK internal fn.
>Test was added for that.
> 3) Multiple switch tests have been added, which is going to be sent in 
> upcoming email.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests (+ 
> asan bootstrap finishes
> successfully).

Ok for trunk.  Hopefully we can resolve the most common cases for switch
incrementally, either still during stage1 or early in stage3.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v3)

2016-11-07 Thread Martin Liška
Hello.

After discussion with Jakub, I'm resending new version of the patch, where I 
changed following:
1) gimplify_ctxp->live_switch_vars is used to track variables introduced in 
switch_expr. Every time
   a case_label_expr is seen, these are unpoisoned. It's quite conservative, 
however it covers all
   corner cases on can come up with. Compared to clang, we are much more 
precise in switch statements
   where a variable liveness crosses label boundary.
2) I found a bug where ASAN_CHECK was optimized out due to missing check of 
IFN_ASAN_MARK internal fn.
   Test was added for that.
3) Multiple switch tests have been added, which is going to be sent in upcoming 
email.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests (+ 
asan bootstrap finishes
successfully).

Martin
>From 2b37a59dd639ad740fdbd49d57b9f1975fc35046 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 3 May 2016 15:35:22 +0200
Subject: [PATCH 1/2] Introduce -fsanitize-address-use-after-scope

gcc/c-family/ChangeLog:

2016-10-27  Martin Liska  

	* c-warn.c (warn_for_unused_label): Save all labels used
	in goto or in 

gcc/ChangeLog:

2016-10-27  Martin Liska  

	* asan.c (enum asan_check_flags): Move the enum to header file.
	(asan_init_shadow_ptr_types): Make type creation more generic.
	(shadow_mem_size): New function.
	(asan_emit_stack_protection): Use newly added ASAN_SHADOW_GRANULARITY.
	Rewritten stack unpoisoning code.
	(build_shadow_mem_access): Add new argument return_address.
	(instrument_derefs): Instrument local variables if use after scope
	sanitization is enabled.
	(asan_store_shadow_bytes): New function.
	(asan_expand_mark_ifn): Likewise.
	(asan_sanitize_stack_p): Moved from asan_sanitize_stack_p.
	* asan.h (enum asan_mark_flags): Moved here from asan.c
	(asan_protect_stack_decl): Protect all declaration that need
	to live in memory.
	(asan_sanitize_use_after_scope): New function.
	(asan_no_sanitize_address_p): Likewise.
	* cfgexpand.c (partition_stack_vars): Consider
	asan_sanitize_use_after_scope in condition.
	(expand_stack_vars): Likewise.
	* common.opt (-fsanitize-address-use-after-scope): New option.
	* doc/invoke.texi (use-after-scope-direct-emission-threshold):
	Explain the parameter.
	* flag-types.h (enum sanitize_code): Define SANITIZE_USE_AFTER_SCOPE.
	* gimplify.c (build_asan_poison_call_expr): New function.
	(asan_poison_variable): Likewise.
	(gimplify_bind_expr): Generate poisoning/unpoisoning for local
	variables that have address taken.
	(gimplify_decl_expr): Likewise.
	(gimplify_target_expr): Likewise for C++ temporaries.
	(sort_by_decl_uid): New function.
	(gimplify_expr): Unpoison all variables for a label we can jump
	from outside of a scope.
	(gimplify_switch_expr): Unpoison variables defined in the switch
	context.
	(gimplify_function_tree): Clear asan_poisoned_variables.
	(asan_poison_variables): New function.
	(warn_switch_unreachable_r): Handle IFN_ASAN_MARK.
	* internal-fn.c (expand_ASAN_MARK): New function.
	* internal-fn.def (ASAN_MARK): Declare.
	* opts.c (finish_options): Handle -fstack-reuse if
	-fsanitize-address-use-after-scope is enabled.
	(common_handle_option): Enable address sanitization if
	-fsanitize-address-use-after-scope is enabled.
	* params.def (PARAM_USE_AFTER_SCOPE_DIRECT_EMISSION_THRESHOLD):
	New parameter.
	* params.h: Likewise.
	* sancov.c (pass_sanopt::execute): Handle IFN_ASAN_MARK.
	* sanitizer.def: Define __asan_poison_stack_memory and
	__asan_unpoison_stack_memory functions.
	* asan.c (asan_mark_poison_p): New function.
	(transform_statements): Handle asan_mark_poison_p calls.
	* gimple.c (nonfreeing_call_p): Handle IFN_ASAN_MARK.
---
 gcc/asan.c| 302 +-
 gcc/asan.h|  66 +--
 gcc/c-family/c-warn.c |   9 +-
 gcc/cfgexpand.c   |  18 +--
 gcc/common.opt|   3 +
 gcc/doc/invoke.texi   |  15 ++-
 gcc/gimple.c  |   3 +
 gcc/gimplify.c| 234 +++---
 gcc/internal-fn.c |   9 ++
 gcc/internal-fn.def   |   1 +
 gcc/opts.c|  27 -
 gcc/params.def|   6 +
 gcc/params.h  |   2 +
 gcc/sanitizer.def |   4 +
 gcc/sanopt.c  |   3 +
 15 files changed, 603 insertions(+), 99 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index c6d9240..1e0ce8d 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -245,6 +245,22 @@ static unsigned HOST_WIDE_INT asan_shadow_offset_value;
 static bool asan_shadow_offset_computed;
 static vec sanitized_sections;
 
+/* Return true if STMT is ASAN_MARK poisoning internal function call.  */
+static inline bool
+asan_mark_poison_p (gimple *stmt)
+{
+  return (gimple_call_internal_p (stmt, IFN_ASAN_MARK)
+	  && tree_to_uhwi (gimple_call_arg (stmt, 0)) == ASAN_MARK_CLOBBER);
+
+}
+
+/* Set of variable declarations that are going to be guarded by
+   use-after-scope sanitizer.  */
+
+static hash_set *asan_handled_variables 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-04 Thread Martin Liška
On 11/04/2016 10:32 AM, Jakub Jelinek wrote:
> Hi!
> 
> On Fri, Nov 04, 2016 at 10:17:31AM +0100, Martin Liška wrote:
>> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
>> index 813777d..86ce793 100644
>> --- a/gcc/gimplify.c
>> +++ b/gcc/gimplify.c
>> @@ -1678,7 +1678,9 @@ warn_switch_unreachable_r (gimple_stmt_iterator 
>> *gsi_p, bool *handled_ops_p,
>>   worse location info.  */
>>if (gimple_try_eval (stmt) == NULL)
>>  {
>> -  wi->info = stmt;
>> +  gimple_stmt_iterator *it = XNEW (gimple_stmt_iterator);
>> +  memcpy (it, gsi_p, sizeof (gimple_stmt_iterator));
> 
> That would be cleaner as *it = *gsi_p;

I know that it's kind of ugly, but as the gimple stmt is not yet added to a BB, 
using
gsi_for_stmt ICEs:

/tmp/use-after-scope-switch.c:12:5: internal compiler error: Segmentation fault
 switch (argc)
 ^~
0xe16a14 crash_signal
../../gcc/toplev.c:338
0xadf890 bb_seq_addr
../../gcc/gimple.h:1658
0xae01dd gsi_start_bb
../../gcc/gimple-iterator.h:129
0xae111f gsi_for_stmt(gimple*)
../../gcc/gimple-iterator.c:617

> That set, I fail to see
> 1) the need to use a gsi pointer in wi->info compared to stmt itself,
>you can gsi_for_stmt cheaply at any time
> 2) why is anything done about this in warn_switch_unreachable_r
>- the problem isn't related to this warning IMHO.  Even
> switch (x)
>   {
>   case 1:
> int x;
> x = 6;
> ptr = 
> break;
>   case 2:
> ptr = 
> *ptr = 7;
> break;
>   }
>has the same issue and there is no switch unreachable code there, but you
>still want for -fsanitize-use-after-scope pretend it is actually:

You're right, that's not handled. I'm wondering whether it's always profitable
because when you do not reach the case 1, you're not doing a poisoning 
operation?

Martin

> x_tmp = x;
> {
>   int x;
>   switch (x_tmp)
> {
> case 1:
>   x = 6;
>   ptr = 
>   break;
> case 2:
>   ptr = 
>   *ptr = 7;
>   break;
> }
> }
> and put ASAN_MARK unpoisoning before GIMPLE_SWITCH.
> 
>   Jakub
> 



Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-04 Thread Jakub Jelinek
Hi!

On Fri, Nov 04, 2016 at 10:17:31AM +0100, Martin Liška wrote:
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 813777d..86ce793 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -1678,7 +1678,9 @@ warn_switch_unreachable_r (gimple_stmt_iterator *gsi_p, 
> bool *handled_ops_p,
>worse location info.  */
>if (gimple_try_eval (stmt) == NULL)
>   {
> -   wi->info = stmt;
> +   gimple_stmt_iterator *it = XNEW (gimple_stmt_iterator);
> +   memcpy (it, gsi_p, sizeof (gimple_stmt_iterator));

That would be cleaner as *it = *gsi_p;
That set, I fail to see
1) the need to use a gsi pointer in wi->info compared to stmt itself,
   you can gsi_for_stmt cheaply at any time
2) why is anything done about this in warn_switch_unreachable_r
   - the problem isn't related to this warning IMHO.  Even
switch (x)
  {
  case 1:
int x;
x = 6;
ptr = 
break;
  case 2:
ptr = 
*ptr = 7;
break;
  }
   has the same issue and there is no switch unreachable code there, but you
   still want for -fsanitize-use-after-scope pretend it is actually:
x_tmp = x;
{
  int x;
  switch (x_tmp)
{
case 1:
  x = 6;
  ptr = 
  break;
case 2:
  ptr = 
  *ptr = 7;
  break;
}
}
and put ASAN_MARK unpoisoning before GIMPLE_SWITCH.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-04 Thread Martin Liška
On 11/02/2016 03:35 PM, Jakub Jelinek wrote:
> On Wed, Nov 02, 2016 at 03:27:42PM +0100, Martin Liška wrote:
>>> So is there anything I should do wrt -Wswitch-unreachable?
>>>
>>> Marek
>>>
>>
>> Probably not. I'm having a patch puts GIMPLE_SWITCH statement to a proper 
>> place
>> in GIMPLE_BIND. Let's see whether such patch can bootstrap and survive 
>> regression
>> tests.
> 
> Please do that only for -fsanitize-use-after-scope, it will likely affect at
> least for -O0 the debugging experience.  For -O0 -fsanitize=address 
> -fsanitize-use-after-scope
> perhaps we could arrange for some extra stmt to have the locus of the
> switch (where we still don't want the vars to appear in scope) and then
> have no locus on the ASAN_MARK and actual GIMPLE_SWITCH or something
> similar.
> 
>   Jakub
> 

I'm sending patch where I put gimple switch statement to a place where all 
BIND_EXPR vars
are unpoisoned. I'm sending diff to a previous version and new version of the 
patch.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. 
Apart from that,
asan bootstrap successfully finished on x86_64-linux-gnu.

Martin
>From 742ba5e3f5eddb069a13bc51347d21a4ec6a45c6 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 3 May 2016 15:35:22 +0200
Subject: [PATCH 1/2] Introduce -fsanitize-address-use-after-scope

gcc/c-family/ChangeLog:

2016-10-27  Martin Liska  

	* c-warn.c (warn_for_unused_label): Save all labels used
	in goto or in 

gcc/ChangeLog:

2016-10-27  Martin Liska  

	* asan.c (enum asan_check_flags): Move the enum to header file.
	(asan_init_shadow_ptr_types): Make type creation more generic.
	(shadow_mem_size): New function.
	(asan_emit_stack_protection): Use newly added ASAN_SHADOW_GRANULARITY.
	Rewritten stack unpoisoning code.
	(build_shadow_mem_access): Add new argument return_address.
	(instrument_derefs): Instrument local variables if use after scope
	sanitization is enabled.
	(asan_store_shadow_bytes): New function.
	(asan_expand_mark_ifn): Likewise.
	(asan_sanitize_stack_p): Moved from asan_sanitize_stack_p.
	* asan.h (enum asan_mark_flags): Moved here from asan.c
	(asan_protect_stack_decl): Protect all declaration that need
	to live in memory.
	(asan_sanitize_use_after_scope): New function.
	(asan_no_sanitize_address_p): Likewise.
	* cfgexpand.c (partition_stack_vars): Consider
	asan_sanitize_use_after_scope in condition.
	(expand_stack_vars): Likewise.
	* common.opt (-fsanitize-address-use-after-scope): New option.
	* doc/invoke.texi (use-after-scope-direct-emission-threshold):
	Explain the parameter.
	* flag-types.h (enum sanitize_code): Define SANITIZE_USE_AFTER_SCOPE.
	* gimplify.c (build_asan_poison_call_expr): New function.
	(asan_poison_variable): Likewise.
	(gimplify_bind_expr): Generate poisoning/unpoisoning for local
	variables that have address taken.
	(gimplify_decl_expr): Likewise.
	(gimplify_target_expr): Likewise for C++ temporaries.
	(sort_by_decl_uid): New function.
	(gimplify_expr): Unpoison all variables for a label we can jump
	from outside of a scope.
	(gimplify_function_tree): Clear asan_poisoned_variables.
	* internal-fn.c (expand_ASAN_MARK): New function.
	* internal-fn.def (ASAN_MARK): Declare.
	* opts.c (finish_options): Handle -fstack-reuse if
	-fsanitize-address-use-after-scope is enabled.
	(common_handle_option): Enable address sanitization if
	-fsanitize-address-use-after-scope is enabled.
	(warn_switch_unreachable_r): Return iterator
	instead of gimple *.
	(maybe_warn_switch_unreachable): Release the iterator.
	(gimplify_switch_expr): In case of enabled sanitization, put switch
	gimple statement to a place where all variables in BIND_EXPR
	are sanitized.
	* params.def (PARAM_USE_AFTER_SCOPE_DIRECT_EMISSION_THRESHOLD):
	New parameter.
	* params.h: Likewise.
	* sancov.c (pass_sanopt::execute): Handle IFN_ASAN_MARK.
	sanitizer.def: Define __asan_poison_stack_memory and
	__asan_unpoison_stack_memory functions.
	* tree-eh.c (verify_norecord_switch_expr): Verify switch expr
	only when not doing sanitization.
---
 gcc/asan.c| 287 +-
 gcc/asan.h|  66 ++--
 gcc/c-family/c-warn.c |   9 +-
 gcc/cfgexpand.c   |  18 +---
 gcc/common.opt|   3 +
 gcc/doc/invoke.texi   |  15 ++-
 gcc/gimplify.c| 239 +
 gcc/internal-fn.c |   9 ++
 gcc/internal-fn.def   |   1 +
 gcc/opts.c|  27 -
 gcc/params.def|   6 ++
 gcc/params.h  |   2 +
 gcc/sanitizer.def |   4 +
 gcc/sanopt.c  |   3 +
 gcc/tree-eh.c |   4 +
 15 files changed, 592 insertions(+), 101 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index c6d9240..95495d2 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -245,6 +245,13 @@ static unsigned HOST_WIDE_INT asan_shadow_offset_value;
 static bool asan_shadow_offset_computed;
 static vec sanitized_sections;
 
+/* Set of variable 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Martin Liška
On 11/03/2016 03:03 PM, Jakub Jelinek wrote:
> On Thu, Nov 03, 2016 at 03:02:21PM +0100, Martin Liška wrote:
>>> But how would you be able to find out if there isn't any return *ptr; after
>>> the scope or similar (as MEM_REF)?  With is_gimple_reg, they will be turned
>>> into SSA form and you can easily verify (uses of ASAN_POISON are a problem
>>> if they are encountered at runtime).  What would you do for the
>>> must_live_in_memory vars?  Add some pass that detects it, handle it somehow
>>> in addressable pass, handle it in SRA, ... ?
>>
>> If there's return of *ptr, there must be a _char, and it looks
>>   _4 = MEM[(char *)_char];
>>
>> properly identifies that my_char has address taken.
> 
> It doesn't.  MEM_REF's ADDR_EXPR isn't considered to be address taking.
> 
>   Jakub
> 

You are of course right, my mistake in the patch draft.

Thanks for clarification,
Martin


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Jakub Jelinek
On Thu, Nov 03, 2016 at 03:02:21PM +0100, Martin Liška wrote:
> > But how would you be able to find out if there isn't any return *ptr; after
> > the scope or similar (as MEM_REF)?  With is_gimple_reg, they will be turned
> > into SSA form and you can easily verify (uses of ASAN_POISON are a problem
> > if they are encountered at runtime).  What would you do for the
> > must_live_in_memory vars?  Add some pass that detects it, handle it somehow
> > in addressable pass, handle it in SRA, ... ?
> 
> If there's return of *ptr, there must be a _char, and it looks
>   _4 = MEM[(char *)_char];
> 
> properly identifies that my_char has address taken.

It doesn't.  MEM_REF's ADDR_EXPR isn't considered to be address taking.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Martin Liška
On 11/03/2016 02:44 PM, Jakub Jelinek wrote:
> Hi!
> 
> FYI, I think it is much more important to get the initial patch in, so
> resolve the switch + declarations inside of its body stuff, add testcases
> for that and post for re-review and check in.
> These optimizations can be done even in stage3.

I know that it's much urgent to have it done first. I'm currently testing patch
for the switch + declaration. Hopefully I'll send it today.

> 
> On Thu, Nov 03, 2016 at 02:34:47PM +0100, Martin Liška wrote:
>> I'm having a semi-working patch that comes up with the ASAN_POISON built-in. 
>> Well, to be honest,
>> I still have a feeling that doing the magic with the parallel variable is 
>> bit overkill. Maybe
>> a new runtime call would make it easier for us.
>>
>> However, I still don't fully understand why we want to support just 
>> is_gimple_reg variables.
>> Let's consider following test-case:
>>
>> void foo()
>> {
>> char *ptr;
>>   {
>> char my_char[9];
>> ptr = _char[0];
>>   }
>> }
>>
>> Where I would expect to optimize out:
>>   :
>>   _5 = (unsigned long) 9;
>>   _4 = (unsigned long) _char;
>>   __builtin___asan_unpoison_stack_memory (_4, _5);
>>   _7 = (unsigned long) 9;
>>   _6 = (unsigned long) _char;
>>   __builtin___asan_poison_stack_memory (_6, _7);
>>   return;
>>
>> where address of my_char is taken in the original source code, while not 
>> during tree-ssa
>> optimization, where the address is used only by ASAN_MARK calls.
> 
> But how would you be able to find out if there isn't any return *ptr; after
> the scope or similar (as MEM_REF)?  With is_gimple_reg, they will be turned
> into SSA form and you can easily verify (uses of ASAN_POISON are a problem
> if they are encountered at runtime).  What would you do for the
> must_live_in_memory vars?  Add some pass that detects it, handle it somehow
> in addressable pass, handle it in SRA, ... ?

If there's return of *ptr, there must be a _char, and it looks
  _4 = MEM[(char *)_char];

properly identifies that my_char has address taken.

M.

> 
>>
>> Doing such transformation can rapidly decrease number of 
>> __builtin___asan_{un}poison_stack_memory
>> in tramp3d: from ~36K to ~22K.
>>
>> Thanks for clarification.
>> Martin
> 
>   Jakub
> 



Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Jakub Jelinek
Hi!

FYI, I think it is much more important to get the initial patch in, so
resolve the switch + declarations inside of its body stuff, add testcases
for that and post for re-review and check in.
These optimizations can be done even in stage3.

On Thu, Nov 03, 2016 at 02:34:47PM +0100, Martin Liška wrote:
> I'm having a semi-working patch that comes up with the ASAN_POISON built-in. 
> Well, to be honest,
> I still have a feeling that doing the magic with the parallel variable is bit 
> overkill. Maybe
> a new runtime call would make it easier for us.
> 
> However, I still don't fully understand why we want to support just 
> is_gimple_reg variables.
> Let's consider following test-case:
> 
> void foo()
> {
> char *ptr;
>   {
> char my_char[9];
> ptr = _char[0];
>   }
> }
> 
> Where I would expect to optimize out:
>   :
>   _5 = (unsigned long) 9;
>   _4 = (unsigned long) _char;
>   __builtin___asan_unpoison_stack_memory (_4, _5);
>   _7 = (unsigned long) 9;
>   _6 = (unsigned long) _char;
>   __builtin___asan_poison_stack_memory (_6, _7);
>   return;
> 
> where address of my_char is taken in the original source code, while not 
> during tree-ssa
> optimization, where the address is used only by ASAN_MARK calls.

But how would you be able to find out if there isn't any return *ptr; after
the scope or similar (as MEM_REF)?  With is_gimple_reg, they will be turned
into SSA form and you can easily verify (uses of ASAN_POISON are a problem
if they are encountered at runtime).  What would you do for the
must_live_in_memory vars?  Add some pass that detects it, handle it somehow
in addressable pass, handle it in SRA, ... ?

> 
> Doing such transformation can rapidly decrease number of 
> __builtin___asan_{un}poison_stack_memory
> in tramp3d: from ~36K to ~22K.
> 
> Thanks for clarification.
> Martin

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Martin Liška
On 11/02/2016 03:51 PM, Jakub Jelinek wrote:
> On Wed, Nov 02, 2016 at 03:38:25PM +0100, Martin Liška wrote:
>> it converts:
>> foo ()
>> {
>>   char a;
>>   char * p;
>>   char _1;
>>   int _2;
>>   int _8;
>>   int _9;
>>
>>   :
>>   ASAN_MARK (2, , 1);
>>   a = 0;
>>   p_6 = 
>>   ASAN_MARK (1, , 1);
>>   _1 = *p_6;
> 
> You shouldn't convert if a is addressable (when ignoring  in ASAN_MARK
> calls).  Only if there is  just in ASAN_MARK and MEM_REF, you can convert.
> 
>> to:
>>
>> foo ()
>> {
>>   char a;
>>   char * p;
>>   char _1;
>>   int _2;
>>
>>   :
>>   a_10 = 0;
>>   a_12 = ASAN_POISON ();
>>   _1 = a_12;
>>   if (_1 != 0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>
>>   :
>>   # _2 = PHI <1(2), 0(3)>
>>   return _2;
>>
>> }
>>
>> and probably the last goal is to convert the newly added internal fn to a 
>> runtime call.
>> Hope sanopt pass is the right place where to it?
> 
> If ASAN_POISON is ECF_CONST and has any uses during sanopt, perhaps best
> would be to add an artificial variable you give the same name as the
> underlying var of the SSA_NAME (and alignment, locus etc.) and poison it
> right away (keep unpoisoning only to the function epilogue) and then
> ASAN_CHECK replace all uses of that SSA_NAME with ASAN_CHECK + use of
> (D) SSA_NAME.
> 
>   Jakub
> 

Hi.

I'm having a semi-working patch that comes up with the ASAN_POISON built-in. 
Well, to be honest,
I still have a feeling that doing the magic with the parallel variable is bit 
overkill. Maybe
a new runtime call would make it easier for us.

However, I still don't fully understand why we want to support just 
is_gimple_reg variables.
Let's consider following test-case:

void foo()
{
char *ptr;
  {
char my_char[9];
ptr = _char[0];
  }
}

Where I would expect to optimize out:
  :
  _5 = (unsigned long) 9;
  _4 = (unsigned long) _char;
  __builtin___asan_unpoison_stack_memory (_4, _5);
  _7 = (unsigned long) 9;
  _6 = (unsigned long) _char;
  __builtin___asan_poison_stack_memory (_6, _7);
  return;

where address of my_char is taken in the original source code, while not during 
tree-ssa
optimization, where the address is used only by ASAN_MARK calls.

Doing such transformation can rapidly decrease number of 
__builtin___asan_{un}poison_stack_memory
in tramp3d: from ~36K to ~22K.

Thanks for clarification.
Martin



Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Martin Liška
On 11/02/2016 03:51 PM, Jakub Jelinek wrote:
> On Wed, Nov 02, 2016 at 03:38:25PM +0100, Martin Liška wrote:
>> it converts:
>> foo ()
>> {
>>   char a;
>>   char * p;
>>   char _1;
>>   int _2;
>>   int _8;
>>   int _9;
>>
>>   :
>>   ASAN_MARK (2, , 1);
>>   a = 0;
>>   p_6 = 
>>   ASAN_MARK (1, , 1);
>>   _1 = *p_6;
> 
> You shouldn't convert if a is addressable (when ignoring  in ASAN_MARK
> calls).  Only if there is  just in ASAN_MARK and MEM_REF, you can convert.

Sure, which should be done in execute_update_addresses_taken via 
gimple_ior_addresses_taken.

> 
>> to:
>>
>> foo ()
>> {
>>   char a;
>>   char * p;
>>   char _1;
>>   int _2;
>>
>>   :
>>   a_10 = 0;
>>   a_12 = ASAN_POISON ();
>>   _1 = a_12;
>>   if (_1 != 0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>
>>   :
>>   # _2 = PHI <1(2), 0(3)>
>>   return _2;
>>
>> }
>>
>> and probably the last goal is to convert the newly added internal fn to a 
>> runtime call.
>> Hope sanopt pass is the right place where to it?
> 
> If ASAN_POISON is ECF_CONST and has any uses during sanopt, perhaps best
> would be to add an artificial variable you give the same name as the
> underlying var of the SSA_NAME (and alignment, locus etc.) and poison it
> right away (keep unpoisoning only to the function epilogue) and then
> ASAN_CHECK replace all uses of that SSA_NAME with ASAN_CHECK + use of
> (D) SSA_NAME.

When I create an ASAN_POISON call in execute_update_addresses_taken, there 
would not
be any ASAN_CHECK generated as it's going to be rewritten to SSA form (like the 
previous
sample I sent).

I like the idea of having a parallel variable, which can be poisoned at the 
very beginning of
a function. Whenever we have a use of the SSA_NAME (like a_12 = ASAN_POISON 
()), we can simply
insert BUILT_IN_ASAN_REPORT_LOADx(_variable) statement. No change 
would be necessary
for ASAN runtime in such case.

Will it work?
Thanks,
Martin


> 
>   Jakub
> 



Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 03:38:25PM +0100, Martin Liška wrote:
> it converts:
> foo ()
> {
>   char a;
>   char * p;
>   char _1;
>   int _2;
>   int _8;
>   int _9;
> 
>   :
>   ASAN_MARK (2, , 1);
>   a = 0;
>   p_6 = 
>   ASAN_MARK (1, , 1);
>   _1 = *p_6;

You shouldn't convert if a is addressable (when ignoring  in ASAN_MARK
calls).  Only if there is  just in ASAN_MARK and MEM_REF, you can convert.

> to:
> 
> foo ()
> {
>   char a;
>   char * p;
>   char _1;
>   int _2;
> 
>   :
>   a_10 = 0;
>   a_12 = ASAN_POISON ();
>   _1 = a_12;
>   if (_1 != 0)
> goto ;
>   else
> goto ;
> 
>   :
> 
>   :
>   # _2 = PHI <1(2), 0(3)>
>   return _2;
> 
> }
> 
> and probably the last goal is to convert the newly added internal fn to a 
> runtime call.
> Hope sanopt pass is the right place where to it?

If ASAN_POISON is ECF_CONST and has any uses during sanopt, perhaps best
would be to add an artificial variable you give the same name as the
underlying var of the SSA_NAME (and alignment, locus etc.) and poison it
right away (keep unpoisoning only to the function epilogue) and then
ASAN_CHECK replace all uses of that SSA_NAME with ASAN_CHECK + use of
(D) SSA_NAME.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Martin Liška
On 11/02/2016 02:16 PM, Richard Biener wrote:
> On Wed, Nov 2, 2016 at 2:06 PM, Jakub Jelinek  wrote:
>> On Wed, Nov 02, 2016 at 01:59:00PM +0100, Richard Biener wrote:
 Yeah, that is what I meant.  The issue is how to report uses of such
 SSA_NAME when there is no memory.  So, either we'd need a special runtime
 library entrypoint that would report uses after scope even when there is no
 underlying memory, or we'd need to force it at asan pass time into memory 
 again.
>>>
>>> Well, there can't be any uses outside the scope -- there are no (memory) 
>>> uses
>>> left if we rewrite the thing into SSA.  That is, the address can no
>>> longer "escape".
>>>
>>> Of course there could have been invalid uses before the rewrite into SSA.  
>>> But
>>> those can be diagnosed either immediately before or after re-writing into 
>>> SSA
>>> at compile-time (may be in dead code regions of course).
>>
>> Sure, we can warn on those at compile time, but we really should arrange to
>> error on those at runtime if they are ever executed, the UB happens only at
>> runtime, so in dead code isn't fatal.
> 
> Then we can replace those uses with a call into the asan runtime diagnosing 
> the
> issue instead?
> 
> Richard.
> 
>> Jakub

OK, thanks for the clarification, it's more clear to me. So we want to consider 
for
SSA transformation of ASAN_MARK only is_gimple_reg_types. I'm having a 
test-case where
it converts:
foo ()
{
  char a;
  char * p;
  char _1;
  int _2;
  int _8;
  int _9;

  :
  ASAN_MARK (2, , 1);
  a = 0;
  p_6 = 
  ASAN_MARK (1, , 1);
  _1 = *p_6;
  if (_1 != 0)
goto ;
  else
goto ;

  :
  _9 = 1;
  goto ;

  :
  _8 = 0;

  :
  # _2 = PHI <_9(3), _8(4)>
  return _2;

}

to:

foo ()
{
  char a;
  char * p;
  char _1;
  int _2;

  :
  a_10 = 0;
  a_12 = ASAN_POISON ();
  _1 = a_12;
  if (_1 != 0)
goto ;
  else
goto ;

  :

  :
  # _2 = PHI <1(2), 0(3)>
  return _2;

}

and probably the last goal is to convert the newly added internal fn to a 
runtime call.
Hope sanopt pass is the right place where to it?

Thanks,
Martin



Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 03:27:42PM +0100, Martin Liška wrote:
> > So is there anything I should do wrt -Wswitch-unreachable?
> > 
> > Marek
> > 
> 
> Probably not. I'm having a patch puts GIMPLE_SWITCH statement to a proper 
> place
> in GIMPLE_BIND. Let's see whether such patch can bootstrap and survive 
> regression
> tests.

Please do that only for -fsanitize-use-after-scope, it will likely affect at
least for -O0 the debugging experience.  For -O0 -fsanitize=address 
-fsanitize-use-after-scope
perhaps we could arrange for some extra stmt to have the locus of the
switch (where we still don't want the vars to appear in scope) and then
have no locus on the ASAN_MARK and actual GIMPLE_SWITCH or something
similar.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Martin Liška
On 11/02/2016 03:20 PM, Marek Polacek wrote:
> On Wed, Nov 02, 2016 at 11:10:53AM +0100, Jakub Jelinek wrote:
>> On Wed, Nov 02, 2016 at 10:59:26AM +0100, Jakub Jelinek wrote:
 Which is gimplified as:

 int * ptr;

 switch (argc) , case 1: >
 {
   int a;

   try
 {
   ASAN_MARK (2, , 4);
   :
   goto ;
   :
   ptr = 
   goto ;
 }
   finally
 {
   ASAN_MARK (1, , 4);
 }
>>
>>> Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
>>> statement?  Otherwise, consider this being done in a loop, after the first
>>> iteration you ASAN_MARK (1, , 4) (i.e. poison), then you iterate say with
>>> args 1 and have in case 1: a = 0;, won't that trigger runtime error?
>>
>> Wonder if for the variables declared inside of switch body, because we don't
>> care about uses before scope, it wouldn't be more efficient to arrange for
>> int *p, *q, *r;
>> switch (x)
>>   {
>> int a;
>>   case 1:
>> p = 
>> a = 5;
>> break;
>> int b;
>>   case 2:
>> int c;
>> q = 
>> r = 
>> b = 3;
>> c = 4;
>> break;
>>   }
>> to effectively ASAN_MARK (2, , 4); ASAN_MARK (2, , 4); ASAN_MARK (2, , 
>> 4);
>> before the GIMPLE_SWITCH, instead of unpoisoning them on every case label
>> where they might be in scope.  Though, of course, at least until lower pass
>> that is quite ugly, because it would refer to "not yet declared" variables.
>> Perhaps we'd need to move the ASAN_MARK and GIMPLE_SWITCH (but of course not
>> the expression evaluation of the switch control expression) inside of the
>> switches' GIMPLE_BIND.
> 
> So is there anything I should do wrt -Wswitch-unreachable?
> 
>   Marek
> 

Probably not. I'm having a patch puts GIMPLE_SWITCH statement to a proper place
in GIMPLE_BIND. Let's see whether such patch can bootstrap and survive 
regression
tests.

Thanks,
Martin


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Marek Polacek
On Wed, Nov 02, 2016 at 11:10:53AM +0100, Jakub Jelinek wrote:
> On Wed, Nov 02, 2016 at 10:59:26AM +0100, Jakub Jelinek wrote:
> > > Which is gimplified as:
> > > 
> > > int * ptr;
> > > 
> > > switch (argc) , case 1: >
> > > {
> > >   int a;
> > > 
> > >   try
> > > {
> > >   ASAN_MARK (2, , 4);
> > >   :
> > >   goto ;
> > >   :
> > >   ptr = 
> > >   goto ;
> > > }
> > >   finally
> > > {
> > >   ASAN_MARK (1, , 4);
> > > }
> 
> > Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
> > statement?  Otherwise, consider this being done in a loop, after the first
> > iteration you ASAN_MARK (1, , 4) (i.e. poison), then you iterate say with
> > args 1 and have in case 1: a = 0;, won't that trigger runtime error?
> 
> Wonder if for the variables declared inside of switch body, because we don't
> care about uses before scope, it wouldn't be more efficient to arrange for
> int *p, *q, *r;
> switch (x)
>   {
> int a;
>   case 1:
> p = 
> a = 5;
> break;
> int b;
>   case 2:
> int c;
> q = 
> r = 
> b = 3;
> c = 4;
> break;
>   }
> to effectively ASAN_MARK (2, , 4); ASAN_MARK (2, , 4); ASAN_MARK (2, , 
> 4);
> before the GIMPLE_SWITCH, instead of unpoisoning them on every case label
> where they might be in scope.  Though, of course, at least until lower pass
> that is quite ugly, because it would refer to "not yet declared" variables.
> Perhaps we'd need to move the ASAN_MARK and GIMPLE_SWITCH (but of course not
> the expression evaluation of the switch control expression) inside of the
> switches' GIMPLE_BIND.

So is there anything I should do wrt -Wswitch-unreachable?

Marek


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Richard Biener
On Wed, Nov 2, 2016 at 2:06 PM, Jakub Jelinek  wrote:
> On Wed, Nov 02, 2016 at 01:59:00PM +0100, Richard Biener wrote:
>> > Yeah, that is what I meant.  The issue is how to report uses of such
>> > SSA_NAME when there is no memory.  So, either we'd need a special runtime
>> > library entrypoint that would report uses after scope even when there is no
>> > underlying memory, or we'd need to force it at asan pass time into memory 
>> > again.
>>
>> Well, there can't be any uses outside the scope -- there are no (memory) uses
>> left if we rewrite the thing into SSA.  That is, the address can no
>> longer "escape".
>>
>> Of course there could have been invalid uses before the rewrite into SSA.  
>> But
>> those can be diagnosed either immediately before or after re-writing into SSA
>> at compile-time (may be in dead code regions of course).
>
> Sure, we can warn on those at compile time, but we really should arrange to
> error on those at runtime if they are ever executed, the UB happens only at
> runtime, so in dead code isn't fatal.

Then we can replace those uses with a call into the asan runtime diagnosing the
issue instead?

Richard.

> Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 01:59:00PM +0100, Richard Biener wrote:
> > Yeah, that is what I meant.  The issue is how to report uses of such
> > SSA_NAME when there is no memory.  So, either we'd need a special runtime
> > library entrypoint that would report uses after scope even when there is no
> > underlying memory, or we'd need to force it at asan pass time into memory 
> > again.
> 
> Well, there can't be any uses outside the scope -- there are no (memory) uses
> left if we rewrite the thing into SSA.  That is, the address can no
> longer "escape".
> 
> Of course there could have been invalid uses before the rewrite into SSA.  But
> those can be diagnosed either immediately before or after re-writing into SSA
> at compile-time (may be in dead code regions of course).

Sure, we can warn on those at compile time, but we really should arrange to
error on those at runtime if they are ever executed, the UB happens only at
runtime, so in dead code isn't fatal.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Richard Biener
On Wed, Nov 2, 2016 at 1:56 PM, Jakub Jelinek  wrote:
> On Wed, Nov 02, 2016 at 01:36:31PM +0100, Richard Biener wrote:
>> > Unless I'm missing something we either need to perform further analysis
>> > during the addressable subpass - this variable could be made
>> > non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA
>> > form there be any SSA uses of the poisoning ASAN_MARK?  If yes, keep it
>> > addressable, otherwise rewrite into SSA.
>> > Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into
>> > some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any
>> > uses of those, rewrite it back into addressable immediately or later or
>> > something.
>>
>> Or just give up optimizing (asan has a penalty anyway)?  Or
>
> Well, asan has a penalty and -fsanitize-use-after-scope even bigger penalty,
> but the point is to make that penalty bearable.
>
>> re-structure ASAN_POISON ()
>> similar to clobbers:
>>
>>   var = ASAN_POISION ();
>>
>> that avoids the address-taking and thus should be less heavy-weight on
>> optimizations.
>
> Yeah, that is what I meant.  The issue is how to report uses of such
> SSA_NAME when there is no memory.  So, either we'd need a special runtime
> library entrypoint that would report uses after scope even when there is no
> underlying memory, or we'd need to force it at asan pass time into memory 
> again.

Well, there can't be any uses outside the scope -- there are no (memory) uses
left if we rewrite the thing into SSA.  That is, the address can no
longer "escape".

Of course there could have been invalid uses before the rewrite into SSA.  But
those can be diagnosed either immediately before or after re-writing into SSA
at compile-time (may be in dead code regions of course).

Richard.

> Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 01:36:31PM +0100, Richard Biener wrote:
> > Unless I'm missing something we either need to perform further analysis
> > during the addressable subpass - this variable could be made
> > non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA
> > form there be any SSA uses of the poisoning ASAN_MARK?  If yes, keep it
> > addressable, otherwise rewrite into SSA.
> > Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into
> > some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any
> > uses of those, rewrite it back into addressable immediately or later or
> > something.
> 
> Or just give up optimizing (asan has a penalty anyway)?  Or

Well, asan has a penalty and -fsanitize-use-after-scope even bigger penalty,
but the point is to make that penalty bearable.

> re-structure ASAN_POISON ()
> similar to clobbers:
> 
>   var = ASAN_POISION ();
> 
> that avoids the address-taking and thus should be less heavy-weight on
> optimizations.

Yeah, that is what I meant.  The issue is how to report uses of such
SSA_NAME when there is no memory.  So, either we'd need a special runtime
library entrypoint that would report uses after scope even when there is no
underlying memory, or we'd need to force it at asan pass time into memory again.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Richard Biener
On Wed, Nov 2, 2016 at 10:52 AM, Jakub Jelinek  wrote:
> On Wed, Nov 02, 2016 at 10:40:35AM +0100, Richard Biener wrote:
>> > I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
>> > set during the asan pass and kept on until end of compilation of that
>> > function.  That means even if a var only addressable because of ASAN_MARK 
>> > is
>> > discovered after this pass we'd still be able to rewrite it into SSA.
>>
>> Note that being TREE_ADDRESSABLE also has effects on alias analysis
>> (didn't follow the patches to see whether you handle ASAN_MARK specially
>> in points-to analysis and/or alias analysis).
>>
>> Generally in update-address-taken you can handle ASAN_MARK similar to
>> MEM_REF (and drop it in the rewrite phase?).
>
> That is the intent, but we can't do that before the asan pass, because
> otherwise as Martin explained we don't diagnose at runtime bugs where
> a variable is used outside of its scope only through a MEM_REF.
> So we need to wait for asan pass to actually add a real builtin call that
> takes the address in that case.  Except now I really don't see how that
> can work for the case where the var is used only properly when it is in the
> scope, because the asan pass will still see those being addressable.
>
> Unless I'm missing something we either need to perform further analysis
> during the addressable subpass - this variable could be made
> non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA
> form there be any SSA uses of the poisoning ASAN_MARK?  If yes, keep it
> addressable, otherwise rewrite into SSA.
> Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into
> some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any
> uses of those, rewrite it back into addressable immediately or later or
> something.

Or just give up optimizing (asan has a penalty anyway)?  Or
re-structure ASAN_POISON ()
similar to clobbers:

  var = ASAN_POISION ();

that avoids the address-taking and thus should be less heavy-weight on
optimizations.

Richard.

>
> Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 10:59:26AM +0100, Jakub Jelinek wrote:
> > Which is gimplified as:
> > 
> > int * ptr;
> > 
> > switch (argc) , case 1: >
> > {
> >   int a;
> > 
> >   try
> > {
> >   ASAN_MARK (2, , 4);
> >   :
> >   goto ;
> >   :
> >   ptr = 
> >   goto ;
> > }
> >   finally
> > {
> >   ASAN_MARK (1, , 4);
> > }

> Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
> statement?  Otherwise, consider this being done in a loop, after the first
> iteration you ASAN_MARK (1, , 4) (i.e. poison), then you iterate say with
> args 1 and have in case 1: a = 0;, won't that trigger runtime error?

Wonder if for the variables declared inside of switch body, because we don't
care about uses before scope, it wouldn't be more efficient to arrange for
int *p, *q, *r;
switch (x)
  {
int a;
  case 1:
p = 
a = 5;
break;
int b;
  case 2:
int c;
q = 
r = 
b = 3;
c = 4;
break;
  }
to effectively ASAN_MARK (2, , 4); ASAN_MARK (2, , 4); ASAN_MARK (2, , 4);
before the GIMPLE_SWITCH, instead of unpoisoning them on every case label
where they might be in scope.  Though, of course, at least until lower pass
that is quite ugly, because it would refer to "not yet declared" variables.
Perhaps we'd need to move the ASAN_MARK and GIMPLE_SWITCH (but of course not
the expression evaluation of the switch control expression) inside of the
switches' GIMPLE_BIND.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Martin Liška
On 11/02/2016 10:59 AM, Jakub Jelinek wrote:
> On Wed, Nov 02, 2016 at 10:36:44AM +0100, Martin Liška wrote:
>> On 11/01/2016 03:53 PM, Jakub Jelinek wrote:
>>> What kind of false positives it is for each case?  Is it with normal
>>> asan-bootstrap (without your -fsanitize-use-after-scope changes), or
>>> only with those changes, or only with those changes and
>>> -fsanitize-use-after-scope used during bootstrap?
>>
>> Ok, the situation is simpler than I thought:
> 
> CCing also Marek.
>>
>> #include 
>>
>> int main(int argc, char **argv)
>> {
>>   int *ptr;
>>
>>   switch (argc)
>> {
>>   int a;
>>
>> case 1:
>>   break;
>>
>> default:
>>   ptr = 
>>   break;
>> }
>>
>>   fprintf (stderr, "v: %d\n", *ptr);
>>   return 0;
>> }
>>
>> Which is gimplified as:
>>
>> int * ptr;
>>
>> switch (argc) , case 1: >
>> {
>>   int a;
>>
>>   try
>> {
>>   ASAN_MARK (2, , 4);
>>   :
>>   goto ;
>>   :
>>   ptr = 
>>   goto ;
>> }
>>   finally
>> {
>>   ASAN_MARK (1, , 4);
>> }
>> }
>> :
>> _1 = *ptr;
>> stderr.0_2 = stderr;
>> fprintf (stderr.0_2, "v: %d\n", _1);
>> D.2577 = 0;
>> return D.2577;
>>   }
>>   D.2577 = 0;
>>   return D.2577;
>>
>> and thus we get:
>> /tmp/switch-case.c:9:11: warning: statement will never be executed 
>> [-Wswitch-unreachable]
>>int a;
>>
>> I'm wondering where properly fix that, we can either find all these 
>> ASAN_MARKs in gimplify_switch_expr
>> and distribute it to all labels (which are gimplified). Or we can put such 
>> variables to asan_poisoned_variables
>> if we have information that we're gimplifing statements before a first 
>> label. Do we know that from gimple context?
>> If so, these variables will be unpoisoned at the very beginning of each 
>> label and the ASAN_MARK call in between
>> switch statement and a first label can be removed.
> 
> Wouldn't it be easiest if -Wswitch-unreachable warning just ignored
> the ASAN_MARK internal calls altogether?
> Do you emit there any other statements, or just ASAN_MARK and nothing else?

Yep, skipping warning can be done easily, however gimplified code is wrong as
un-poisoning is not done for variable (even for a valid program).

> 
> Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
> statement?  Otherwise, consider this being done in a loop, after the first
> iteration you ASAN_MARK (1, , 4) (i.e. poison), then you iterate say with
> args 1 and have in case 1: a = 0;, won't that trigger runtime error?

Hopefully having the un-poisoning call be encapsulated in finally block would 
properly
clean up the variable. Or am I wrong?

Martin

> 
>   Jakub
> 



Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 10:36:44AM +0100, Martin Liška wrote:
> On 11/01/2016 03:53 PM, Jakub Jelinek wrote:
> > What kind of false positives it is for each case?  Is it with normal
> > asan-bootstrap (without your -fsanitize-use-after-scope changes), or
> > only with those changes, or only with those changes and
> > -fsanitize-use-after-scope used during bootstrap?
> 
> Ok, the situation is simpler than I thought:

CCing also Marek.
> 
> #include 
> 
> int main(int argc, char **argv)
> {
>   int *ptr;
> 
>   switch (argc)
> {
>   int a;
> 
> case 1:
>   break;
> 
> default:
>   ptr = 
>   break;
> }
> 
>   fprintf (stderr, "v: %d\n", *ptr);
>   return 0;
> }
> 
> Which is gimplified as:
> 
> int * ptr;
> 
> switch (argc) , case 1: >
> {
>   int a;
> 
>   try
> {
>   ASAN_MARK (2, , 4);
>   :
>   goto ;
>   :
>   ptr = 
>   goto ;
> }
>   finally
> {
>   ASAN_MARK (1, , 4);
> }
> }
> :
> _1 = *ptr;
> stderr.0_2 = stderr;
> fprintf (stderr.0_2, "v: %d\n", _1);
> D.2577 = 0;
> return D.2577;
>   }
>   D.2577 = 0;
>   return D.2577;
> 
> and thus we get:
> /tmp/switch-case.c:9:11: warning: statement will never be executed 
> [-Wswitch-unreachable]
>int a;
> 
> I'm wondering where properly fix that, we can either find all these 
> ASAN_MARKs in gimplify_switch_expr
> and distribute it to all labels (which are gimplified). Or we can put such 
> variables to asan_poisoned_variables
> if we have information that we're gimplifing statements before a first label. 
> Do we know that from gimple context?
> If so, these variables will be unpoisoned at the very beginning of each label 
> and the ASAN_MARK call in between
> switch statement and a first label can be removed.

Wouldn't it be easiest if -Wswitch-unreachable warning just ignored
the ASAN_MARK internal calls altogether?
Do you emit there any other statements, or just ASAN_MARK and nothing else?

Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
statement?  Otherwise, consider this being done in a loop, after the first
iteration you ASAN_MARK (1, , 4) (i.e. poison), then you iterate say with
args 1 and have in case 1: a = 0;, won't that trigger runtime error?

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 10:40:35AM +0100, Richard Biener wrote:
> > I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
> > set during the asan pass and kept on until end of compilation of that
> > function.  That means even if a var only addressable because of ASAN_MARK is
> > discovered after this pass we'd still be able to rewrite it into SSA.
> 
> Note that being TREE_ADDRESSABLE also has effects on alias analysis
> (didn't follow the patches to see whether you handle ASAN_MARK specially
> in points-to analysis and/or alias analysis).
> 
> Generally in update-address-taken you can handle ASAN_MARK similar to
> MEM_REF (and drop it in the rewrite phase?).

That is the intent, but we can't do that before the asan pass, because
otherwise as Martin explained we don't diagnose at runtime bugs where
a variable is used outside of its scope only through a MEM_REF.
So we need to wait for asan pass to actually add a real builtin call that
takes the address in that case.  Except now I really don't see how that
can work for the case where the var is used only properly when it is in the
scope, because the asan pass will still see those being addressable.

Unless I'm missing something we either need to perform further analysis
during the addressable subpass - this variable could be made
non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA
form there be any SSA uses of the poisoning ASAN_MARK?  If yes, keep it
addressable, otherwise rewrite into SSA.
Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into
some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any
uses of those, rewrite it back into addressable immediately or later or
something.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Martin Liška
On 11/01/2016 04:12 PM, Jakub Jelinek wrote:
> On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote:
>> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs)
>>  
>>  static void
>>  maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs,
>> -bitmap suitable_for_renaming)
>> +bitmap suitable_for_renaming, bitmap marked_nonaddressable)
>>  {
>>/* Global Variables, result decls cannot be changed.  */
>>if (is_global_var (var)
>> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, 
>> bitmap not_reg_needs,
>>|| !bitmap_bit_p (not_reg_needs, DECL_UID (var
>>  {
>>TREE_ADDRESSABLE (var) = 0;
>> +  bitmap_set_bit (marked_nonaddressable, DECL_UID (var));
> 
> Why do you need the marked_nonaddressable bitmap?

Because the later loop (which visits every gimple statement) iterates only
if there's an entry in suitable_for_renaming.

> 
>>if (is_gimple_reg (var))
>>  bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
>>if (dump_file)
>> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap 
>> addresses_taken, bitmap not_reg_needs,
>>  }
>>  }
>>  
>> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.  */
>> +/* Return true when STMT is ASAN mark where second argument is an address
>> +   of a local variable.  */
>>  
>> -void
>> -execute_update_addresses_taken (void)
>> +static bool
>> +is_asan_mark_p (gimple *stmt)
>> +{
>> +  if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK))
>> +return false;
>> +
>> +  tree addr = get_base_address (gimple_call_arg (stmt, 1));
>> +  if (TREE_CODE (addr) == ADDR_EXPR
>> +  && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL)
> 
> Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw)
> would turn it into is_gimple_reg), and don't return true if not.

Well, the predicate is called once before maybe_optimize_var, thus I need to 
have
it conservative and not consider TREE_ADDRESSABLE flag. Having argument would 
work
for that?

> 
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.
>> +   If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK built-ins. 
>>  */
>> +
>> +
>> +static void
>> +execute_update_addresses_taken (bool sanitize_asan_mark = false)
> 
> I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
> set during the asan pass and kept on until end of compilation of that
> function.  That means even if a var only addressable because of ASAN_MARK is
> discovered after this pass we'd still be able to rewrite it into SSA.

It's doable (please see attached patch) and also nicer. However, one would need 
to
extend curr_properties to long type as we already use 16 existing values.

Martin

> 
>   Jakub
> 

>From ad5f68a010674118fac7ca8b6953f7b99fd3c2a8 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 1 Nov 2016 11:21:20 +0100
Subject: [PATCH] Use-after-scope: do not mark variables that are no longer
 addressable

gcc/ChangeLog:

2016-11-02  Martin Liska  

	* asan.c: Update properties_provided and todo_flags_finish.
	* function.h (struct GTY): Change int to long as there's not
	enough space for a new value.
	* tree-pass.h: Define PROP_asan_check_done.
	* tree-ssa.c (maybe_optimize_var): Add new argument.
	(is_asan_mark_p): New function.
	(execute_update_addresses_taken): Handle ASAN_MARK internal fns.
---
 gcc/asan.c  |  5 +++--
 gcc/function.h  |  2 +-
 gcc/tree-pass.h |  1 +
 gcc/tree-ssa.c  | 69 +
 4 files changed, 60 insertions(+), 17 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index 95495d2..94ee877 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "builtins.h"
 #include "fnmatch.h"
+#include "tree-ssa.h"
 
 /* AddressSanitizer finds out-of-bounds and use-after-free bugs
with <2x slowdown on average.
@@ -2993,10 +2994,10 @@ const pass_data pass_data_asan =
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   ( PROP_ssa | PROP_cfg | PROP_gimple_leh ), /* properties_required */
-  0, /* properties_provided */
+  PROP_asan_check_done, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_update_ssa, /* todo_flags_finish */
+  TODO_update_ssa | TODO_update_address_taken, /* todo_flags_finish */
 };
 
 class pass_asan : public gimple_opt_pass
diff --git a/gcc/function.h b/gcc/function.h
index e854c7f..5600488 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -289,7 +289,7 @@ struct GTY(()) function {
   location_t function_end_locus;
 
   /* Properties used by the pass manager.  */
-  unsigned int curr_properties;
+  unsigned long curr_properties;
   unsigned int last_verified;
 
   /* Non-null if the function does 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Martin Liška
On 11/02/2016 10:40 AM, Richard Biener wrote:
> On Tue, Nov 1, 2016 at 4:12 PM, Jakub Jelinek  wrote:
>> On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote:
>>> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs)
>>>
>>>  static void
>>>  maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs,
>>> - bitmap suitable_for_renaming)
>>> + bitmap suitable_for_renaming, bitmap 
>>> marked_nonaddressable)
>>>  {
>>>/* Global Variables, result decls cannot be changed.  */
>>>if (is_global_var (var)
>>> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, 
>>> bitmap not_reg_needs,
>>> || !bitmap_bit_p (not_reg_needs, DECL_UID (var
>>>  {
>>>TREE_ADDRESSABLE (var) = 0;
>>> +  bitmap_set_bit (marked_nonaddressable, DECL_UID (var));
>>
>> Why do you need the marked_nonaddressable bitmap?
>>
>>>if (is_gimple_reg (var))
>>>   bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
>>>if (dump_file)
>>> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap 
>>> addresses_taken, bitmap not_reg_needs,
>>>  }
>>>  }
>>>
>>> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.  */
>>> +/* Return true when STMT is ASAN mark where second argument is an address
>>> +   of a local variable.  */
>>>
>>> -void
>>> -execute_update_addresses_taken (void)
>>> +static bool
>>> +is_asan_mark_p (gimple *stmt)
>>> +{
>>> +  if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK))
>>> +return false;
>>> +
>>> +  tree addr = get_base_address (gimple_call_arg (stmt, 1));
>>> +  if (TREE_CODE (addr) == ADDR_EXPR
>>> +  && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL)
>>
>> Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw)
>> would turn it into is_gimple_reg), and don't return true if not.
>>
>>> +return true;
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.
>>> +   If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK 
>>> built-ins.  */
>>> +
>>> +
>>> +static void
>>> +execute_update_addresses_taken (bool sanitize_asan_mark = false)
>>
>> I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
>> set during the asan pass and kept on until end of compilation of that
>> function.  That means even if a var only addressable because of ASAN_MARK is
>> discovered after this pass we'd still be able to rewrite it into SSA.
> 
> Note that being TREE_ADDRESSABLE also has effects on alias analysis
> (didn't follow the patches to see whether you handle ASAN_MARK specially
> in points-to analysis and/or alias analysis).

Currently all manipulation with shadow memory is done via a pointer type
which has created a separate aliasing set:

static void
asan_init_shadow_ptr_types (void)
{
  asan_shadow_set = new_alias_set ();
  tree types[3] = { signed_char_type_node, short_integer_type_node,
integer_type_node };

  for (unsigned i = 0; i < 3; i++)
{
  shadow_ptr_types[i] = build_distinct_type_copy (types[i]);
  TYPE_ALIAS_SET (shadow_ptr_types[i]) = asan_shadow_set;
  shadow_ptr_types[i] = build_pointer_type (shadow_ptr_types[i]);
}
...

Martin

> 
> Generally in update-address-taken you can handle ASAN_MARK similar to
> MEM_REF (and drop it in the rewrite phase?).
> 
> As said, I didnt look at the patches and just came by here seeing
> tree-ssa.c pieces...
> 
> Richard.
> 
>> Jakub



Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Richard Biener
On Tue, Nov 1, 2016 at 4:12 PM, Jakub Jelinek  wrote:
> On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote:
>> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs)
>>
>>  static void
>>  maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs,
>> - bitmap suitable_for_renaming)
>> + bitmap suitable_for_renaming, bitmap marked_nonaddressable)
>>  {
>>/* Global Variables, result decls cannot be changed.  */
>>if (is_global_var (var)
>> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, 
>> bitmap not_reg_needs,
>> || !bitmap_bit_p (not_reg_needs, DECL_UID (var
>>  {
>>TREE_ADDRESSABLE (var) = 0;
>> +  bitmap_set_bit (marked_nonaddressable, DECL_UID (var));
>
> Why do you need the marked_nonaddressable bitmap?
>
>>if (is_gimple_reg (var))
>>   bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
>>if (dump_file)
>> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap 
>> addresses_taken, bitmap not_reg_needs,
>>  }
>>  }
>>
>> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.  */
>> +/* Return true when STMT is ASAN mark where second argument is an address
>> +   of a local variable.  */
>>
>> -void
>> -execute_update_addresses_taken (void)
>> +static bool
>> +is_asan_mark_p (gimple *stmt)
>> +{
>> +  if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK))
>> +return false;
>> +
>> +  tree addr = get_base_address (gimple_call_arg (stmt, 1));
>> +  if (TREE_CODE (addr) == ADDR_EXPR
>> +  && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL)
>
> Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw)
> would turn it into is_gimple_reg), and don't return true if not.
>
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.
>> +   If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK built-ins. 
>>  */
>> +
>> +
>> +static void
>> +execute_update_addresses_taken (bool sanitize_asan_mark = false)
>
> I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
> set during the asan pass and kept on until end of compilation of that
> function.  That means even if a var only addressable because of ASAN_MARK is
> discovered after this pass we'd still be able to rewrite it into SSA.

Note that being TREE_ADDRESSABLE also has effects on alias analysis
(didn't follow the patches to see whether you handle ASAN_MARK specially
in points-to analysis and/or alias analysis).

Generally in update-address-taken you can handle ASAN_MARK similar to
MEM_REF (and drop it in the rewrite phase?).

As said, I didnt look at the patches and just came by here seeing
tree-ssa.c pieces...

Richard.

> Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Martin Liška
On 11/01/2016 03:53 PM, Jakub Jelinek wrote:
> What kind of false positives it is for each case?  Is it with normal
> asan-bootstrap (without your -fsanitize-use-after-scope changes), or
> only with those changes, or only with those changes and
> -fsanitize-use-after-scope used during bootstrap?

Ok, the situation is simpler than I thought:

#include 

int main(int argc, char **argv)
{
  int *ptr;

  switch (argc)
{
  int a;

case 1:
  break;

default:
  ptr = 
  break;
}

  fprintf (stderr, "v: %d\n", *ptr);
  return 0;
}

Which is gimplified as:

int * ptr;

switch (argc) , case 1: >
{
  int a;

  try
{
  ASAN_MARK (2, , 4);
  :
  goto ;
  :
  ptr = 
  goto ;
}
  finally
{
  ASAN_MARK (1, , 4);
}
}
:
_1 = *ptr;
stderr.0_2 = stderr;
fprintf (stderr.0_2, "v: %d\n", _1);
D.2577 = 0;
return D.2577;
  }
  D.2577 = 0;
  return D.2577;

and thus we get:
/tmp/switch-case.c:9:11: warning: statement will never be executed 
[-Wswitch-unreachable]
   int a;

I'm wondering where properly fix that, we can either find all these ASAN_MARKs 
in gimplify_switch_expr
and distribute it to all labels (which are gimplified). Or we can put such 
variables to asan_poisoned_variables
if we have information that we're gimplifing statements before a first label. 
Do we know that from gimple context?
If so, these variables will be unpoisoned at the very beginning of each label 
and the ASAN_MARK call in between
switch statement and a first label can be removed.

Thoughts?
Thanks,
Martin



Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-01 Thread Jakub Jelinek
On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote:
> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs)
>  
>  static void
>  maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs,
> - bitmap suitable_for_renaming)
> + bitmap suitable_for_renaming, bitmap marked_nonaddressable)
>  {
>/* Global Variables, result decls cannot be changed.  */
>if (is_global_var (var)
> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, 
> bitmap not_reg_needs,
> || !bitmap_bit_p (not_reg_needs, DECL_UID (var
>  {
>TREE_ADDRESSABLE (var) = 0;
> +  bitmap_set_bit (marked_nonaddressable, DECL_UID (var));

Why do you need the marked_nonaddressable bitmap?

>if (is_gimple_reg (var))
>   bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
>if (dump_file)
> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap addresses_taken, 
> bitmap not_reg_needs,
>  }
>  }
>  
> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.  */
> +/* Return true when STMT is ASAN mark where second argument is an address
> +   of a local variable.  */
>  
> -void
> -execute_update_addresses_taken (void)
> +static bool
> +is_asan_mark_p (gimple *stmt)
> +{
> +  if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK))
> +return false;
> +
> +  tree addr = get_base_address (gimple_call_arg (stmt, 1));
> +  if (TREE_CODE (addr) == ADDR_EXPR
> +  && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL)

Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw)
would turn it into is_gimple_reg), and don't return true if not.

> +return true;
> +
> +  return false;
> +}
> +
> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.
> +   If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK built-ins.  
> */
> +
> +
> +static void
> +execute_update_addresses_taken (bool sanitize_asan_mark = false)

I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
set during the asan pass and kept on until end of compilation of that
function.  That means even if a var only addressable because of ASAN_MARK is
discovered after this pass we'd still be able to rewrite it into SSA.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-01 Thread Martin Liška
On 11/01/2016 03:53 PM, Jakub Jelinek wrote:
> On Tue, Nov 01, 2016 at 03:47:54PM +0100, Martin Liška wrote:
>> On 10/27/2016 07:23 PM, Jakub Jelinek wrote:
>>> Ok for trunk with that change.
>>>
>>> Jakub
>>
>> Hello.
>>
>> I'll commit the patch as soon as following patch would be accepted. The patch
>> fixes false positives when running asan-bootstrap.
> 
> What kind of false positives it is for each case?  Is it with normal
> asan-bootstrap (without your -fsanitize-use-after-scope changes), or
> only with those changes, or only with those changes and
> -fsanitize-use-after-scope used during bootstrap?

It's only with those changes as -fsanitize-address-use-after-scope is enabled by
default with -fsanitize=address. Current bootstrap-asan works fine. I'll 
re-trigger
the bootstrap again, but IIRC the main culprit was ASAN_MARK poisoning done at 
switch labels
(which should be partially fixed with a newer version of the patch).

Martin

> 
>> >From b62e4d7ffe659ec338ef83e84ccb572a07264283 Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Tue, 20 Sep 2016 10:31:25 +0200
>> Subject: [PATCH 1/4] Fix ASAN bootstrap uninitialized warning.
>>
>> gcc/ChangeLog:
>>
>> 2016-09-26  Martin Liska  
>>
>>  * ipa-devirt.c (record_targets_from_bases): Initialize a
>>  variable.
>>  * omp-low.c (lower_omp_target): Remove a variable from
>>  scope defined by a switch statement.
>>  * tree-dump.c (dequeue_and_dump): Likewise.
>>
>> gcc/java/ChangeLog:
>>
>> 2016-09-26  Martin Liska  
>>
>>  * mangle.c (mangle_type): Remove a variable from
>>  scope defined by a switch statement.
>> ---
>>  gcc/ipa-devirt.c |  2 +-
>>  gcc/omp-low.c| 11 ---
>>  gcc/tree-dump.c  |  8 +++-
>>  3 files changed, 8 insertions(+), 13 deletions(-)
>>
>> diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
>> index 49e2195..5c0ae72 100644
>> --- a/gcc/ipa-devirt.c
>> +++ b/gcc/ipa-devirt.c
>> @@ -2862,7 +2862,7 @@ record_targets_from_bases (tree otr_type,
>>  {
>>while (true)
>>  {
>> -  HOST_WIDE_INT pos, size;
>> +  HOST_WIDE_INT pos = 0, size;
>>tree base_binfo;
>>tree fld;
>>  
>> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
>> index e5b9e4c..62c9e5c 100644
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -15803,11 +15803,10 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
>> omp_context *ctx)
>>push_gimplify_context ();
>>fplist = NULL;
>>  
>> +  tree var, x;
>>for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
>>  switch (OMP_CLAUSE_CODE (c))
>>{
>> -tree var, x;
>> -
>>default:
>>  break;
>>case OMP_CLAUSE_MAP:
>> @@ -16066,12 +16065,11 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
>> omp_context *ctx)
>>vec_alloc (vkind, map_cnt);
>>unsigned int map_idx = 0;
>>  
>> +  tree ovar, nc, s, purpose, var, x, type;
>> +  unsigned int talign;
>>for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
>>  switch (OMP_CLAUSE_CODE (c))
>>{
>> -tree ovar, nc, s, purpose, var, x, type;
>> -unsigned int talign;
>> -
>>default:
>>  break;
>>  
>> @@ -16442,10 +16440,10 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
>> omp_context *ctx)
>>if (offloaded || data_region)
>>  {
>>tree prev = NULL_TREE;
>> +  tree var, x;
>>for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
>>  switch (OMP_CLAUSE_CODE (c))
>>{
>> -tree var, x;
>>default:
>>  break;
>>case OMP_CLAUSE_FIRSTPRIVATE:
>> @@ -16594,7 +16592,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
>> omp_context *ctx)
>>for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>>  switch (OMP_CLAUSE_CODE (c))
>>{
>> -tree var;
>>default:
>>  break;
>>case OMP_CLAUSE_MAP:
>> diff --git a/gcc/tree-dump.c b/gcc/tree-dump.c
>> index df47181..89f72a0 100644
>> --- a/gcc/tree-dump.c
>> +++ b/gcc/tree-dump.c
>> @@ -420,8 +420,6 @@ dequeue_and_dump (dump_info_p di)
>>/* Now handle the various kinds of nodes.  */
>>switch (code)
>>  {
>> -  int i;
>> -
>>  case IDENTIFIER_NODE:
>>dump_string_field (di, "strg", IDENTIFIER_POINTER (t));
>>dump_int (di, "lngt", IDENTIFIER_LENGTH (t));
>> @@ -435,6 +433,7 @@ dequeue_and_dump (dump_info_p di)
>>  
>>  case STATEMENT_LIST:
>>{
>> +int i;
>>  tree_stmt_iterator it;
>>  for (i = 0, it = tsi_start (t); !tsi_end_p (it); tsi_next (), i++)
>>{
>> @@ -447,7 +446,7 @@ dequeue_and_dump (dump_info_p di)
>>  
>>  case TREE_VEC:
>>dump_int (di, "lngt", TREE_VEC_LENGTH (t));
>> -  for (i = 0; i < TREE_VEC_LENGTH (t); ++i)
>> +  for (int i = 0; i < TREE_VEC_LENGTH (t); ++i)
>>  {
>>char buffer[32];
>>sprintf (buffer, "%u", i);
>> @@ -707,9 +706,8 @@ dequeue_and_dump (dump_info_p di)
>>break;
>>  case OMP_CLAUSE:

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-01 Thread Martin Liška
On 10/27/2016 07:23 PM, Jakub Jelinek wrote:
> On Thu, Oct 27, 2016 at 04:40:30PM +0200, Martin Liška wrote:
>> On 10/21/2016 04:26 PM, Jakub Jelinek wrote:
>>> On Wed, Oct 12, 2016 at 04:07:53PM +0200, Martin Liška wrote:
> Ok, first let me list some needed follow-ups that don't need to be handled
> right away:
> - r237814-like changes for ASAN_MARK
>>
>> I've spent quite some on that and that's what I begin 
>> (use-after-scope-addressable.patch).
>> Problem is that as I ignore all ASAN_MARK internal fns, the code does not 
>> detect having address
>> taken in:
>>
>> _2 = MEM[(char *)_char + 8B];
>>
>>   char *ptr;
>>   {
>> char my_char[9];
>> ptr = _char[0];
>>   }
>>
>>   return *(ptr+8);
>>
>> and thus the code in tree-ssa.c (maybe_optimize_var) sets TREE_ADDRESSABLE 
>> (var) = 0.
> 
> Perhaps we should do that only if the var's type is_gimple_reg_type,
> then we'd rewrite that into SSA at that time, right?  So, in theory if we
> turned the ASAN_MARK poisoning call into another internal function
> (var_5 = ASAN_POISON ()) and then after converting it into SSA looked at
> all the uses of such an lhs and perhaps at sanopt part or when marked all
> the use places with a library call that would complain at runtime?
> Or turn those back at sanopt time into addressable memory loads which would
> be poisoned or similar?  Or alternatively, immediately before turning
> variables addressable just because of ASAN_MARK into non-addressable use
> the same framework into-ssa uses to find out if there are any poisoned
> accesses, and just not optimize it in that case.
> Anyway, this can be done incrementally.

I've done a patch candidate (not tested yet) which is capable of ASAN_MARK 
removal
for local variables that can be rewritten into SSA. This is done by running 
execute_update_addresses_taken
after we create ASAN_CHECK internal fns and skipping all ASAN_MARK for having 
address taken considerations.
This removes significant number of ASAN_MARK fns in tramp3d (due to C++ 
temporaries).

> 
>> Second question I have is whether we want to handle just TREE_ADDRESSABLE 
>> stuff during gimplification?
>> Basically in a way that the current patch is doing?
> 
> How could variables that aren't TREE_ADDRESSABLE during gimplification be
> accessed out of scope?

Yep, TREE_ADDRESSABLE guard does what it should do.

I'm going to test the patch which can be installed incrementally.

Martin

> 
>> +/* Return true if DECL should be guarded on the stack.  */
>> +
>> +static inline bool
>> +asan_protect_stack_decl (tree decl)
>> +{
>> +  return DECL_P (decl)
>> +&& (!DECL_ARTIFICIAL (decl)
>> +|| (asan_sanitize_use_after_scope () && TREE_ADDRESSABLE (decl)));
> 
> Bad formatting.  Should be:
> 
>   return (DECL_P (decl)
> && (!DECL_ARTIFICIAL (decl)
> || (asan_sanitize_use_after_scope ()
> && TREE_ADDRESSABLE (decl;
> 
> Ok for trunk with that change.
> 
>   Jakub
> 

>From cf860324da41244745f04a16b184fabe343ac5d9 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 1 Nov 2016 11:21:20 +0100
Subject: [PATCH 4/4] Use-after-scope: do not mark variables that are no longer
 addressable

gcc/ChangeLog:

2016-11-01  Martin Liska  

	* asan.c (asan_instrument): Call
	execute_update_addresses_taken_asan_sanitize right after
	sanitization.
	* tree-ssa.c (maybe_optimize_var): Mark all variables set to
	non-addressable.
	(is_asan_mark_p): New function.
	(execute_update_addresses_taken): Likewise.
	(execute_update_addresses_taken_asan_sanitize): Likewise.
	* tree-ssa.h (execute_update_addresses_taken_asan_sanitize):
	Declare new function.
---
 gcc/asan.c |  4 +++
 gcc/tree-ssa.c | 96 +++---
 gcc/tree-ssa.h |  1 +
 3 files changed, 84 insertions(+), 17 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index 95495d2..5cb37c8 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "builtins.h"
 #include "fnmatch.h"
+#include "tree-ssa.h"
 
 /* AddressSanitizer finds out-of-bounds and use-after-free bugs
with <2x slowdown on average.
@@ -2973,6 +2974,9 @@ asan_instrument (void)
   if (shadow_ptr_types[0] == NULL_TREE)
 asan_init_shadow_ptr_types ();
   transform_statements ();
+
+  if (optimize)
+execute_update_addresses_taken_asan_sanitize ();
   return 0;
 }
 
diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c
index 135952b..0633b21 100644
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgexpand.h"
 #include "tree-cfg.h"
 #include "tree-dfa.h"
+#include "asan.h"
 
 /* Pointer map of variable mappings, keyed by edge.  */
 static hash_map *edge_var_maps;
@@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs)
 
 static void
 maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs,
-		 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-01 Thread Jakub Jelinek
On Tue, Nov 01, 2016 at 03:47:54PM +0100, Martin Liška wrote:
> On 10/27/2016 07:23 PM, Jakub Jelinek wrote:
> > Ok for trunk with that change.
> > 
> > Jakub
> 
> Hello.
> 
> I'll commit the patch as soon as following patch would be accepted. The patch
> fixes false positives when running asan-bootstrap.

What kind of false positives it is for each case?  Is it with normal
asan-bootstrap (without your -fsanitize-use-after-scope changes), or
only with those changes, or only with those changes and
-fsanitize-use-after-scope used during bootstrap?

> >From b62e4d7ffe659ec338ef83e84ccb572a07264283 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Tue, 20 Sep 2016 10:31:25 +0200
> Subject: [PATCH 1/4] Fix ASAN bootstrap uninitialized warning.
> 
> gcc/ChangeLog:
> 
> 2016-09-26  Martin Liska  
> 
>   * ipa-devirt.c (record_targets_from_bases): Initialize a
>   variable.
>   * omp-low.c (lower_omp_target): Remove a variable from
>   scope defined by a switch statement.
>   * tree-dump.c (dequeue_and_dump): Likewise.
> 
> gcc/java/ChangeLog:
> 
> 2016-09-26  Martin Liska  
> 
>   * mangle.c (mangle_type): Remove a variable from
>   scope defined by a switch statement.
> ---
>  gcc/ipa-devirt.c |  2 +-
>  gcc/omp-low.c| 11 ---
>  gcc/tree-dump.c  |  8 +++-
>  3 files changed, 8 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
> index 49e2195..5c0ae72 100644
> --- a/gcc/ipa-devirt.c
> +++ b/gcc/ipa-devirt.c
> @@ -2862,7 +2862,7 @@ record_targets_from_bases (tree otr_type,
>  {
>while (true)
>  {
> -  HOST_WIDE_INT pos, size;
> +  HOST_WIDE_INT pos = 0, size;
>tree base_binfo;
>tree fld;
>  
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index e5b9e4c..62c9e5c 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -15803,11 +15803,10 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
> omp_context *ctx)
>push_gimplify_context ();
>fplist = NULL;
>  
> +  tree var, x;
>for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
>  switch (OMP_CLAUSE_CODE (c))
>{
> - tree var, x;
> -
>default:
>   break;
>case OMP_CLAUSE_MAP:
> @@ -16066,12 +16065,11 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
> omp_context *ctx)
>vec_alloc (vkind, map_cnt);
>unsigned int map_idx = 0;
>  
> +  tree ovar, nc, s, purpose, var, x, type;
> +  unsigned int talign;
>for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
>   switch (OMP_CLAUSE_CODE (c))
> {
> - tree ovar, nc, s, purpose, var, x, type;
> - unsigned int talign;
> -
> default:
>   break;
>  
> @@ -16442,10 +16440,10 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
> omp_context *ctx)
>if (offloaded || data_region)
>  {
>tree prev = NULL_TREE;
> +  tree var, x;
>for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
>   switch (OMP_CLAUSE_CODE (c))
> {
> - tree var, x;
> default:
>   break;
> case OMP_CLAUSE_FIRSTPRIVATE:
> @@ -16594,7 +16592,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
> omp_context *ctx)
>for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>   switch (OMP_CLAUSE_CODE (c))
> {
> - tree var;
> default:
>   break;
> case OMP_CLAUSE_MAP:
> diff --git a/gcc/tree-dump.c b/gcc/tree-dump.c
> index df47181..89f72a0 100644
> --- a/gcc/tree-dump.c
> +++ b/gcc/tree-dump.c
> @@ -420,8 +420,6 @@ dequeue_and_dump (dump_info_p di)
>/* Now handle the various kinds of nodes.  */
>switch (code)
>  {
> -  int i;
> -
>  case IDENTIFIER_NODE:
>dump_string_field (di, "strg", IDENTIFIER_POINTER (t));
>dump_int (di, "lngt", IDENTIFIER_LENGTH (t));
> @@ -435,6 +433,7 @@ dequeue_and_dump (dump_info_p di)
>  
>  case STATEMENT_LIST:
>{
> + int i;
>   tree_stmt_iterator it;
>   for (i = 0, it = tsi_start (t); !tsi_end_p (it); tsi_next (), i++)
> {
> @@ -447,7 +446,7 @@ dequeue_and_dump (dump_info_p di)
>  
>  case TREE_VEC:
>dump_int (di, "lngt", TREE_VEC_LENGTH (t));
> -  for (i = 0; i < TREE_VEC_LENGTH (t); ++i)
> +  for (int i = 0; i < TREE_VEC_LENGTH (t); ++i)
>   {
> char buffer[32];
> sprintf (buffer, "%u", i);
> @@ -707,9 +706,8 @@ dequeue_and_dump (dump_info_p di)
>break;
>  case OMP_CLAUSE:
>{
> - int i;
>   fprintf (di->stream, "%s\n", omp_clause_code_name[OMP_CLAUSE_CODE (t)]);
> - for (i = 0; i < omp_clause_num_ops[OMP_CLAUSE_CODE (t)]; i++)
> + for (int i = 0; i < omp_clause_num_ops[OMP_CLAUSE_CODE (t)]; i++)
> dump_child ("op: ", OMP_CLAUSE_OPERAND (t, i));
>}
>break;
> -- 
> 2.10.1
> 


Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-01 Thread Martin Liška
On 10/27/2016 07:23 PM, Jakub Jelinek wrote:
> Ok for trunk with that change.
> 
>   Jakub

Hello.

I'll commit the patch as soon as following patch would be accepted. The patch
fixes false positives when running asan-bootstrap.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From b62e4d7ffe659ec338ef83e84ccb572a07264283 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 20 Sep 2016 10:31:25 +0200
Subject: [PATCH 1/4] Fix ASAN bootstrap uninitialized warning.

gcc/ChangeLog:

2016-09-26  Martin Liska  

	* ipa-devirt.c (record_targets_from_bases): Initialize a
	variable.
	* omp-low.c (lower_omp_target): Remove a variable from
	scope defined by a switch statement.
	* tree-dump.c (dequeue_and_dump): Likewise.

gcc/java/ChangeLog:

2016-09-26  Martin Liska  

	* mangle.c (mangle_type): Remove a variable from
	scope defined by a switch statement.
---
 gcc/ipa-devirt.c |  2 +-
 gcc/omp-low.c| 11 ---
 gcc/tree-dump.c  |  8 +++-
 3 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 49e2195..5c0ae72 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -2862,7 +2862,7 @@ record_targets_from_bases (tree otr_type,
 {
   while (true)
 {
-  HOST_WIDE_INT pos, size;
+  HOST_WIDE_INT pos = 0, size;
   tree base_binfo;
   tree fld;
 
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e5b9e4c..62c9e5c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -15803,11 +15803,10 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   push_gimplify_context ();
   fplist = NULL;
 
+  tree var, x;
   for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
 switch (OMP_CLAUSE_CODE (c))
   {
-	tree var, x;
-
   default:
 	break;
   case OMP_CLAUSE_MAP:
@@ -16066,12 +16065,11 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   vec_alloc (vkind, map_cnt);
   unsigned int map_idx = 0;
 
+  tree ovar, nc, s, purpose, var, x, type;
+  unsigned int talign;
   for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
 	switch (OMP_CLAUSE_CODE (c))
 	  {
-	tree ovar, nc, s, purpose, var, x, type;
-	unsigned int talign;
-
 	  default:
 	break;
 
@@ -16442,10 +16440,10 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   if (offloaded || data_region)
 {
   tree prev = NULL_TREE;
+  tree var, x;
   for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
 	switch (OMP_CLAUSE_CODE (c))
 	  {
-	tree var, x;
 	  default:
 	break;
 	  case OMP_CLAUSE_FIRSTPRIVATE:
@@ -16594,7 +16592,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
 	switch (OMP_CLAUSE_CODE (c))
 	  {
-	tree var;
 	  default:
 	break;
 	  case OMP_CLAUSE_MAP:
diff --git a/gcc/tree-dump.c b/gcc/tree-dump.c
index df47181..89f72a0 100644
--- a/gcc/tree-dump.c
+++ b/gcc/tree-dump.c
@@ -420,8 +420,6 @@ dequeue_and_dump (dump_info_p di)
   /* Now handle the various kinds of nodes.  */
   switch (code)
 {
-  int i;
-
 case IDENTIFIER_NODE:
   dump_string_field (di, "strg", IDENTIFIER_POINTER (t));
   dump_int (di, "lngt", IDENTIFIER_LENGTH (t));
@@ -435,6 +433,7 @@ dequeue_and_dump (dump_info_p di)
 
 case STATEMENT_LIST:
   {
+	int i;
 	tree_stmt_iterator it;
 	for (i = 0, it = tsi_start (t); !tsi_end_p (it); tsi_next (), i++)
 	  {
@@ -447,7 +446,7 @@ dequeue_and_dump (dump_info_p di)
 
 case TREE_VEC:
   dump_int (di, "lngt", TREE_VEC_LENGTH (t));
-  for (i = 0; i < TREE_VEC_LENGTH (t); ++i)
+  for (int i = 0; i < TREE_VEC_LENGTH (t); ++i)
 	{
 	  char buffer[32];
 	  sprintf (buffer, "%u", i);
@@ -707,9 +706,8 @@ dequeue_and_dump (dump_info_p di)
   break;
 case OMP_CLAUSE:
   {
-	int i;
 	fprintf (di->stream, "%s\n", omp_clause_code_name[OMP_CLAUSE_CODE (t)]);
-	for (i = 0; i < omp_clause_num_ops[OMP_CLAUSE_CODE (t)]; i++)
+	for (int i = 0; i < omp_clause_num_ops[OMP_CLAUSE_CODE (t)]; i++)
 	  dump_child ("op: ", OMP_CLAUSE_OPERAND (t, i));
   }
   break;
-- 
2.10.1



Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-10-27 Thread Jakub Jelinek
On Thu, Oct 27, 2016 at 04:40:30PM +0200, Martin Liška wrote:
> On 10/21/2016 04:26 PM, Jakub Jelinek wrote:
> > On Wed, Oct 12, 2016 at 04:07:53PM +0200, Martin Liška wrote:
> >>> Ok, first let me list some needed follow-ups that don't need to be handled
> >>> right away:
> >>> - r237814-like changes for ASAN_MARK
> 
> I've spent quite some on that and that's what I begin 
> (use-after-scope-addressable.patch).
> Problem is that as I ignore all ASAN_MARK internal fns, the code does not 
> detect having address
> taken in:
> 
> _2 = MEM[(char *)_char + 8B];
> 
>   char *ptr;
>   {
> char my_char[9];
> ptr = _char[0];
>   }
> 
>   return *(ptr+8);
> 
> and thus the code in tree-ssa.c (maybe_optimize_var) sets TREE_ADDRESSABLE 
> (var) = 0.

Perhaps we should do that only if the var's type is_gimple_reg_type,
then we'd rewrite that into SSA at that time, right?  So, in theory if we
turned the ASAN_MARK poisoning call into another internal function
(var_5 = ASAN_POISON ()) and then after converting it into SSA looked at
all the uses of such an lhs and perhaps at sanopt part or when marked all
the use places with a library call that would complain at runtime?
Or turn those back at sanopt time into addressable memory loads which would
be poisoned or similar?  Or alternatively, immediately before turning
variables addressable just because of ASAN_MARK into non-addressable use
the same framework into-ssa uses to find out if there are any poisoned
accesses, and just not optimize it in that case.
Anyway, this can be done incrementally.

> Second question I have is whether we want to handle just TREE_ADDRESSABLE 
> stuff during gimplification?
> Basically in a way that the current patch is doing?

How could variables that aren't TREE_ADDRESSABLE during gimplification be
accessed out of scope?

> +/* Return true if DECL should be guarded on the stack.  */
> +
> +static inline bool
> +asan_protect_stack_decl (tree decl)
> +{
> +  return DECL_P (decl)
> +&& (!DECL_ARTIFICIAL (decl)
> + || (asan_sanitize_use_after_scope () && TREE_ADDRESSABLE (decl)));

Bad formatting.  Should be:

  return (DECL_P (decl)
  && (!DECL_ARTIFICIAL (decl)
  || (asan_sanitize_use_after_scope ()
  && TREE_ADDRESSABLE (decl;

Ok for trunk with that change.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-10-27 Thread Martin Liška
On 10/21/2016 04:26 PM, Jakub Jelinek wrote:
> On Wed, Oct 12, 2016 at 04:07:53PM +0200, Martin Liška wrote:
>>> Ok, first let me list some needed follow-ups that don't need to be handled
>>> right away:
>>> - r237814-like changes for ASAN_MARK

I've spent quite some on that and that's what I begin 
(use-after-scope-addressable.patch).
Problem is that as I ignore all ASAN_MARK internal fns, the code does not 
detect having address
taken in:

_2 = MEM[(char *)_char + 8B];

  char *ptr;
  {
char my_char[9];
ptr = _char[0];
  }

  return *(ptr+8);

and thus the code in tree-ssa.c (maybe_optimize_var) sets TREE_ADDRESSABLE 
(var) = 0.

Second question I have is whether we want to handle just TREE_ADDRESSABLE stuff 
during gimplification?
Basically in a way that the current patch is doing?

>>
>> Yes, that's missing. I'm wondering whether the same approach would be viable 
>> as
>> the {un}poisoning happens during gimplification. Thus it generates  
>> expressions
>> which verifier won't be happy about?
> 
> Sure, it uses   The trick is that the addressable sub-pass then ignores
> the taking of the address just in ASAN_MARK, and if some var is determined
> to be addressable solely because of ASAN_MARK  uses and no other reason,
> the ASAN_MARK would be dropped and variable rewritten into SSA form.
> 
>>> - optimization to remove ASAN_MARK unpoisoning at the start of the function
>>
>> As current implementation does not poison variables at the very beginning of
>> a functions (RTL stack frame emission), it won't be needed.
> 
> But you still ASAN_MARK unpoison the vars when they get into scope, right?
> And those can be removed if the optimizers could prove that the area has not
> been poisoned yet since the beginning of the function.
> 
>>> - optimization to remove ASAN_MARK poisoning at the end of function (and
>>>   remove epilogue unpoisoning)
>>> - optimization to remove ASAN_MARK unpoisoning followed by ASAN_MARK 
>>> poisoning
>>>   or vice versa if there are no memory accesses in between
>>
>> Yep, both are not handled and are very similar from my perspective: pairing
>> poisoning and unpoisoning pair which are not interfered by a memory operation
>> in between (in dominator meaning of word).
>> I'm wondering whether can be done effectively as we would need to visit all 
>> BBs
>> in a dominance domain (get_all_dominated_blocks) and check for the memory 
>> operations.
>> One improvement can be set of BBs that do not have any memory operations 
>> (excluding
>> ASAN_CHECK, ASAN_MARK builtins), but it can be still expensive to simplify 
>> poisoning
>> for all local variables. Or am I wrong?
> 
> My memory is weak, but isn't this something the sanopt pass
> (sanopt_optimize) is already doing similarly for e.g. ASAN_CHECK, UBSAN_NULL
> and UBSAN_VPTR checks?  For ASAN_MARK, you actually don't care about any
> calls, those shouldn't unpoison or poison again the vars under compiler's
> back.
> 
>>> - try to improve the goto handling
>>
>> Works for me to be target for stage3.
> 
> Sure.
> 
>> 2016-09-27  Martin Liska  
> 
> Likely newer date :)
> 
>>  * c-common.c (warn_for_unused_label): Save all labels used
>>  in goto or in 
> 
> 
> instead?
> 
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +{
>> +  const char *n = DECL_NAME (decl)
>> +? IDENTIFIER_POINTER (DECL_NAME (decl)) : "";
> 
> Bad formatting.
> 
> const char *n = (DECL_NAME (decl)
>  ? IDENTIFIER_POINTER (DECL_NAME (decl))
>  : "");
> or
> const char *n
>   = (DECL_NAME (decl)
>  ? IDENTIFIER_POINTER (DECL_NAME (decl)) : "");
> 
>>  /* Return true if DECL should be guarded on the stack.  */
>>  
>>  static inline bool
>>  asan_protect_stack_decl (tree decl)
>>  {
>> -  return DECL_P (decl) && !DECL_ARTIFICIAL (decl);
>> +  return DECL_P (decl) && TREE_ADDRESSABLE (decl);
>>  }
> 
> Can you explain this change?  It won't affect just
> -fsanitize-use-after-scope, and goes in both directions, adds some
> decls that weren't previously protected and removes others that were
> previously protected.
> 
> Is the removal of !DECL_ARTIFICIAL so that you can use-after-scope
> verify the C++ TARGET_EXPR temporaries?  Do we need to stack protect
> those even for -fno-sanitize-use-after-scope?
> I'm not 100% sure if we keep TREE_ADDRESSABLE meaningful for variables that
> need to live in memory for other reasons until expansion.


Yes, it's connected to C++ temporaries that are DECL_ARTIFICIAL.

> 
> So, I wonder if we don't want && (TREE_ADDRESSABLE (decl) || !DECL_ARTIFICAL 
> (decl))
> or just DECL_P (decl); or conditionalize something on
> -fsanitize-use-after-scope.  Perhaps the change is good as is, just want to
> point out that it affects also other -fsanitize=address modes in both
> ways (instruments something that hasn't been before, 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-10-25 Thread Martin Liška
On 10/21/2016 04:26 PM, Jakub Jelinek wrote:
> My memory is weak, but isn't this something the sanopt pass
> (sanopt_optimize) is already doing similarly for e.g. ASAN_CHECK, UBSAN_NULL
> and UBSAN_VPTR checks?  For ASAN_MARK, you actually don't care about any
> calls, those shouldn't unpoison or poison again the vars under compiler's
> back.

Hi Jakub.

Your memory is not weak, it's exactly what to pass does. I've spent quite some
time reading and understanding the optimization (and finding PR78106) and looks
very similar to what we have for ASAN_MARK. However it's more complicated due
to existence of poisoning and unpoisoning calls.

In the previous email, you summarized what "patterns" can be optimized out:

1) - optimization to remove ASAN_MARK unpoisoning at the start of the function
2) - optimization to remove ASAN_MARK poisoning at the end of function (and
  remove epilogue unpoisoning)
3) - optimization to remove ASAN_MARK unpoisoning followed by ASAN_MARK 
poisoning
  or vice versa if there are no memory accesses in between

I'll come with more generalization:

Ia) ASAN_MARK poisonining followed by ASAN_MARK unpoisoning with no memory 
access
in between => ASAN_MARK poisoning can be removed; 
Ib) ASAN_MARK unpoisonining followed by ASAN_MARK poisoning with no memory 
access
in between => ASAN_MARK unpoisoning can be removed; 

Ia + Ib match exactly with 3). As function epilogue contains unpoisoning, Ia) 
== 2)

IIa) ASAN_MARK poisoning followed by ASAN_MARK poisoning => second ASAN_MARK 
can be removed
IIb) ASAN_MARK unpoisoning followed by ASAN_MARK unpoisoning => second 
ASAN_MARK can be removed

IIb matches 1)

III) ASAN_CHECK() followed by ASAN_CHECK() with no ASAN_MARK poisoning 
in between =>
 second ASAN_CHECK can be removed (this is more aggressive than what current
 can do maybe_optimize_asan_check_ifn).

Notes:
- Compared to current imm_dom_path_with_freeing_call_p, we would need 3 such 
predicates:
  x) imm_dom_path_with_asan_check () - needed by Ia, Ib
  y) imm_dom_path_with_asan_mark_poisoning () - this should return set of 
variables which
 can be potentially poisoned - needed by IIb and III
  z) imm_dom_path_with_asan_mark_unpoisoning () - this should return set of 
variables
 which can be potentially unpoisoned - needed by IIa

I would appreciate a feedback about my brainstorming I did about sanopt porting 
to use-after-scope,
maybe my cases are more generic that we would eventually need. And I'm not 100% 
convinced that all my
patterns described above would be doable in a similar way as current sanopt 
algorithm is done?

Thanks,
Martin


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-10-21 Thread Jakub Jelinek
On Wed, Oct 12, 2016 at 04:07:53PM +0200, Martin Liška wrote:
> > Ok, first let me list some needed follow-ups that don't need to be handled
> > right away:
> > - r237814-like changes for ASAN_MARK
> 
> Yes, that's missing. I'm wondering whether the same approach would be viable 
> as
> the {un}poisoning happens during gimplification. Thus it generates  
> expressions
> which verifier won't be happy about?

Sure, it uses   The trick is that the addressable sub-pass then ignores
the taking of the address just in ASAN_MARK, and if some var is determined
to be addressable solely because of ASAN_MARK  uses and no other reason,
the ASAN_MARK would be dropped and variable rewritten into SSA form.

> > - optimization to remove ASAN_MARK unpoisoning at the start of the function
> 
> As current implementation does not poison variables at the very beginning of
> a functions (RTL stack frame emission), it won't be needed.

But you still ASAN_MARK unpoison the vars when they get into scope, right?
And those can be removed if the optimizers could prove that the area has not
been poisoned yet since the beginning of the function.

> > - optimization to remove ASAN_MARK poisoning at the end of function (and
> >   remove epilogue unpoisoning)
> > - optimization to remove ASAN_MARK unpoisoning followed by ASAN_MARK 
> > poisoning
> >   or vice versa if there are no memory accesses in between
> 
> Yep, both are not handled and are very similar from my perspective: pairing
> poisoning and unpoisoning pair which are not interfered by a memory operation
> in between (in dominator meaning of word).
> I'm wondering whether can be done effectively as we would need to visit all 
> BBs
> in a dominance domain (get_all_dominated_blocks) and check for the memory 
> operations.
> One improvement can be set of BBs that do not have any memory operations 
> (excluding
> ASAN_CHECK, ASAN_MARK builtins), but it can be still expensive to simplify 
> poisoning
> for all local variables. Or am I wrong?

My memory is weak, but isn't this something the sanopt pass
(sanopt_optimize) is already doing similarly for e.g. ASAN_CHECK, UBSAN_NULL
and UBSAN_VPTR checks?  For ASAN_MARK, you actually don't care about any
calls, those shouldn't unpoison or poison again the vars under compiler's
back.

> > - try to improve the goto handling
> 
> Works for me to be target for stage3.

Sure.

> 2016-09-27  Martin Liska  

Likely newer date :)

>   * c-common.c (warn_for_unused_label): Save all labels used
>   in goto or in 


instead?

> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> +   const char *n = DECL_NAME (decl)
> + ? IDENTIFIER_POINTER (DECL_NAME (decl)) : "";

Bad formatting.

  const char *n = (DECL_NAME (decl)
   ? IDENTIFIER_POINTER (DECL_NAME (decl))
   : "");
or
  const char *n
= (DECL_NAME (decl)
   ? IDENTIFIER_POINTER (DECL_NAME (decl)) : "");

>  /* Return true if DECL should be guarded on the stack.  */
>  
>  static inline bool
>  asan_protect_stack_decl (tree decl)
>  {
> -  return DECL_P (decl) && !DECL_ARTIFICIAL (decl);
> +  return DECL_P (decl) && TREE_ADDRESSABLE (decl);
>  }

Can you explain this change?  It won't affect just
-fsanitize-use-after-scope, and goes in both directions, adds some
decls that weren't previously protected and removes others that were
previously protected.

Is the removal of !DECL_ARTIFICIAL so that you can use-after-scope
verify the C++ TARGET_EXPR temporaries?  Do we need to stack protect
those even for -fno-sanitize-use-after-scope?
I'm not 100% sure if we keep TREE_ADDRESSABLE meaningful for variables that
need to live in memory for other reasons until expansion.

So, I wonder if we don't want && (TREE_ADDRESSABLE (decl) || !DECL_ARTIFICAL 
(decl))
or just DECL_P (decl); or conditionalize something on
-fsanitize-use-after-scope.  Perhaps the change is good as is, just want to
point out that it affects also other -fsanitize=address modes in both
ways (instruments something that hasn't been before, and stops instrumenting
something that has been before).

> @@ -1514,7 +1503,8 @@ defer_stack_allocation (tree var, bool toplevel)
>/* If stack protection is enabled, *all* stack variables must be deferred,
>   so that we can re-order the strings to the top of the frame.
>   Similarly for Address Sanitizer.  */
> -  if (flag_stack_protect || asan_sanitize_stack_p ())
> +  if (flag_stack_protect
> +  || asan_sanitize_stack_p ())
>  return true;

This hunk isn't needed, if all the conditions fit on one line,
one line is better.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-10-12 Thread Martin Liška
On 10/07/2016 01:13 PM, Jakub Jelinek wrote:
> Hi!
> 

Hi Jakub.

Again thanks for detailed review, you found many improvements.

> Ok, first let me list some needed follow-ups that don't need to be handled
> right away:
> - r237814-like changes for ASAN_MARK

Yes, that's missing. I'm wondering whether the same approach would be viable as
the {un}poisoning happens during gimplification. Thus it generates  
expressions
which verifier won't be happy about?

> - optimization to remove ASAN_MARK unpoisoning at the start of the function

As current implementation does not poison variables at the very beginning of
a functions (RTL stack frame emission), it won't be needed.

> - optimization to remove ASAN_MARK poisoning at the end of function (and
>   remove epilogue unpoisoning)
> - optimization to remove ASAN_MARK unpoisoning followed by ASAN_MARK poisoning
>   or vice versa if there are no memory accesses in between

Yep, both are not handled and are very similar from my perspective: pairing
poisoning and unpoisoning pair which are not interfered by a memory operation
in between (in dominator meaning of word).
I'm wondering whether can be done effectively as we would need to visit all BBs
in a dominance domain (get_all_dominated_blocks) and check for the memory 
operations.
One improvement can be set of BBs that do not have any memory operations 
(excluding
ASAN_CHECK, ASAN_MARK builtins), but it can be still expensive to simplify 
poisoning
for all local variables. Or am I wrong?

> - try to improve the goto handling

Works for me to be target for stage3.

I'm sending a new version where all mentioned notes should be fixed.

Thanks,
Martin


> 
> On Mon, Oct 03, 2016 at 11:27:38AM +0200, Martin Liška wrote:
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +fprintf (dump_file, "Skipping stack emission for variable: %s "
>> + "(%ldB)\n",
> 
> This message is weird.  I would have thought you are informing the user
> that you are unpoisoning the stack for the specified variable.
> 
>> + IDENTIFIER_POINTER (DECL_NAME (decl)),
> 
> Can't DECL_NAME be NULL?
> 
>> + var_end_offset - var_offset);
>> +  for (unsigned i = 0; i < size; ++i)
>> +{
>> +  unsigned char shadow_c = c;
>> +  if (i == size- 1 && last_chunk_size && !is_clobber)
> 
> Wrong formatting, space before - missing.
> 
>> +  g = gimple_build_assign (make_ssa_name (pointer_sized_int_node),
>> +  NOP_EXPR, base);
> 
> Incorrect indentation.
> 
>> +
>> +/* Return TRUE if we should instrument for use-after-scope sanity checking. 
>>  */
>> +
>> +static inline bool
>> +asan_sanitize_use_after_scope (void)
>> +{
>> +  return ((flag_sanitize & SANITIZE_ADDRESS_USE_AFTER_SCOPE)
>> +  == SANITIZE_ADDRESS_USE_AFTER_SCOPE
>> +  && ASAN_STACK);
>> +}
> 
> This looks wrong.  IMHO you really don't want to use ASAN_STACK in asan.h, 
> because it
> requires params.h.  I think what would be best is to prototype
> asan_sanitize_stack_p in asan.h, remove static inline from its definition,
> and have asan.h
> static inline bool
> asan_sanitize_use_after_scope (void)
> {
>   return (flag_sanitize_use_after_scope && asan_sanitize_stack_p ();
> }
> That way, for the common case of no sanitization it would be inline and
> fast.
> 
>> +static inline bool
>> +asan_no_sanitize_address_p (void)
>> +{
>> +  return lookup_attribute ("no_sanitize_address",
>> +   DECL_ATTRIBUTES (current_function_decl));
>> +}
> 
> And you could avoid doing this.
> 
>> +  if ((asan_sanitize_stack_p () || asan_sanitize_use_after_scope ())
> 
> And then just drop asan_sanitize_use_after_scope () from cases like this,
> because really asan_sanitize_use_after_scope () must imply
> asan_sanitize_stack_p ().
> 
>> @@ -1127,7 +1127,8 @@ expand_stack_vars (bool (*pred) (size_t), struct 
>> stack_vars_data *data)
>>if (alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
>>  {
>>base = virtual_stack_vars_rtx;
>> -  if (asan_sanitize_stack_p () && pred)
>> +  if ((asan_sanitize_stack_p () || asan_sanitize_use_after_scope ())
> 
> Likewise.
>> +  && pred)
> 
>> @@ -1514,7 +1515,9 @@ defer_stack_allocation (tree var, bool toplevel)
>>/* If stack protection is enabled, *all* stack variables must be deferred,
>>   so that we can re-order the strings to the top of the frame.
>>   Similarly for Address Sanitizer.  */
>> -  if (flag_stack_protect || asan_sanitize_stack_p ())
>> +  if (flag_stack_protect
>> +  || asan_sanitize_stack_p ()
>> +  || asan_sanitize_use_after_scope ())
> 
> And again.
> 
>>  return true;
>>  
>>unsigned int align = TREE_CODE (var) == SSA_NAME
>> @@ -2212,7 +2215,7 @@ expand_used_vars (void)
>>  expand_stack_vars (stack_protect_decl_phase_2, );
>>  }
>>  
>> -  if (asan_sanitize_stack_p ())
>> +  if (asan_sanitize_stack_p () || 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-10-07 Thread Jakub Jelinek
Hi!

Ok, first let me list some needed follow-ups that don't need to be handled
right away:
- r237814-like changes for ASAN_MARK
- optimization to remove ASAN_MARK unpoisoning at the start of the function
- optimization to remove ASAN_MARK poisoning at the end of function (and
  remove epilogue unpoisoning)
- optimization to remove ASAN_MARK unpoisoning followed by ASAN_MARK poisoning
  or vice versa if there are no memory accesses in between
- try to improve the goto handling

On Mon, Oct 03, 2016 at 11:27:38AM +0200, Martin Liška wrote:
> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "Skipping stack emission for variable: %s "
> +  "(%ldB)\n",

This message is weird.  I would have thought you are informing the user
that you are unpoisoning the stack for the specified variable.

> +  IDENTIFIER_POINTER (DECL_NAME (decl)),

Can't DECL_NAME be NULL?

> +  var_end_offset - var_offset);
> +  for (unsigned i = 0; i < size; ++i)
> +{
> +  unsigned char shadow_c = c;
> +  if (i == size- 1 && last_chunk_size && !is_clobber)

Wrong formatting, space before - missing.

> +  g = gimple_build_assign (make_ssa_name (pointer_sized_int_node),
> +   NOP_EXPR, base);

Incorrect indentation.

> +
> +/* Return TRUE if we should instrument for use-after-scope sanity checking.  
> */
> +
> +static inline bool
> +asan_sanitize_use_after_scope (void)
> +{
> +  return ((flag_sanitize & SANITIZE_ADDRESS_USE_AFTER_SCOPE)
> +   == SANITIZE_ADDRESS_USE_AFTER_SCOPE
> +   && ASAN_STACK);
> +}

This looks wrong.  IMHO you really don't want to use ASAN_STACK in asan.h, 
because it
requires params.h.  I think what would be best is to prototype
asan_sanitize_stack_p in asan.h, remove static inline from its definition,
and have asan.h
static inline bool
asan_sanitize_use_after_scope (void)
{
  return (flag_sanitize_use_after_scope && asan_sanitize_stack_p ();
}
That way, for the common case of no sanitization it would be inline and
fast.

> +static inline bool
> +asan_no_sanitize_address_p (void)
> +{
> +  return lookup_attribute ("no_sanitize_address",
> +DECL_ATTRIBUTES (current_function_decl));
> +}

And you could avoid doing this.

> +   if ((asan_sanitize_stack_p () || asan_sanitize_use_after_scope ())

And then just drop asan_sanitize_use_after_scope () from cases like this,
because really asan_sanitize_use_after_scope () must imply
asan_sanitize_stack_p ().

> @@ -1127,7 +1127,8 @@ expand_stack_vars (bool (*pred) (size_t), struct 
> stack_vars_data *data)
>if (alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
>   {
> base = virtual_stack_vars_rtx;
> -   if (asan_sanitize_stack_p () && pred)
> +   if ((asan_sanitize_stack_p () || asan_sanitize_use_after_scope ())

Likewise.
> +   && pred)

> @@ -1514,7 +1515,9 @@ defer_stack_allocation (tree var, bool toplevel)
>/* If stack protection is enabled, *all* stack variables must be deferred,
>   so that we can re-order the strings to the top of the frame.
>   Similarly for Address Sanitizer.  */
> -  if (flag_stack_protect || asan_sanitize_stack_p ())
> +  if (flag_stack_protect
> +  || asan_sanitize_stack_p ()
> +  || asan_sanitize_use_after_scope ())

And again.

>  return true;
>  
>unsigned int align = TREE_CODE (var) == SSA_NAME
> @@ -2212,7 +2215,7 @@ expand_used_vars (void)
>   expand_stack_vars (stack_protect_decl_phase_2, );
>   }
>  
> -  if (asan_sanitize_stack_p ())
> +  if (asan_sanitize_stack_p () || asan_sanitize_use_after_scope ())

And again.

> --- a/gcc/flag-types.h
> +++ b/gcc/flag-types.h
> @@ -229,6 +229,7 @@ enum sanitize_code {
>SANITIZE_OBJECT_SIZE = 1UL << 20,
>SANITIZE_VPTR = 1UL << 21,
>SANITIZE_BOUNDS_STRICT = 1UL << 22,
> +  SANITIZE_USE_AFTER_SCOPE = 1UL << 23,
>SANITIZE_UNDEFINED = SANITIZE_SHIFT | SANITIZE_DIVIDE | 
> SANITIZE_UNREACHABLE
>  | SANITIZE_VLA | SANITIZE_NULL | SANITIZE_RETURN
>  | SANITIZE_SI_OVERFLOW | SANITIZE_BOOL | SANITIZE_ENUM
> @@ -237,7 +238,9 @@ enum sanitize_code {
>  | SANITIZE_RETURNS_NONNULL_ATTRIBUTE
>  | SANITIZE_OBJECT_SIZE | SANITIZE_VPTR,
>SANITIZE_NONDEFAULT = SANITIZE_FLOAT_DIVIDE | SANITIZE_FLOAT_CAST
> - | SANITIZE_BOUNDS_STRICT
> + | SANITIZE_BOUNDS_STRICT,
> +  SANITIZE_ADDRESS_USE_AFTER_SCOPE = SANITIZE_ADDRESS
> + | SANITIZE_USE_AFTER_SCOPE

Looking at this, as -fsanitize-use-after-scope is a separate option, it
shouldn't mess up anything in the SANITIZE_* flags.  So just use a
flag_sanitize_use_after_scope var for that.  It isn't something where you'd
decide differently on it e.g. for -fno-sanitize= of -fsanitize-recover= etc.

> @@ -1228,6 +1306,14 @@ gimplify_bind_expr (tree *expr_p, 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-10-03 Thread Jakub Jelinek
On Mon, Oct 03, 2016 at 11:27:38AM +0200, Martin Liška wrote:
> > Plus, as I've mentioned before, it would be nice to optimize - for ASAN_MARK
> > unpoison appearing strictly before (i.e. dominating) the first (non-shadow) 
> > memory read
> > or write in the function (explicit or possible through function calls etc.)
> > you really don't need to unpoison (depending on whether we follow LLVM as
> > mentioned above then it can be removed without anything, or the decl needs
> > to be somehow marked and tell asan_emit_stack_protection it shouldn't poison
> > it at the start), and for ASAN_MARK poisoning appearing after the last
> > load/store in the function (post dominating those, you don't care about
> > noreturn though) you can combine that (remove the ASAN_MARK) with letting
> > asan_emit_stack_protection know it doesn't need to unpoison.
> 
> Fully agree with that approach, however I would be happy to do that as a 
> follow-up as
> it's not going to so trivial..

Ok.

> >> +  char c = poison ? ASAN_STACK_MAGIC_USE_AFTER_SCOPE : 0;
> >> +  for (unsigned i = 0; i < shadow_size; ++i)
> >> +{
> >> +  emit_move_insn (var_mem, gen_int_mode (c, QImode));
> >> +  var_mem = adjust_address (var_mem, QImode, 1);
> > 
> > When you combine it with the loop, you can also use the infrastructure to
> > handle it 4 bytes at a time.
> 
> Current implementation can handle up to 4 bytes at a time. I'm wondering we 
> can
> do even better for targets with 64-bits memory stores? How can one get such
> info about a target?

It is not just the question of whether the target has fast 64-bit memory
stores, but also whether the constants you want to store are reasonably
cheap.  E.g. on x86_64, movabsq is kind of expensive, so storing 64-bit 0
is cheap, but storing 64-bit 0xfdfdfdfdfdfdfdfdULL might be better done as 2
32-bit stores, perhaps both for speed and size.

> > 
> > Another thing I've noticed is that the inline expansion of
> > __asan_unpoison_stack_memory you emit looks buggy.
> > In use-after-scope-1.c I see:
> >   _9 = (unsigned long) _char;
> >   _10 = _9 >> 3;
> >   _11 = _10 + 2147450880;
> >   _12 = (signed char *) _11;
> >   MEM[(short int *)_12] = 0;
> > 
> > That would be fine only for 16 byte long my_char, but we have instead 9 byte
> > one.  So I believe in that case we need to store
> > 0x00, 0x01 bytes, for little endian thus 0x0100.  You could use for it
> > a function similarly to asan_shadow_cst, just build INTEGER_CST rather than
> > CONST_INT out of it.  In general, poisioning is storing 0xf8 to all affected
> > shadow bytes, unpoisioning should restore the state what we would emit
> > without use-after-scope sanitization, which is all but the last byte 0, and
> > the last byte 0 only if the var size is a multiple of 8, otherwise number
> > of valid bytes (1-7).
> 
> Fixed in the newer patch.
> 
> > 
> > As for the option, it seems clang uses now
> > -fsanitize-address-use-after-scope option, while I don't like that much, if
> > they have already released some version with that option or if they are
> > unwilling to change, I'd go with their option.
> 
> I also do not like the option, but 3.9.0 has already the functionality. Thus,
> I'm copying LLVM behavior.
> 
> > 
> >> + if (flag_stack_reuse != SR_NONE
> >> + && flag_openacc
> >> + && oacc_declare_returns != NULL)
> > 
> > This actually looks like preexisting OpenACC bug, I doubt the OpenACC
> > behavior should depend on -fstack-reuse= setting.
> 
> The generated diff for this hunk is bit misleading, I simplified that
> in the second version.
> 
> > 
> > +  bool unpoison_var = asan_poisoned_variables.contains (t);
> > +  if (asan_sanitize_use_after_scope ()
> > + && unpoison_var)
> > +   asan_poisoned_variables.remove (t);
> > 
> > Similarly to asan_handled_variables, I'd prefer it to be a pointer to
> > hash_set or something similar, so that it costs as few as possible for the
> > general case (no sanitization).  Similarly, querying the hash_set even for
> > no use-after-scope sanitization looks wrong.
> 
> Sure, fixed.
> 
> > 
> > + if ((asan_sanitize_stack_p () || asan_sanitize_use_after_scope ())
> > 
> > I would say if asan_sanitize_stack_p () is false, then we should not be
> > doing use-after-scope sanitization (error if user requested that
> > explicitly).
> 
> Done by adding '&& ASAN_STACK' to asan_sanitize_use_after_scope.
> 
> > 
> > Don't remember if I've mentioned it earlier, but for vars that are
> > TREE_ADDRESSABLE only because of ASAN_MARK calls, we should probably turn
> > them non-addressable and remove those ASAN_MARK calls, those shouldn't leak.
> > You can have a look at the r237814 change for how similarly
> > compare and exchange is special cased for the
> > addressables discovery (though, the ASAN_MARK case would be easier, just
> > drop it rather than turn it into something different).
> 
> I like the approach to not to handle local 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-10-03 Thread Martin Liška
On 08/18/2016 03:36 PM, Jakub Jelinek wrote:
> On Thu, May 12, 2016 at 04:12:21PM +0200, Martin Liška wrote:
>> --- a/gcc/asan.c
>> +++ b/gcc/asan.c
>> @@ -243,6 +243,11 @@ static unsigned HOST_WIDE_INT asan_shadow_offset_value;
>>  static bool asan_shadow_offset_computed;
>>  static vec sanitized_sections;
>>  
>> +/* Set of variable declarations that are going to be guarded by
>> +   use-after-scope sanitizer.  */
>> +
>> +static hash_set asan_handled_variables (13);
> 
> Doesn't this introduce yet another global ctor?  If yes (and we
> unfortunately have already way too many), I'd strongly prefer to avoid that,
> use pointer to hash_set or something similar.

Hello

It does, I did pointer in second version of that patch.

> 
>> +/* Depending on POISON flag, emit a call to poison (or unpoison) stack 
>> memory
>> +   allocated for local variables, localted in OFFSETS.  LENGTH is number
>> +   of OFFSETS, BASE is the register holding the stack base,
>> +   against which OFFSETS array offsets are relative to.  BASE_OFFSET 
>> represents
>> +   an offset requested by alignment and similar stuff.  */
>> +
>> +static void
>> +asan_poison_stack_variables (rtx shadow_base, rtx base,
>> + HOST_WIDE_INT base_offset,
>> + HOST_WIDE_INT *offsets, int length,
>> + tree *decls, bool poison)
>> +{
>> +  if (asan_sanitize_use_after_scope ())
>> +for (int l = length - 2; l > 0; l -= 2)
>> +  {
> 
> I think this is unfortunate, it leads to:
> movl$-235802127, 2147450880(%rax)
> movl$-185335552, 2147450884(%rax)
> movl$-202116109, 2147450888(%rax)
> movb$-8, 2147450884(%rax)
> movb$-8, 2147450885(%rax)
> (e.g. on use-after-scope-1.c).
> The asan_emit_stack_protection function already walks all the
> entries in the offsets array in both of the
>   for (l = length; l; l -= 2)
> loops, so please handle the initial poisoning and final unpoisoning there
> as well.  The goal is that for variables that you want poison-after-scope
> at the start of the function (btw, I've noticed that current SVN LLVM
> doesn't bother with it and thus doesn't track "use before scope" (before the
> scope is entered for the first time, maybe we shouldn't either, that would
> catch only compiler bugs rather than user code bugs, right?)) have 0xf8
> on all corresponding bytes including the one that would otherwise have 0x01
> through 0x07.  When unpoisoning at the end of the function, again you should
> combine that with unpoisoning of the red zone and partial zone bytes plus
> the last 0x01 through 0x07, etc.

I also decided to not to handle "use before scope" issues and thus I do not 
poison
stack variables at the very beginning of a function.

As you noticed, the format stack poisoning/unpoisoning code was kind of ugly.
Current unpoisoning code (trunk version) basically clears the whole shadow 
memory
for a stack frame except local variables that are not touched by 
use-after-scope machinery.
That eventually leads to a bit easier code, producing the shadow clearing stuff.

> 
> Plus, as I've mentioned before, it would be nice to optimize - for ASAN_MARK
> unpoison appearing strictly before (i.e. dominating) the first (non-shadow) 
> memory read
> or write in the function (explicit or possible through function calls etc.)
> you really don't need to unpoison (depending on whether we follow LLVM as
> mentioned above then it can be removed without anything, or the decl needs
> to be somehow marked and tell asan_emit_stack_protection it shouldn't poison
> it at the start), and for ASAN_MARK poisoning appearing after the last
> load/store in the function (post dominating those, you don't care about
> noreturn though) you can combine that (remove the ASAN_MARK) with letting
> asan_emit_stack_protection know it doesn't need to unpoison.

Fully agree with that approach, however I would be happy to do that as a 
follow-up as
it's not going to so trivial..

> 
>> +char c = poison ? ASAN_STACK_MAGIC_USE_AFTER_SCOPE : 0;
>> +for (unsigned i = 0; i < shadow_size; ++i)
>> +  {
>> +emit_move_insn (var_mem, gen_int_mode (c, QImode));
>> +var_mem = adjust_address (var_mem, QImode, 1);
> 
> When you combine it with the loop, you can also use the infrastructure to
> handle it 4 bytes at a time.

Current implementation can handle up to 4 bytes at a time. I'm wondering we can
do even better for targets with 64-bits memory stores? How can one get such
info about a target?

> 
> Another thing I've noticed is that the inline expansion of
> __asan_unpoison_stack_memory you emit looks buggy.
> In use-after-scope-1.c I see:
>   _9 = (unsigned long) _char;
>   _10 = _9 >> 3;
>   _11 = _10 + 2147450880;
>   _12 = (signed char *) _11;
>   MEM[(short int *)_12] = 0;
> 
> That would be fine only for 16 byte long my_char, but we have instead 9 byte
> one.  So I believe in that case we need to 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-09-03 Thread Jakub Jelinek
On Thu, May 12, 2016 at 04:12:21PM +0200, Martin Liška wrote:
> > Dunno, guess you need to do something in the FE for it already (talk to
> > Jason?).  At least in *.original dump there is already:
> >   < >   save ((const struct IntHolder &) _EXPR ) >;
> > int x = (int) saved->val;
> >   return  = x;
> > and the info on where the D.2263 temporary goes out of scope is lost.
> 
> Thanks for sample, I will ask Jason to help me with that.

Actually, I believe this is all available to the gimplifier.
Primarily look at gimplify_target_expr, which if
gimplify_ctxp->in_cleanup_point_expr
emits a D.N ={v} {CLOBBER};
stmt as cleanup to be added at that corresponding CLEANUP_POINT_EXPR.
And also study gimplify_cleanup_point_expr and gimple_push_cleanup.

I bet you want to emit for use-after-scope sanitization in
gimplify_target_expr next to the conditional which adds the clobber also
(for gimplify_ctxp->in_cleanup_point_expr only) also addition of ASAN_MASK
for the poisoning.  And with the same guard also (again, for if (init) case
only, i.e. the first time the TARGET_EXPR is encountered) before the
gimplification of the init the unpoisoning of the temporary.
Maybe initially ignore VLA temporaries, those would be harder to handle,
and probably have to be dealt together with user VLA/alloca address
sanitization.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-08-18 Thread Jakub Jelinek
On Thu, May 12, 2016 at 04:12:21PM +0200, Martin Liška wrote:
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -243,6 +243,11 @@ static unsigned HOST_WIDE_INT asan_shadow_offset_value;
>  static bool asan_shadow_offset_computed;
>  static vec sanitized_sections;
>  
> +/* Set of variable declarations that are going to be guarded by
> +   use-after-scope sanitizer.  */
> +
> +static hash_set asan_handled_variables (13);

Doesn't this introduce yet another global ctor?  If yes (and we
unfortunately have already way too many), I'd strongly prefer to avoid that,
use pointer to hash_set or something similar.

> +/* Depending on POISON flag, emit a call to poison (or unpoison) stack memory
> +   allocated for local variables, localted in OFFSETS.  LENGTH is number
> +   of OFFSETS, BASE is the register holding the stack base,
> +   against which OFFSETS array offsets are relative to.  BASE_OFFSET 
> represents
> +   an offset requested by alignment and similar stuff.  */
> +
> +static void
> +asan_poison_stack_variables (rtx shadow_base, rtx base,
> +  HOST_WIDE_INT base_offset,
> +  HOST_WIDE_INT *offsets, int length,
> +  tree *decls, bool poison)
> +{
> +  if (asan_sanitize_use_after_scope ())
> +for (int l = length - 2; l > 0; l -= 2)
> +  {

I think this is unfortunate, it leads to:
movl$-235802127, 2147450880(%rax)
movl$-185335552, 2147450884(%rax)
movl$-202116109, 2147450888(%rax)
movb$-8, 2147450884(%rax)
movb$-8, 2147450885(%rax)
(e.g. on use-after-scope-1.c).
The asan_emit_stack_protection function already walks all the
entries in the offsets array in both of the
  for (l = length; l; l -= 2)
loops, so please handle the initial poisoning and final unpoisoning there
as well.  The goal is that for variables that you want poison-after-scope
at the start of the function (btw, I've noticed that current SVN LLVM
doesn't bother with it and thus doesn't track "use before scope" (before the
scope is entered for the first time, maybe we shouldn't either, that would
catch only compiler bugs rather than user code bugs, right?)) have 0xf8
on all corresponding bytes including the one that would otherwise have 0x01
through 0x07.  When unpoisoning at the end of the function, again you should
combine that with unpoisoning of the red zone and partial zone bytes plus
the last 0x01 through 0x07, etc.

Plus, as I've mentioned before, it would be nice to optimize - for ASAN_MARK
unpoison appearing strictly before (i.e. dominating) the first (non-shadow) 
memory read
or write in the function (explicit or possible through function calls etc.)
you really don't need to unpoison (depending on whether we follow LLVM as
mentioned above then it can be removed without anything, or the decl needs
to be somehow marked and tell asan_emit_stack_protection it shouldn't poison
it at the start), and for ASAN_MARK poisoning appearing after the last
load/store in the function (post dominating those, you don't care about
noreturn though) you can combine that (remove the ASAN_MARK) with letting
asan_emit_stack_protection know it doesn't need to unpoison.

> + char c = poison ? ASAN_STACK_MAGIC_USE_AFTER_SCOPE : 0;
> + for (unsigned i = 0; i < shadow_size; ++i)
> +   {
> + emit_move_insn (var_mem, gen_int_mode (c, QImode));
> + var_mem = adjust_address (var_mem, QImode, 1);

When you combine it with the loop, you can also use the infrastructure to
handle it 4 bytes at a time.

Another thing I've noticed is that the inline expansion of
__asan_unpoison_stack_memory you emit looks buggy.
In use-after-scope-1.c I see:
  _9 = (unsigned long) _char;
  _10 = _9 >> 3;
  _11 = _10 + 2147450880;
  _12 = (signed char *) _11;
  MEM[(short int *)_12] = 0;

That would be fine only for 16 byte long my_char, but we have instead 9 byte
one.  So I believe in that case we need to store
0x00, 0x01 bytes, for little endian thus 0x0100.  You could use for it
a function similarly to asan_shadow_cst, just build INTEGER_CST rather than
CONST_INT out of it.  In general, poisioning is storing 0xf8 to all affected
shadow bytes, unpoisioning should restore the state what we would emit
without use-after-scope sanitization, which is all but the last byte 0, and
the last byte 0 only if the var size is a multiple of 8, otherwise number
of valid bytes (1-7).

As for the option, it seems clang uses now
-fsanitize-address-use-after-scope option, while I don't like that much, if
they have already released some version with that option or if they are
unwilling to change, I'd go with their option.

> + if (flag_stack_reuse != SR_NONE
> + && flag_openacc
> + && oacc_declare_returns != NULL)

This actually looks like preexisting OpenACC bug, I doubt the OpenACC
behavior should depend on -fstack-reuse= setting.

+  bool unpoison_var = asan_poisoned_variables.contains 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-08-12 Thread Martin Liška
PING^1

> On 05/12/2016 12:41 PM, Jakub Jelinek wrote:
>> On Wed, May 11, 2016 at 02:54:01PM +0200, Martin Liška wrote:
>>> On 05/06/2016 02:22 PM, Jakub Jelinek wrote:
 On Fri, May 06, 2016 at 01:04:30PM +0200, Martin Liška wrote:
> I've started working on the patch couple of month go, basically after
> a brief discussion with Jakub on IRC.
>
> I'm sending the initial version which can successfully run instrumented
> tramp3d, postgresql server and Inkscape. It catches the basic set of
> examples which are added in following patch.
>
> The implementation is quite straightforward as works in following steps:
>
> 1) Every local variable stack slot is poisoned at the very beginning of a 
> function (RTL emission)
> 2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
> emitting ASAN_MARK builtin)
> and the variable is marked as addressable

 Not all vars have DECL_EXPRs though.
>>
>> Just random comments from quick skim, need to find enough spare time to
>> actually try it and see how it works.
>>
>>> Yeah, I've spotted one interesting example which is part of LLVM's 
>>> testsuite:
>>>
>>> struct IntHolder {
>>>   int val;
>>> };
>>>
>>> const IntHolder *saved;
>>>
>>> void save(const IntHolder ) {
>>>   saved = 
>>> }
>>>
>>> int main(int argc, char *argv[]) {
>>>   save({10});
>>>   int x = saved->val;  // BOOM
>>>   return x;
>>> }
>>>
>>> It would be also good to handle such temporaries. Any suggestions how to 
>>> handle that in gimplifier?
>>
>> Dunno, guess you need to do something in the FE for it already (talk to
>> Jason?).  At least in *.original dump there is already:
>>   <>   save ((const struct IntHolder &) _EXPR ) >;
>> int x = (int) saved->val;
>>   return  = x;
>> and the info on where the D.2263 temporary goes out of scope is lost.
> 
> Thanks for sample, I will ask Jason to help me with that.
> 
>>
>>> Apart from that, second version of the patch changes:
>>> + fixed issues with missing stack unpoisoning; currently, I mark all 
>>> VAR_DECLs that
>>> are in ASAN_MARK internal fns and stack prologue/epilogue is emitted just 
>>> for these vars
>>> + removed unneeded hunks (tree-vect-patterns.c and asan_poisoning.cc)
>>> + LABEL unpoisoning code makes stable sort for variables that were already 
>>> used in the context
>>> + stack poisoning hasn't worked for -O1+ due to following guard in asan.c
>>>  /* Automatic vars in the current function will be always accessible.  */
>>> + direct shadow memory poisoning/unpoisoning code is introduced - in both 
>>> scenarios (RTL and GIMPLE),
>>> I would appreciate feedback if storing multiple bytes is fine? What is the 
>>> maximum memory wide
>>> store mode supported by a target? How can I get such information?
>>> + the maximum object size handled by a direct emission is guarded by 
>>> use-after-scope-direct-emission-threshold
>>> parameter; initial value (256B) should maximally emit store of 32B
>>
>> Would be better if user visible param was in bytes rather than bits IMHO.
>>
> 
> Well, the size of an object is in bytes, but as we map every 8 (yeah, that's 
> configurable, I'm quite curious about
> real respecting of ASAN_SHADOW_SHIFT) bytes of real memory to
> a single byte in shadow memory, thus the division by 8 is needed.
> 
>>> Yeah, depends because of:
>>>
>>> static inline bool
>>> asan_sanitize_use_after_scope (void)
>>> {
>>>   return ((flag_sanitize & SANITIZE_ADDRESS_USE_AFTER_SCOPE)
>>>   == SANITIZE_ADDRESS_USE_AFTER_SCOPE
>>>   && flag_stack_reuse == SR_NONE
>>>   && ASAN_STACK);
>>> }
>>>
>>> Where ASAN_STACK comes from params.h.
>>
>> I'd prefer just prototype the function in the header and define in asan.c
>> or some other source file.  Or maybe split it, do the important case
>> (flag_sanitize check) inline and call out of line function for the rest.
>> Why do you check flag_stack_reuse?  I thought you'd arrange for it to be
>> different when -fsanitize=use-after-scope?
> 
> Right, the sanitization does not relate to flag_stack, thus removing the 
> dependency,
> we can remove need of including params.h in various places.
> 
>>
>>> @@ -243,6 +243,11 @@ static unsigned HOST_WIDE_INT asan_shadow_offset_value;
>>>  static bool asan_shadow_offset_computed;
>>>  static vec sanitized_sections;
>>>  
>>> +/* Set of variable declarations that are going to be guarded by
>>> +   use-after-scope sanitizer.  */
>>> +
>>> +static hash_set  asan_handled_variables(13);
>>
>> Not sure about the formatting here, don't we use xxx instead of xxx 
>> 
>> ?  And I'd expect space before (.
> 
> Yeah, done.
> 
>>> @@ -1020,6 +1020,91 @@ asan_function_start (void)
>>>  current_function_funcdef_no);
>>>  }
>>>  
>>> +/* Return number of shadow bytes that are occupied by a local variable
>>> +   of SIZE bytes.  */
>>> +
>>> +static unsigned HOST_WIDE_INT
>>> +get_shadow_memory_size (unsigned HOST_WIDE_INT 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-12 Thread Martin Liška
On 05/12/2016 12:41 PM, Jakub Jelinek wrote:
> On Wed, May 11, 2016 at 02:54:01PM +0200, Martin Liška wrote:
>> On 05/06/2016 02:22 PM, Jakub Jelinek wrote:
>>> On Fri, May 06, 2016 at 01:04:30PM +0200, Martin Liška wrote:
 I've started working on the patch couple of month go, basically after
 a brief discussion with Jakub on IRC.

 I'm sending the initial version which can successfully run instrumented
 tramp3d, postgresql server and Inkscape. It catches the basic set of
 examples which are added in following patch.

 The implementation is quite straightforward as works in following steps:

 1) Every local variable stack slot is poisoned at the very beginning of a 
 function (RTL emission)
 2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
 emitting ASAN_MARK builtin)
 and the variable is marked as addressable
>>>
>>> Not all vars have DECL_EXPRs though.
> 
> Just random comments from quick skim, need to find enough spare time to
> actually try it and see how it works.
> 
>> Yeah, I've spotted one interesting example which is part of LLVM's testsuite:
>>
>> struct IntHolder {
>>   int val;
>> };
>>
>> const IntHolder *saved;
>>
>> void save(const IntHolder ) {
>>   saved = 
>> }
>>
>> int main(int argc, char *argv[]) {
>>   save({10});
>>   int x = saved->val;  // BOOM
>>   return x;
>> }
>>
>> It would be also good to handle such temporaries. Any suggestions how to 
>> handle that in gimplifier?
> 
> Dunno, guess you need to do something in the FE for it already (talk to
> Jason?).  At least in *.original dump there is already:
>   <   save ((const struct IntHolder &) _EXPR ) >;
> int x = (int) saved->val;
>   return  = x;
> and the info on where the D.2263 temporary goes out of scope is lost.

Thanks for sample, I will ask Jason to help me with that.

> 
>> Apart from that, second version of the patch changes:
>> + fixed issues with missing stack unpoisoning; currently, I mark all 
>> VAR_DECLs that
>> are in ASAN_MARK internal fns and stack prologue/epilogue is emitted just 
>> for these vars
>> + removed unneeded hunks (tree-vect-patterns.c and asan_poisoning.cc)
>> + LABEL unpoisoning code makes stable sort for variables that were already 
>> used in the context
>> + stack poisoning hasn't worked for -O1+ due to following guard in asan.c
>>  /* Automatic vars in the current function will be always accessible.  */
>> + direct shadow memory poisoning/unpoisoning code is introduced - in both 
>> scenarios (RTL and GIMPLE),
>> I would appreciate feedback if storing multiple bytes is fine? What is the 
>> maximum memory wide
>> store mode supported by a target? How can I get such information?
>> + the maximum object size handled by a direct emission is guarded by 
>> use-after-scope-direct-emission-threshold
>> parameter; initial value (256B) should maximally emit store of 32B
> 
> Would be better if user visible param was in bytes rather than bits IMHO.
> 

Well, the size of an object is in bytes, but as we map every 8 (yeah, that's 
configurable, I'm quite curious about
real respecting of ASAN_SHADOW_SHIFT) bytes of real memory to
a single byte in shadow memory, thus the division by 8 is needed.

>> Yeah, depends because of:
>>
>> static inline bool
>> asan_sanitize_use_after_scope (void)
>> {
>>   return ((flag_sanitize & SANITIZE_ADDRESS_USE_AFTER_SCOPE)
>>== SANITIZE_ADDRESS_USE_AFTER_SCOPE
>>&& flag_stack_reuse == SR_NONE
>>&& ASAN_STACK);
>> }
>>
>> Where ASAN_STACK comes from params.h.
> 
> I'd prefer just prototype the function in the header and define in asan.c
> or some other source file.  Or maybe split it, do the important case
> (flag_sanitize check) inline and call out of line function for the rest.
> Why do you check flag_stack_reuse?  I thought you'd arrange for it to be
> different when -fsanitize=use-after-scope?

Right, the sanitization does not relate to flag_stack, thus removing the 
dependency,
we can remove need of including params.h in various places.

> 
>> @@ -243,6 +243,11 @@ static unsigned HOST_WIDE_INT asan_shadow_offset_value;
>>  static bool asan_shadow_offset_computed;
>>  static vec sanitized_sections;
>>  
>> +/* Set of variable declarations that are going to be guarded by
>> +   use-after-scope sanitizer.  */
>> +
>> +static hash_set  asan_handled_variables(13);
> 
> Not sure about the formatting here, don't we use xxx instead of xxx 
> ?  And I'd expect space before (.

Yeah, done.

>> @@ -1020,6 +1020,91 @@ asan_function_start (void)
>>   current_function_funcdef_no);
>>  }
>>  
>> +/* Return number of shadow bytes that are occupied by a local variable
>> +   of SIZE bytes.  */
>> +
>> +static unsigned HOST_WIDE_INT
>> +get_shadow_memory_size (unsigned HOST_WIDE_INT size)
>> +{
>> +  /* Round up size of object.  */
>> +  unsigned HOST_WIDE_INT r;
>> +  if ((r = size % BITS_PER_UNIT) != 0)
>> +size += BITS_PER_UNIT 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-12 Thread Jakub Jelinek
On Wed, May 11, 2016 at 02:54:01PM +0200, Martin Liška wrote:
> On 05/06/2016 02:22 PM, Jakub Jelinek wrote:
> > On Fri, May 06, 2016 at 01:04:30PM +0200, Martin Liška wrote:
> >> I've started working on the patch couple of month go, basically after
> >> a brief discussion with Jakub on IRC.
> >>
> >> I'm sending the initial version which can successfully run instrumented
> >> tramp3d, postgresql server and Inkscape. It catches the basic set of
> >> examples which are added in following patch.
> >>
> >> The implementation is quite straightforward as works in following steps:
> >>
> >> 1) Every local variable stack slot is poisoned at the very beginning of a 
> >> function (RTL emission)
> >> 2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
> >> emitting ASAN_MARK builtin)
> >> and the variable is marked as addressable
> > 
> > Not all vars have DECL_EXPRs though.

Just random comments from quick skim, need to find enough spare time to
actually try it and see how it works.

> Yeah, I've spotted one interesting example which is part of LLVM's testsuite:
> 
> struct IntHolder {
>   int val;
> };
> 
> const IntHolder *saved;
> 
> void save(const IntHolder ) {
>   saved = 
> }
> 
> int main(int argc, char *argv[]) {
>   save({10});
>   int x = saved->val;  // BOOM
>   return x;
> }
> 
> It would be also good to handle such temporaries. Any suggestions how to 
> handle that in gimplifier?

Dunno, guess you need to do something in the FE for it already (talk to
Jason?).  At least in *.original dump there is already:
  <) >;
int x = (int) saved->val;
  return  = x;
and the info on where the D.2263 temporary goes out of scope is lost.

> Apart from that, second version of the patch changes:
> + fixed issues with missing stack unpoisoning; currently, I mark all 
> VAR_DECLs that
> are in ASAN_MARK internal fns and stack prologue/epilogue is emitted just for 
> these vars
> + removed unneeded hunks (tree-vect-patterns.c and asan_poisoning.cc)
> + LABEL unpoisoning code makes stable sort for variables that were already 
> used in the context
> + stack poisoning hasn't worked for -O1+ due to following guard in asan.c
>  /* Automatic vars in the current function will be always accessible.  */
> + direct shadow memory poisoning/unpoisoning code is introduced - in both 
> scenarios (RTL and GIMPLE),
> I would appreciate feedback if storing multiple bytes is fine? What is the 
> maximum memory wide
> store mode supported by a target? How can I get such information?
> + the maximum object size handled by a direct emission is guarded by 
> use-after-scope-direct-emission-threshold
> parameter; initial value (256B) should maximally emit store of 32B

Would be better if user visible param was in bytes rather than bits IMHO.

> Yeah, depends because of:
> 
> static inline bool
> asan_sanitize_use_after_scope (void)
> {
>   return ((flag_sanitize & SANITIZE_ADDRESS_USE_AFTER_SCOPE)
> == SANITIZE_ADDRESS_USE_AFTER_SCOPE
> && flag_stack_reuse == SR_NONE
> && ASAN_STACK);
> }
> 
> Where ASAN_STACK comes from params.h.

I'd prefer just prototype the function in the header and define in asan.c
or some other source file.  Or maybe split it, do the important case
(flag_sanitize check) inline and call out of line function for the rest.
Why do you check flag_stack_reuse?  I thought you'd arrange for it to be
different when -fsanitize=use-after-scope?

> @@ -243,6 +243,11 @@ static unsigned HOST_WIDE_INT asan_shadow_offset_value;
>  static bool asan_shadow_offset_computed;
>  static vec sanitized_sections;
>  
> +/* Set of variable declarations that are going to be guarded by
> +   use-after-scope sanitizer.  */
> +
> +static hash_set  asan_handled_variables(13);

Not sure about the formatting here, don't we use xxx instead of xxx 
?  And I'd expect space before (.
> @@ -1020,6 +1020,91 @@ asan_function_start (void)
>current_function_funcdef_no);
>  }
>  
> +/* Return number of shadow bytes that are occupied by a local variable
> +   of SIZE bytes.  */
> +
> +static unsigned HOST_WIDE_INT
> +get_shadow_memory_size (unsigned HOST_WIDE_INT size)
> +{
> +  /* Round up size of object.  */
> +  unsigned HOST_WIDE_INT r;
> +  if ((r = size % BITS_PER_UNIT) != 0)
> +size += BITS_PER_UNIT - r;

Isn't there a ROUND_UP macro?

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-11 Thread Martin Liška
On 05/06/2016 02:22 PM, Jakub Jelinek wrote:
> On Fri, May 06, 2016 at 01:04:30PM +0200, Martin Liška wrote:
>> I've started working on the patch couple of month go, basically after
>> a brief discussion with Jakub on IRC.
>>
>> I'm sending the initial version which can successfully run instrumented
>> tramp3d, postgresql server and Inkscape. It catches the basic set of
>> examples which are added in following patch.
>>
>> The implementation is quite straightforward as works in following steps:
>>
>> 1) Every local variable stack slot is poisoned at the very beginning of a 
>> function (RTL emission)
>> 2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
>> emitting ASAN_MARK builtin)
>> and the variable is marked as addressable
> 
> Not all vars have DECL_EXPRs though.

Yeah, I've spotted one interesting example which is part of LLVM's testsuite:

struct IntHolder {
  int val;
};

const IntHolder *saved;

void save(const IntHolder ) {
  saved = 
}

int main(int argc, char *argv[]) {
  save({10});
  int x = saved->val;  // BOOM
  return x;
}

It would be also good to handle such temporaries. Any suggestions how to handle 
that in gimplifier?

> 
>> 3) Similarly, BIND_EXPR is the place where we poison the variable (scope 
>> exit)
>> 4) At the very end of the function, we clean up the poisoned memory
>> 5) The builtins are expanded to call to libsanitizer run-time library 
>> (__asan_poison_stack_memory, __asan_unpoison_stack_memory)
>> 6) As the use-after-scope stuff is already included in libsanitizer, no 
>> change is needed for the library
> 
>> As mentioned, it's request for comment as it still has couple of limitations:
>> a) VLA are not supported, which should make sense as we are unable to 
>> allocate a stack slot for that
>> b) we can possibly strip some instrumentation in situations where a variable 
>> is introduced in a very first BB (RTL poisoning is superfluous).
>> Similarly for a very last BB of a function, we can strip end of scope 
>> poisoning (and RTL unpoisoning). I'll do that incrementally.
> 
> Yeah.
> 
>> c) We require -fstack-reuse=none option, maybe it worth to warn a user if 
>> -fsanitize=use-after-scope is provided without the option?
> 
> This should be implicitly set by -fsanitize=use-after-scope.  Only if some
> other -fstack-reuse= option is explicitly set together with
> -fsanitize=use-after-scope, we should warn and reset it anyway.

Handled in v2 of the patch.

> 
>> d) An instrumented binary is quite slow (~20x for tramp3d) as every function 
>> call produces many memory read/writes. I'm wondering whether
>> we should provide a faster alternative (like instrument just variables that 
>> have address taken) ?
> 
> I don't see any point in instrumenting !needs_to_live_in_memory vars,
> at least not those that don't need to live in memory at gimplification time.
> How could one use those after scope?  They can't be accessed by
> dereferencing some pointer, and the symbol itself should be unavailable for
> symbol lookup after the symbol goes out of scope.
> Plus obviously ~20x slowdown isn't acceptable.
> 
> Another thing is what to do with variables that are addressable at
> gimplification time, but generally are made non-addressable afterwards,
> such as due to optimizing away the taking of their address, inlining, etc.
> 
> Perhaps depending on how big slowdown you get after just instrumenting
> needs_to_live_in_memory vars from ~ gimplification time and/or with the
> possible inlining of the poisoning/unpoisoning (again, should be another
> knob), at least with small sized vars, there might be another knob,
> which would tell if vars that are made non-addressable during optimizations
> later on should be instrumented or not.
> E.g. if you ASAN_MARK all needs_to_live_in_memory vars early, you could
> during the addressable determination when the knob says stuff should be
> faster, but less precise, ignore the vars that are addressable just because
> of the ASAN_MARK calls, and if they'd then turn to be non-addressable,
> remove the corresponding ASAN_MARK calls.

Following the aforementioned instrumentation and utilizing direct shadow memory
instruction emission, I was able to reduce tramp3d slowdown to 3x and
postgresql server test-suite runs 2x slower.

Apart from that, second version of the patch changes:
+ fixed issues with missing stack unpoisoning; currently, I mark all VAR_DECLs 
that
are in ASAN_MARK internal fns and stack prologue/epilogue is emitted just for 
these vars
+ removed unneeded hunks (tree-vect-patterns.c and asan_poisoning.cc)
+ LABEL unpoisoning code makes stable sort for variables that were already used 
in the context
+ stack poisoning hasn't worked for -O1+ due to following guard in asan.c
 /* Automatic vars in the current function will be always accessible.  */
+ direct shadow memory poisoning/unpoisoning code is introduced - in both 
scenarios (RTL and GIMPLE),
I would appreciate feedback if storing multiple bytes is 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-10 Thread Jakub Jelinek
On Tue, May 10, 2016 at 05:03:30PM +0200, Martin Liška wrote:
> On 05/06/2016 04:39 PM, Jakub Jelinek wrote:
> > Depends on how exactly it is defined.  It could be enabling just its own
> > sanitizer bit and nothing else, then users would need to use
> > -fsanitize=address,use-after-scope
> > or
> > -fsanitize=kernel-address,use-after-scope
> 
> I'm inclined to the second option, where the new option would be automatically
> added if a ADDRESS sanitizer is enabled (SANITIZE_{USER,KERNEL}_ADDRESS):
> 
> Is it acceptable behavior?

To me, yes.  But, the question is if it is acceptable to clang too.
Limiting it to a param means it will be command line option incompatible
between clang and gcc.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-10 Thread Martin Liška
On 05/06/2016 04:39 PM, Jakub Jelinek wrote:
> Depends on how exactly it is defined.  It could be enabling just its own
> sanitizer bit and nothing else, then users would need to use
> -fsanitize=address,use-after-scope
> or
> -fsanitize=kernel-address,use-after-scope

I'm inclined to the second option, where the new option would be automatically
added if a ADDRESS sanitizer is enabled (SANITIZE_{USER,KERNEL}_ADDRESS):

Is it acceptable behavior?

> (order doesn't matter), or it could enable the SANITIZE_ADDRESS
> bit together with its own, and then we'd just post-option processing
> (where we e.g. reject address,kernel-address) default to
> SANITIZE_USER_ADDRESS if SANITIZE_ADDRESS is on together with
> SANITIZE_USE_AFTER_SCOPE, but neither SANITIZE_{USER,KERNEL}_ADDRESS
> is defined.
> -fsanitize=address -fno-sanitize=use-after-scope
> obviously shouldn't in any case disable SANITIZE_ADDRESS, similarly
> -fsanitize=kernel-address -fno-sanitize=use-after-scope
> 
>   Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 04:41:41PM +0200, Martin Liška wrote:
> On 05/06/2016 03:25 PM, Jakub Jelinek wrote:
> > Well, we already have the gimple poisoning/unpoisoning code on RTL (emitted
> > after the prologue and before the epilogue), so it shouldn't be that hard.
> > I'd only do the most common/easy cases inline though, like 1/2/4/8/16/32
> > bytes long variables.
> > 
> > Jakub
> 
> You are right, I didn't realize it earlier.
> As I've collected statistics for tramp3d, poisoning code has following 
> distribution:
> 
> 4:1.62%
> 8:3.53%
> 12:94.76%
> 
> which is quite interesting that 12B are such a common size :)
> Probably due to a lot of time spent in ::evaluate (MultiArgEvaluator and 
> MultiArgEvaluator).
> Considering just variables which needs_to_live_in_memory, tramp3d is still 
> ~15x slower.

Please look at other testcases, not just tramp3d - we in the end don't want
to tune it to just tramp3d.  Pick up some 3-4 C/C++ benchmarks, tramp3d can
be one of them ;)

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Martin Liška
On 05/06/2016 03:25 PM, Jakub Jelinek wrote:
> Well, we already have the gimple poisoning/unpoisoning code on RTL (emitted
> after the prologue and before the epilogue), so it shouldn't be that hard.
> I'd only do the most common/easy cases inline though, like 1/2/4/8/16/32
> bytes long variables.
> 
>   Jakub

You are right, I didn't realize it earlier.
As I've collected statistics for tramp3d, poisoning code has following 
distribution:

4:1.62%
8:3.53%
12:94.76%

which is quite interesting that 12B are such a common size :)
Probably due to a lot of time spent in ::evaluate (MultiArgEvaluator and 
MultiArgEvaluator).
Considering just variables which needs_to_live_in_memory, tramp3d is still ~15x 
slower.

Anyway profile report tells:
26.51%  a.outlibasan.so.3.0.0  [.] __asan::PoisonShadow
18.49%  a.outlibasan.so.3.0.0  [.] PoisonAlignedStackMemory
 5.61%  a.outlibc-2.22.so  [.] __memset_avx2
 5.41%  a.outa.out [.] 
MultiArgEvaluator::evaluate
 3.56%  a.outlibasan.so.3.0.0  [.] __asan_unpoison_stack_memory
 2.69%  a.outlibasan.so.3.0.0  [.] __asan_poison_stack_memory

I'll continue working on that after weekend.

Martin


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 05:22:46PM +0300, Yury Gribov wrote:
> On 05/06/2016 03:38 PM, Jakub Jelinek wrote:
> >On Fri, May 06, 2016 at 02:48:30PM +0300, Yury Gribov wrote:
> >>>6) As the use-after-scope stuff is already included in libsanitizer, no 
> >>>change is needed for the library
> >>
> >>Note that upstream seems to use a different cmdline interface. They don't
> >>have a dedicated -fsanitize=use-after-scope and instead consider it to be a
> >>part of -fsanitize=address (disabled by default, enabled via -mllvm
> >>-asan-use-after-scope=1). I'd suggest to keep this interface (or at least
> >>discuss with them) and use GCC's --param.
> >
> >I personally think -fsanitize=use-after-scope (which implies address
> >sanitization in it) is better, can upstream be convinved not to change it?
> 
> Will that work with -fsanitize=kernel-address?

Depends on how exactly it is defined.  It could be enabling just its own
sanitizer bit and nothing else, then users would need to use
-fsanitize=address,use-after-scope
or
-fsanitize=kernel-address,use-after-scope
(order doesn't matter), or it could enable the SANITIZE_ADDRESS
bit together with its own, and then we'd just post-option processing
(where we e.g. reject address,kernel-address) default to
SANITIZE_USER_ADDRESS if SANITIZE_ADDRESS is on together with
SANITIZE_USE_AFTER_SCOPE, but neither SANITIZE_{USER,KERNEL}_ADDRESS
is defined.
-fsanitize=address -fno-sanitize=use-after-scope
obviously shouldn't in any case disable SANITIZE_ADDRESS, similarly
-fsanitize=kernel-address -fno-sanitize=use-after-scope

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Yury Gribov

On 05/06/2016 03:38 PM, Jakub Jelinek wrote:

On Fri, May 06, 2016 at 02:48:30PM +0300, Yury Gribov wrote:

6) As the use-after-scope stuff is already included in libsanitizer, no change 
is needed for the library


Note that upstream seems to use a different cmdline interface. They don't
have a dedicated -fsanitize=use-after-scope and instead consider it to be a
part of -fsanitize=address (disabled by default, enabled via -mllvm
-asan-use-after-scope=1). I'd suggest to keep this interface (or at least
discuss with them) and use GCC's --param.


I personally think -fsanitize=use-after-scope (which implies address
sanitization in it) is better, can upstream be convinved not to change it?


Will that work with -fsanitize=kernel-address?




FTR here's the upstream work on this: http://reviews.llvm.org/D19347


Example:

int
main (void)
{
   char *ptr;
   {
 char my_char[9];
 ptr = _char[0];
   }

   *(ptr+9) = 'c';
}


Well, this testcase shows not just use after scope, but also out of bound
access.  Would be better not to combine it, at least in the majority of
testcases.

Jakub






Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 03:17:23PM +0200, Martin Liška wrote:
> On 05/06/2016 01:48 PM, Yury Gribov wrote:
> > On 05/06/2016 02:04 PM, Martin Liška wrote:
> >> I've started working on the patch couple of month go, basically after
> >> a brief discussion with Jakub on IRC.
> >>
> >> I'm sending the initial version which can successfully run instrumented
> >> tramp3d, postgresql server and Inkscape. It catches the basic set of
> >> examples which are added in following patch.
> >>
> >> The implementation is quite straightforward as works in following steps:
> >>
> >> 1) Every local variable stack slot is poisoned at the very beginning of a 
> >> function (RTL emission)
> >> 2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
> >> emitting ASAN_MARK builtin)
> >> and the variable is marked as addressable
> >> 3) Similarly, BIND_EXPR is the place where we poison the variable (scope 
> >> exit)
> >> 4) At the very end of the function, we clean up the poisoned memory
> >> 5) The builtins are expanded to call to libsanitizer run-time library 
> >> (__asan_poison_stack_memory, __asan_unpoison_stack_memory)
> > 
> > Can we inline these?
> 
> Currently not as libasan is a shared library that an instrumented executable 
> is linked with.
> Possible solution would be to directly emit gimple instruction that would 
> poison/unpoison the memory.
> But it's not a trivial job which is done in the poisoning code (ALWAYS_INLINE 
> void FastPoisonShadow(uptr aligned_beg, uptr aligned_size, u8 value)

Well, we already have the gimple poisoning/unpoisoning code on RTL (emitted
after the prologue and before the epilogue), so it shouldn't be that hard.
I'd only do the most common/easy cases inline though, like 1/2/4/8/16/32
bytes long variables.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Martin Liška
On 05/06/2016 01:48 PM, Yury Gribov wrote:
> On 05/06/2016 02:04 PM, Martin Liška wrote:
>> Hello.
>>
>> I've started working on the patch couple of month go, basically after
>> a brief discussion with Jakub on IRC.
>>
>> I'm sending the initial version which can successfully run instrumented
>> tramp3d, postgresql server and Inkscape. It catches the basic set of
>> examples which are added in following patch.
>>
>> The implementation is quite straightforward as works in following steps:
>>
>> 1) Every local variable stack slot is poisoned at the very beginning of a 
>> function (RTL emission)
>> 2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
>> emitting ASAN_MARK builtin)
>> and the variable is marked as addressable
>> 3) Similarly, BIND_EXPR is the place where we poison the variable (scope 
>> exit)
>> 4) At the very end of the function, we clean up the poisoned memory
>> 5) The builtins are expanded to call to libsanitizer run-time library 
>> (__asan_poison_stack_memory, __asan_unpoison_stack_memory)
> 
> Can we inline these?

Currently not as libasan is a shared library that an instrumented executable is 
linked with.
Possible solution would be to directly emit gimple instruction that would 
poison/unpoison the memory.
But it's not a trivial job which is done in the poisoning code (ALWAYS_INLINE 
void FastPoisonShadow(uptr aligned_beg, uptr aligned_size, u8 value)

> 
>> 6) As the use-after-scope stuff is already included in libsanitizer, no 
>> change is needed for the library
> 
> Note that upstream seems to use a different cmdline interface. They don't 
> have a dedicated -fsanitize=use-after-scope and instead consider it to be a 
> part of -fsanitize=address (disabled by default, enabled via -mllvm 
> -asan-use-after-scope=1). I'd suggest to keep this interface (or at least 
> discuss with them) and use GCC's --param.
> 
> FTR here's the upstream work on this: http://reviews.llvm.org/D19347

Thanks for the link, I will adapt part of the test to our test-suite.
Some of them are really interesting.

Martin

> 
>> Example:
>>
>> int
>> main (void)
>> {
>>char *ptr;
>>{
>>  char my_char[9];
>>  ptr = _char[0];
>>}
>>
>>*(ptr+9) = 'c';
>> }
>>
>> ./a.out
>> =
>> ==12811==ERROR: AddressSanitizer: stack-use-after-scope on address 
>> 0x7ffec9bcff69 at pc 0x00400a73 bp 0x7ffec9bcfef0 sp 0x7ffec9bcfee8
>> WRITE of size 1 at 0x7ffec9bcff69 thread T0
>>  #0 0x400a72 in main (/tmp/a.out+0x400a72)
>>  #1 0x7f100824860f in __libc_start_main (/lib64/libc.so.6+0x2060f)
>>  #2 0x400868 in _start (/tmp/a.out+0x400868)
>>
>> Address 0x7ffec9bcff69 is located in stack of thread T0 at offset 105 in 
>> frame
>>  #0 0x400945 in main (/tmp/a.out+0x400945)
>>
>>This frame has 2 object(s):
>>  [32, 40) 'ptr'
>>  [96, 105) 'my_char' <== Memory access at offset 105 overflows this 
>> variable
>> HINT: this may be a false positive if your program uses some custom stack 
>> unwind mechanism or swapcontext
>>(longjmp and C++ exceptions *are* supported)
>> SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x400a72) in 
>> main
>> Shadow bytes around the buggy address:
>>0x100059371f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059371fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059371fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059371fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059371fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> =>0x100059371fe0: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 f8[f8]f4 f4
>>0x100059371ff0: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059372000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059372010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059372020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059372030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> Shadow byte legend (one shadow byte represents 8 application bytes):
>>Addressable:   00
>>Partially addressable: 01 02 03 04 05 06 07
>>Heap left redzone:   fa
>>Heap right redzone:  fb
>>Freed heap region:   fd
>>Stack left redzone:  f1
>>Stack mid redzone:   f2
>>Stack right redzone: f3
>>Stack partial redzone:   f4
>>Stack after return:  f5
>>Stack use after scope:   f8
>>Global redzone:  f9
>>Global init order:   f6
>>Poisoned by user:f7
>>Container overflow:  fc
>>Array cookie:ac
>>Intra object redzone:bb
>>ASan internal:   fe
>>Left alloca redzone: ca
>>Right alloca redzone:cb
>> ==12811==ABORTING
>>
>> As mentioned, it's request for comment as it still has couple of limitations:
>> a) VLA are not supported, which should make sense as we are unable to 
>> allocate a 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Martin Liška
On 05/06/2016 02:38 PM, Jakub Jelinek wrote:
> On Fri, May 06, 2016 at 02:48:30PM +0300, Yury Gribov wrote:
>>> 6) As the use-after-scope stuff is already included in libsanitizer, no 
>>> change is needed for the library
>>
>> Note that upstream seems to use a different cmdline interface. They don't
>> have a dedicated -fsanitize=use-after-scope and instead consider it to be a
>> part of -fsanitize=address (disabled by default, enabled via -mllvm
>> -asan-use-after-scope=1). I'd suggest to keep this interface (or at least
>> discuss with them) and use GCC's --param.
> 
> I personally think -fsanitize=use-after-scope (which implies address
> sanitization in it) is better, can upstream be convinved not to change it?

I also incline to the original -fsanitize=use-after-scope, which is compatible
to remaining -fsanitize=... options we have in the GCC.

> 
>> FTR here's the upstream work on this: http://reviews.llvm.org/D19347
>>
>>> Example:
>>>
>>> int
>>> main (void)
>>> {
>>>   char *ptr;
>>>   {
>>> char my_char[9];
>>> ptr = _char[0];
>>>   }
>>>
>>>   *(ptr+9) = 'c';
>>> }
> 
> Well, this testcase shows not just use after scope, but also out of bound
> access.  Would be better not to combine it, at least in the majority of
> testcases.

Sure, that's a typo, should be:
  *(ptr+8) = 'c';

with:
[96, 105) 'my_char' <== Memory access at offset 104 is inside this variable

Intention was to touch the second shadow byte for the array.

Martin

> 
>   Jakub
> 



Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 02:48:30PM +0300, Yury Gribov wrote:
> >6) As the use-after-scope stuff is already included in libsanitizer, no 
> >change is needed for the library
> 
> Note that upstream seems to use a different cmdline interface. They don't
> have a dedicated -fsanitize=use-after-scope and instead consider it to be a
> part of -fsanitize=address (disabled by default, enabled via -mllvm
> -asan-use-after-scope=1). I'd suggest to keep this interface (or at least
> discuss with them) and use GCC's --param.

I personally think -fsanitize=use-after-scope (which implies address
sanitization in it) is better, can upstream be convinved not to change it?

> FTR here's the upstream work on this: http://reviews.llvm.org/D19347
> 
> >Example:
> >
> >int
> >main (void)
> >{
> >   char *ptr;
> >   {
> > char my_char[9];
> > ptr = _char[0];
> >   }
> >
> >   *(ptr+9) = 'c';
> >}

Well, this testcase shows not just use after scope, but also out of bound
access.  Would be better not to combine it, at least in the majority of
testcases.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 01:04:30PM +0200, Martin Liška wrote:
> I've started working on the patch couple of month go, basically after
> a brief discussion with Jakub on IRC.
> 
> I'm sending the initial version which can successfully run instrumented
> tramp3d, postgresql server and Inkscape. It catches the basic set of
> examples which are added in following patch.
> 
> The implementation is quite straightforward as works in following steps:
> 
> 1) Every local variable stack slot is poisoned at the very beginning of a 
> function (RTL emission)
> 2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
> emitting ASAN_MARK builtin)
> and the variable is marked as addressable

Not all vars have DECL_EXPRs though.

> 3) Similarly, BIND_EXPR is the place where we poison the variable (scope exit)
> 4) At the very end of the function, we clean up the poisoned memory
> 5) The builtins are expanded to call to libsanitizer run-time library 
> (__asan_poison_stack_memory, __asan_unpoison_stack_memory)
> 6) As the use-after-scope stuff is already included in libsanitizer, no 
> change is needed for the library

> As mentioned, it's request for comment as it still has couple of limitations:
> a) VLA are not supported, which should make sense as we are unable to 
> allocate a stack slot for that
> b) we can possibly strip some instrumentation in situations where a variable 
> is introduced in a very first BB (RTL poisoning is superfluous).
> Similarly for a very last BB of a function, we can strip end of scope 
> poisoning (and RTL unpoisoning). I'll do that incrementally.

Yeah.

> c) We require -fstack-reuse=none option, maybe it worth to warn a user if 
> -fsanitize=use-after-scope is provided without the option?

This should be implicitly set by -fsanitize=use-after-scope.  Only if some
other -fstack-reuse= option is explicitly set together with
-fsanitize=use-after-scope, we should warn and reset it anyway.

> d) An instrumented binary is quite slow (~20x for tramp3d) as every function 
> call produces many memory read/writes. I'm wondering whether
> we should provide a faster alternative (like instrument just variables that 
> have address taken) ?

I don't see any point in instrumenting !needs_to_live_in_memory vars,
at least not those that don't need to live in memory at gimplification time.
How could one use those after scope?  They can't be accessed by
dereferencing some pointer, and the symbol itself should be unavailable for
symbol lookup after the symbol goes out of scope.
Plus obviously ~20x slowdown isn't acceptable.

Another thing is what to do with variables that are addressable at
gimplification time, but generally are made non-addressable afterwards,
such as due to optimizing away the taking of their address, inlining, etc.

Perhaps depending on how big slowdown you get after just instrumenting
needs_to_live_in_memory vars from ~ gimplification time and/or with the
possible inlining of the poisoning/unpoisoning (again, should be another
knob), at least with small sized vars, there might be another knob,
which would tell if vars that are made non-addressable during optimizations
later on should be instrumented or not.
E.g. if you ASAN_MARK all needs_to_live_in_memory vars early, you could
during the addressable determination when the knob says stuff should be
faster, but less precise, ignore the vars that are addressable just because
of the ASAN_MARK calls, and if they'd then turn to be non-addressable,
remove the corresponding ASAN_MARK calls.

> 2016-05-04  Martin Liska  
> 
>   * asan/asan_poisoning.cc: Do not call PoisonShadow in case
>   of zero size of aligned size.

Generally, libsanitizer changes would need to go through upstream.

> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "varasm.h"
>  #include "stor-layout.h"
>  #include "tree-iterator.h"
> +#include "params.h"
>  #include "asan.h"
>  #include "dojump.h"
>  #include "explow.h"
> @@ -54,7 +55,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfgloop.h"
>  #include "gimple-builder.h"
>  #include "ubsan.h"
> -#include "params.h"
>  #include "builtins.h"
>  #include "fnmatch.h"

Why do you need to move params.h around?  Does asan.h now depend on
params.h?

> +  gimplify_seq_add_stmt
> +(seq_p, gimple_build_call_internal (IFN_ASAN_MARK, 3,
> + build_int_cst (integer_type_node,
> +flags),
> + base, unit_size));

Formatting, better introduce some temporary variables, like
  gimple *g = gimple_build_call_internal (...);
  gimplify_seq_add_stmt (seq_p, g);

> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -3570,7 +3570,8 @@ vect_recog_mask_conversion_pattern (vec 
> *stmts, tree *type_in,
>  {
>gimple *last_stmt = stmts->pop ();
>enum tree_code rhs_code;

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Yury Gribov

On 05/06/2016 02:04 PM, Martin Liška wrote:

Hello.

I've started working on the patch couple of month go, basically after
a brief discussion with Jakub on IRC.

I'm sending the initial version which can successfully run instrumented
tramp3d, postgresql server and Inkscape. It catches the basic set of
examples which are added in following patch.

The implementation is quite straightforward as works in following steps:

1) Every local variable stack slot is poisoned at the very beginning of a 
function (RTL emission)
2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
emitting ASAN_MARK builtin)
and the variable is marked as addressable
3) Similarly, BIND_EXPR is the place where we poison the variable (scope exit)
4) At the very end of the function, we clean up the poisoned memory
5) The builtins are expanded to call to libsanitizer run-time library 
(__asan_poison_stack_memory, __asan_unpoison_stack_memory)


Can we inline these?


6) As the use-after-scope stuff is already included in libsanitizer, no change 
is needed for the library


Note that upstream seems to use a different cmdline interface. They 
don't have a dedicated -fsanitize=use-after-scope and instead consider 
it to be a part of -fsanitize=address (disabled by default, enabled via 
-mllvm -asan-use-after-scope=1). I'd suggest to keep this interface (or 
at least discuss with them) and use GCC's --param.


FTR here's the upstream work on this: http://reviews.llvm.org/D19347


Example:

int
main (void)
{
   char *ptr;
   {
 char my_char[9];
 ptr = _char[0];
   }

   *(ptr+9) = 'c';
}

./a.out
=
==12811==ERROR: AddressSanitizer: stack-use-after-scope on address 
0x7ffec9bcff69 at pc 0x00400a73 bp 0x7ffec9bcfef0 sp 0x7ffec9bcfee8
WRITE of size 1 at 0x7ffec9bcff69 thread T0
 #0 0x400a72 in main (/tmp/a.out+0x400a72)
 #1 0x7f100824860f in __libc_start_main (/lib64/libc.so.6+0x2060f)
 #2 0x400868 in _start (/tmp/a.out+0x400868)

Address 0x7ffec9bcff69 is located in stack of thread T0 at offset 105 in frame
 #0 0x400945 in main (/tmp/a.out+0x400945)

   This frame has 2 object(s):
 [32, 40) 'ptr'
 [96, 105) 'my_char' <== Memory access at offset 105 overflows this variable
HINT: this may be a false positive if your program uses some custom stack 
unwind mechanism or swapcontext
   (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x400a72) in main
Shadow bytes around the buggy address:
   0x100059371f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059371fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059371fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059371fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059371fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100059371fe0: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 f8[f8]f4 f4
   0x100059371ff0: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059372000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059372010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059372020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059372030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
   Addressable:   00
   Partially addressable: 01 02 03 04 05 06 07
   Heap left redzone:   fa
   Heap right redzone:  fb
   Freed heap region:   fd
   Stack left redzone:  f1
   Stack mid redzone:   f2
   Stack right redzone: f3
   Stack partial redzone:   f4
   Stack after return:  f5
   Stack use after scope:   f8
   Global redzone:  f9
   Global init order:   f6
   Poisoned by user:f7
   Container overflow:  fc
   Array cookie:ac
   Intra object redzone:bb
   ASan internal:   fe
   Left alloca redzone: ca
   Right alloca redzone:cb
==12811==ABORTING

As mentioned, it's request for comment as it still has couple of limitations:
a) VLA are not supported, which should make sense as we are unable to allocate 
a stack slot for that


Note that we plan some work on VLA sanitization later this year 
(upstream ASan now sanitizes dynamic allocas and VLAs).



b) we can possibly strip some instrumentation in situations where a variable is 
introduced in a very first BB (RTL poisoning is superfluous).
Similarly for a very last BB of a function, we can strip end of scope poisoning 
(and RTL unpoisoning). I'll do that incrementally.
c) We require -fstack-reuse=none option, maybe it worth to warn a user if 
-fsanitize=use-after-scope is provided without the option?


As a user, I'd prefer it to be automatically disabled when 
use-after-scope is on (unless it has been set explicitly in cmdline in 
which case we should probably issue error).



d) An instrumented binary is quite slow (~20x for tramp3d) as every function 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Martin Liška
Hello.

One more issue I forgot to mention in the previous email:
e) As one can come up with a source code which jumps to a label within
a block scope (use-after-scope-goto-1.c):

// { dg-do run }
// { dg-additional-options "-fsanitize=use-after-scope -fstack-reuse=none" }

int main(int argc, char **argv)
{
  int a = 123;

  if (argc == 0)
  {
int *ptr;
label:
  {
ptr = 
*ptr = 1;
return 0;
  }
  }
  else
goto label;

  return 0;
}

It's necessary to record all local variables in gimplifier and possibly
emit unpoisoning code when a LABEL_EXPR is seen. That results in following 
gimple
output:

label:
  _20 = (unsigned long) 
  _21 = (unsigned long) 4;
  __builtin___asan_unpoison_stack_memory (_20, _21);
  _22 = (unsigned long) 
  _23 = (unsigned long) 8;
  __builtin___asan_unpoison_stack_memory (_22, _23);
  ptr = 
  ptr.0_10 = ptr;
  _24 = (unsigned long) ptr.0_10;
  _25 = _24 >> 3;
  _26 = _25 + 2147450880;
  _27 = (signed char *) _26;
  _28 = *_27;
  _29 = _28 != 0;
  _30 = _24 & 7;
  _31 = (signed char) _30;
  _32 = _31 + 3;
  _33 = _32 >= _28;
  _34 = _29 & _33;
  if (_34 != 0)
goto ;
  else
goto ;

I know that the solution is a big hammer, but it works.

Martin




[PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Martin Liška
Hello.

I've started working on the patch couple of month go, basically after
a brief discussion with Jakub on IRC.

I'm sending the initial version which can successfully run instrumented
tramp3d, postgresql server and Inkscape. It catches the basic set of
examples which are added in following patch.

The implementation is quite straightforward as works in following steps:

1) Every local variable stack slot is poisoned at the very beginning of a 
function (RTL emission)
2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
emitting ASAN_MARK builtin)
and the variable is marked as addressable
3) Similarly, BIND_EXPR is the place where we poison the variable (scope exit)
4) At the very end of the function, we clean up the poisoned memory
5) The builtins are expanded to call to libsanitizer run-time library 
(__asan_poison_stack_memory, __asan_unpoison_stack_memory)
6) As the use-after-scope stuff is already included in libsanitizer, no change 
is needed for the library

Example:

int
main (void)
{
  char *ptr;
  {
char my_char[9];
ptr = _char[0];
  }

  *(ptr+9) = 'c';
}

./a.out 
=
==12811==ERROR: AddressSanitizer: stack-use-after-scope on address 
0x7ffec9bcff69 at pc 0x00400a73 bp 0x7ffec9bcfef0 sp 0x7ffec9bcfee8
WRITE of size 1 at 0x7ffec9bcff69 thread T0
#0 0x400a72 in main (/tmp/a.out+0x400a72)
#1 0x7f100824860f in __libc_start_main (/lib64/libc.so.6+0x2060f)
#2 0x400868 in _start (/tmp/a.out+0x400868)

Address 0x7ffec9bcff69 is located in stack of thread T0 at offset 105 in frame
#0 0x400945 in main (/tmp/a.out+0x400945)

  This frame has 2 object(s):
[32, 40) 'ptr'
[96, 105) 'my_char' <== Memory access at offset 105 overflows this variable
HINT: this may be a false positive if your program uses some custom stack 
unwind mechanism or swapcontext
  (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x400a72) in main
Shadow bytes around the buggy address:
  0x100059371f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059371fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059371fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059371fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059371fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100059371fe0: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 f8[f8]f4 f4
  0x100059371ff0: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059372000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059372010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059372020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059372030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:   fa
  Heap right redzone:  fb
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack partial redzone:   f4
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init order:   f6
  Poisoned by user:f7
  Container overflow:  fc
  Array cookie:ac
  Intra object redzone:bb
  ASan internal:   fe
  Left alloca redzone: ca
  Right alloca redzone:cb
==12811==ABORTING

As mentioned, it's request for comment as it still has couple of limitations:
a) VLA are not supported, which should make sense as we are unable to allocate 
a stack slot for that
b) we can possibly strip some instrumentation in situations where a variable is 
introduced in a very first BB (RTL poisoning is superfluous).
Similarly for a very last BB of a function, we can strip end of scope poisoning 
(and RTL unpoisoning). I'll do that incrementally.
c) We require -fstack-reuse=none option, maybe it worth to warn a user if 
-fsanitize=use-after-scope is provided without the option?
d) An instrumented binary is quite slow (~20x for tramp3d) as every function 
call produces many memory read/writes. I'm wondering whether
we should provide a faster alternative (like instrument just variables that 
have address taken) ?

Patch can survive bootstrap and regression tests on x86_64-linux-gnu.

Thanks for feedback.
Martin
>From 242bcaf2faded33291d05a5c4c5306f849de Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 3 May 2016 15:35:22 +0200
Subject: [PATCH 1/2] Introduce -fsanitize=use-after-scope

gcc/ChangeLog:

2016-05-03  Martin Liska  

	* asan.c (enum asan_check_flags): Cut the enum from here.
	(asan_poison_stack_variables): New function.
	(asan_emit_stack_protection): Poison stack variables.
	(asan_expand_mark_ifn): New function.
	* asan.h (enum asan_mark_flags): Paste here the enum from source
	file.
	(asan_sanitize_stack_p): Move function